From zgu at openjdk.org Fri Nov 1 13:06:33 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Fri, 1 Nov 2024 13:06:33 GMT Subject: RFR: 8343333: Parallel: Cleanup comment referring Solaris in MutableNUMASpace In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 02:07:29 GMT, Zhengyu Gu wrote: > A trivial cleanup that removes comment referring Solaris. Thanks, @tschatzl ------------- PR Comment: https://git.openjdk.org/jdk/pull/21796#issuecomment-2451838741 From zgu at openjdk.org Fri Nov 1 13:06:34 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Fri, 1 Nov 2024 13:06:34 GMT Subject: Integrated: 8343333: Parallel: Cleanup comment referring Solaris in MutableNUMASpace In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 02:07:29 GMT, Zhengyu Gu wrote: > A trivial cleanup that removes comment referring Solaris. This pull request has now been integrated. Changeset: da0e9e38 Author: Zhengyu Gu URL: https://git.openjdk.org/jdk/commit/da0e9e38e378ad14ddf4577924597462d9b0595f Stats: 7 lines in 1 file changed: 0 ins; 2 del; 5 mod 8343333: Parallel: Cleanup comment referring Solaris in MutableNUMASpace Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/21796 From ayang at openjdk.org Mon Nov 4 05:39:58 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 4 Nov 2024 05:39:58 GMT Subject: RFR: 8343507: Parallel: Fail if verify_complete finds incorrect states Message-ID: Trivial change of replacing `log_warning` with `fatal`, because incorrect `destination_count` always indicate some problem. 
Test: tier1-3 ------------- Commit messages: - pgc-fatal Changes: https://git.openjdk.org/jdk/pull/21865/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21865&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343507 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21865.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21865/head:pull/21865 PR: https://git.openjdk.org/jdk/pull/21865 From ayang at openjdk.org Mon Nov 4 06:21:59 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 4 Nov 2024 06:21:59 GMT Subject: RFR: 8343508: Parallel: Use ordinary klass accessor in verify_filler_in_dense_prefix Message-ID: One line change to use the common API to make the caller logic less obtrusive. Test: tier1-3 ------------- Commit messages: - pgc-klass-accessor Changes: https://git.openjdk.org/jdk/pull/21866/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21866&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343508 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21866.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21866/head:pull/21866 PR: https://git.openjdk.org/jdk/pull/21866 From tschatzl at openjdk.org Mon Nov 4 07:42:29 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 4 Nov 2024 07:42:29 GMT Subject: RFR: 8343507: Parallel: Fail if verify_complete finds incorrect states In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 05:31:26 GMT, Albert Mingkun Yang wrote: > Trivial change of replacing `log_warning` with `fatal`, because incorrect `destination_count` always indicate some problem. > > Test: tier1-3 Changes requested by tschatzl (Reviewer). 
src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1917: > 1915: if (!c->completed()) { > 1916: fatal("region %zu not filled: destination_count=%u", > 1917: cur_region, c->destination_count()); I would prefer to use `assert(c->completed(), ...)` in both cases similar to other failures due to verification (like the one above). ------------- PR Review: https://git.openjdk.org/jdk/pull/21865#pullrequestreview-2412338308 PR Review Comment: https://git.openjdk.org/jdk/pull/21865#discussion_r1827295363 From tschatzl at openjdk.org Mon Nov 4 07:44:28 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 4 Nov 2024 07:44:28 GMT Subject: RFR: 8343508: Parallel: Use ordinary klass accessor in verify_filler_in_dense_prefix In-Reply-To: References: Message-ID: <8jvzF7sIyWQJdLuOouswz4uDiKWT95-x3iDJWmh7fZ0=.4e33866c-8ba9-4272-89ac-3a50f86b4c8f@github.com> On Mon, 4 Nov 2024 06:16:15 GMT, Albert Mingkun Yang wrote: > One line change to use the common API to make the caller logic less obtrusive. > > Test: tier1-3 Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21866#pullrequestreview-2412348941 From ayang at openjdk.org Mon Nov 4 07:55:04 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 4 Nov 2024 07:55:04 GMT Subject: RFR: 8343507: Parallel: Fail if verify_complete finds incorrect states [v2] In-Reply-To: References: Message-ID: <7SGR3x5jFTxIVogdh8N9gyaMBS9J1HWDDV8sGlxliwc=.abd6c8b3-2157-4e55-8afa-1cf087e0302c@github.com> > Trivial change of replacing `log_warning` with `fatal`, because incorrect `destination_count` always indicate some problem. 
> > Test: tier1-3 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21865/files - new: https://git.openjdk.org/jdk/pull/21865/files/f1e2d474..f089f3df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21865&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21865&range=00-01 Stats: 8 lines in 1 file changed: 0 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21865.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21865/head:pull/21865 PR: https://git.openjdk.org/jdk/pull/21865 From kbarrett at openjdk.org Mon Nov 4 09:15:28 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 4 Nov 2024 09:15:28 GMT Subject: RFR: 8343507: Parallel: Fail if verify_complete finds incorrect states [v2] In-Reply-To: <7SGR3x5jFTxIVogdh8N9gyaMBS9J1HWDDV8sGlxliwc=.abd6c8b3-2157-4e55-8afa-1cf087e0302c@github.com> References: <7SGR3x5jFTxIVogdh8N9gyaMBS9J1HWDDV8sGlxliwc=.abd6c8b3-2157-4e55-8afa-1cf087e0302c@github.com> Message-ID: On Mon, 4 Nov 2024 07:55:04 GMT, Albert Mingkun Yang wrote: >> Trivial change of replacing `log_warning` with `fatal`, because incorrect `destination_count` always indicate some problem. >> >> Test: tier1-3 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21865#pullrequestreview-2412532188 From simonis at openjdk.org Mon Nov 4 09:49:01 2024 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 4 Nov 2024 09:49:01 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers Message-ID: Currently `BlockLocationPrinter::print_location()` checks whether a pointer points into the heap and, if so, either prints it as an oop if `is_valid_obj()` is true or tries to find the start address of an oop for that pointer by calling `CollectedHeapT::heap()->block_start()`. However, the `block_start()` functionality is not fully implemented for all GCs (e.g. the young generation of `ParallelScavengeHeap`) and for these cases `block_start()` returns NULL. Because of this NULL return value `os::print_location()` will finally qualify the corresponding pointer as pointing "into unknown readable memory" although we already know that it actually points into an invalid heap area. In such cases, print at least that the pointer is pointing into an unknown part of the heap instead of just saying that it points into unknown memory. I've manually tested the new functionality in GDB.
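The decision order described in this PR can be modelled roughly as follows. This is a hedged, self-contained sketch, not the actual `locationPrinter.inline.hpp` code: `FakeHeap`, `classify()` and the returned strings are hypothetical stand-ins for `CollectedHeapT::heap()->is_in()`, `is_valid_obj()` and `block_start()`, and the other checks the real printer performs in between (compressed oops and so on) are omitted.

```cpp
// Hypothetical model of the print_location() decision order; not HotSpot code.
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for the heap queries named in the PR (is_in, is_valid_obj, block_start).
struct FakeHeap {
  uintptr_t bottom, end;          // heap range [bottom, end)
  uintptr_t obj_start, obj_end;   // a single known, valid object
  bool block_start_implemented;   // false models e.g. Parallel's young generation

  bool is_in(uintptr_t addr) const { return addr >= bottom && addr < end; }
  bool is_valid_obj(uintptr_t addr) const { return addr == obj_start; }

  // Returns 0 (null) when the enclosing block cannot be determined.
  uintptr_t block_start(uintptr_t addr) const {
    if (!block_start_implemented) return 0;
    return (addr >= obj_start && addr < obj_end) ? obj_start : 0;
  }
};

std::string classify(const FakeHeap& heap, uintptr_t addr) {
  assert(heap.bottom <= heap.end);
  bool in_heap = heap.is_in(addr);  // computed once and reused for the fallback
  if (in_heap) {
    if (heap.is_valid_obj(addr)) return "oop";
    uintptr_t base = heap.block_start(addr);
    if (base != 0 && heap.is_valid_obj(base)) return "interior pointer into oop";
  }
  // (other checks, e.g. for compressed oops, are omitted in this sketch)
  if (in_heap) return "unknown heap location";  // the new fallback
  return "unknown readable memory";             // previous result (assumes readable memory)
}
```

With `block_start_implemented == false`, an interior pointer now falls through to "unknown heap location" instead of "unknown readable memory", which is the behavioral change the PR describes.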
------------- Commit messages: - 8343531: Improve print_location for invalid heap pointers Changes: https://git.openjdk.org/jdk/pull/21870/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21870&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343531 Stats: 8 lines in 1 file changed: 7 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21870.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21870/head:pull/21870 PR: https://git.openjdk.org/jdk/pull/21870 From tschatzl at openjdk.org Mon Nov 4 09:52:29 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 4 Nov 2024 09:52:29 GMT Subject: RFR: 8343507: Parallel: Fail if verify_complete finds incorrect states [v2] In-Reply-To: <7SGR3x5jFTxIVogdh8N9gyaMBS9J1HWDDV8sGlxliwc=.abd6c8b3-2157-4e55-8afa-1cf087e0302c@github.com> References: <7SGR3x5jFTxIVogdh8N9gyaMBS9J1HWDDV8sGlxliwc=.abd6c8b3-2157-4e55-8afa-1cf087e0302c@github.com> Message-ID: On Mon, 4 Nov 2024 07:55:04 GMT, Albert Mingkun Yang wrote: >> Trivial change of replacing `log_warning` with `fatal`, because incorrect `destination_count` always indicate some problem. >> >> Test: tier1-3 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21865#pullrequestreview-2412609729 From ayang at openjdk.org Mon Nov 4 10:06:28 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 4 Nov 2024 10:06:28 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 09:43:18 GMT, Volker Simonis wrote: > However, the block_start() functionality is not fully implemented for all GCs (e.g. the young generation of ParallelScavengeHeap) and for these cases block_start() returns NULL. Can we implement it properly for all gcs, instead of working around the issue in the caller? 
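To make concrete what "implementing `block_start()` properly" requires, here is a minimal toy sketch. It assumes a contiguous space that can be parsed object by object from its bottom, which appears to be what the Serial path does in the backtrace later in this thread (`ContiguousSpace::block_start_const()` calling `oopDesc::size()`). The word layout and the `block_start()` function below are hypothetical and not HotSpot's implementation.

```cpp
// Toy model of a bottom-up block_start() walk; hypothetical, not HotSpot code.
#include <cassert>
#include <cstddef>
#include <vector>

// `words` models a space in heap words. Each object stores its size (in words)
// in its first word; a 0 models memory that cannot be parsed, such as the
// unfilled remainder of a TLAB.
// Returns the index of the object containing word `p`, or -1 if the walk
// reaches unparsable memory first.
long block_start(const std::vector<std::size_t>& words, std::size_t top, std::size_t p) {
  assert(p < top && top <= words.size());
  std::size_t cur = 0, last = 0;
  while (cur <= p) {
    std::size_t size = words[cur];
    if (size == 0) return -1;  // a real VM would misread garbage here instead
    last = cur;
    cur += size;
  }
  return static_cast<long>(last);
}
```

The walk only works if every gap carries a valid size. Filling the unused TLAB remainder with a dummy object, as making the heap walkable does, gives the walk something it can skip over.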
------------- PR Comment: https://git.openjdk.org/jdk/pull/21870#issuecomment-2454270214 From tschatzl at openjdk.org Mon Nov 4 10:06:29 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 4 Nov 2024 10:06:29 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 09:43:18 GMT, Volker Simonis wrote: > Currently `BlockLocationPrinter::print_location()` checks for a pointer if it points into the heap and if that's true, it either prints it as an oop if `is_valid_obj()` is true or it tries to find the start address of an oop for that pointer by calling `CollectedHeapT::heap()->block_start()`. > > However, the `block_start()` functionality is not fully implemented for all GCs (e.g. the young generation of `ParallelScavengeHeap`) and for these cases `block_start()` returns NULL. Because of this NULL return value `os::print_location()` will finally qualify the corresponding pointer as pointing "into unknown readable memory" although we already know that it actually points into an invalid heap area. > > In such cases, print at least that the pointer is pointing into an unknown part of the heap instead of just saying that it points into unknown memory. > > I've manually tested the new functionality in GDB.

src/hotspot/share/gc/shared/locationPrinter.inline.hpp line 57:

> 55: // Check if addr points into Java heap.
> 56: if (CollectedHeapT::heap()->is_in(addr)) {
> 57: // base_oop_or_null() might be unimplemented and return NULL for some GCs/generations

In such cases where the flag that we later set is dependent on the complete condition, it seems nicer to assign the result of the condition to it right away. That saves the assignment later too, having only a single assignment to it. Ymmv.

Suggestion:

```
// Check if addr points into Java heap.
bool in_heap = CollectedHeapT::heap()->is_in(addr);
if (in_heap) {
  // base_oop_or_null() might be unimplemented and return NULL for some GCs/generations.
```

(And drop the assignment to `in_heap` later).

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21870#discussion_r1827475425 From ayang at openjdk.org Mon Nov 4 10:34:37 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 4 Nov 2024 10:34:37 GMT Subject: RFR: 8343507: Parallel: Fail if verify_complete finds incorrect states [v2] In-Reply-To: <7SGR3x5jFTxIVogdh8N9gyaMBS9J1HWDDV8sGlxliwc=.abd6c8b3-2157-4e55-8afa-1cf087e0302c@github.com> References: <7SGR3x5jFTxIVogdh8N9gyaMBS9J1HWDDV8sGlxliwc=.abd6c8b3-2157-4e55-8afa-1cf087e0302c@github.com> Message-ID: On Mon, 4 Nov 2024 07:55:04 GMT, Albert Mingkun Yang wrote: >> Trivial change of replacing `log_warning` with `fatal`, because incorrect `destination_count` always indicate some problem. >> >> Test: tier1-3 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21865#issuecomment-2454345797 From ayang at openjdk.org Mon Nov 4 10:34:37 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 4 Nov 2024 10:34:37 GMT Subject: Integrated: 8343507: Parallel: Fail if verify_complete finds incorrect states In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 05:31:26 GMT, Albert Mingkun Yang wrote: > Trivial change of replacing `log_warning` with `fatal`, because incorrect `destination_count` always indicate some problem. > > Test: tier1-3 This pull request has now been integrated.
Changeset: 452a5fbd Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/452a5fbd9c29e0991758ab97ed5bdbf1922b6a11 Stats: 8 lines in 1 file changed: 0 ins; 4 del; 4 mod 8343507: Parallel: Fail if verify_complete finds incorrect states Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/21865 From simonis at openjdk.org Mon Nov 4 10:46:02 2024 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 4 Nov 2024 10:46:02 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers [v2] In-Reply-To: References: Message-ID: > Currently `BlockLocationPrinter::print_location()` checks for a pointer if it points into the heap and if that's true, it either prints it as an oop if `is_valid_obj()` is true or it tries to find the the start address of an oop for that pointer by calling `CollectedHeapT::heap()->block_start()`. > > However, the `block_start()` functionality is not fully implemented for all GCs (e.g. the young generation of `ParallelScavengeHeap`) and for these cases `block_start()` returns NULL. Because of this NULL return value `os::print_location()` will finally qualify the corresponding pointer as pointing "into unknown readable memory" although we already know that it actually points into an invalid heap area. > > In such cases, print at least that the pointer is pointing into an unknown part of the heap instead of just saying that it points into unknown memory. > > I've manually tested the new functionality in GDB. 
Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Small refactoring based on tschatzl's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21870/files - new: https://git.openjdk.org/jdk/pull/21870/files/80cc0ee7..f5886102 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21870&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21870&range=00-01 Stats: 4 lines in 1 file changed: 1 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21870.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21870/head:pull/21870 PR: https://git.openjdk.org/jdk/pull/21870 From simonis at openjdk.org Mon Nov 4 11:00:31 2024 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 4 Nov 2024 11:00:31 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers [v2] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 10:03:34 GMT, Thomas Schatzl wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Small refactoring based on tschatzl's review > > src/hotspot/share/gc/shared/locationPrinter.inline.hpp line 57: > >> 55: // Check if addr points into Java heap. >> 56: if (CollectedHeapT::heap()->is_in(addr)) { >> 57: // base_oop_or_null() might be unimplemented and return NULL for some GCs/generations > > In such cases where the flag that we later set is dependent on the complete condition, it seems nicer to assign the result of the condition to it right away. That saves the assignment later too, having only a single assignment to it. Ymmv. > Suggestion: > > // Check if addr points into Java heap. > bool in_heap = CollectedHeapT::heap()->is_in(addr); > if (in_heap) { > // base_oop_or_null() might be unimplemented and return NULL for some GCs/generations. > > > (And drop the assignment to `in_heap` later). Thanks for looking at this PR. Your suggestion sounds like a reasonable simplification.
I've updated the code accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21870#discussion_r1827548054 From ayang at openjdk.org Mon Nov 4 11:01:40 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 4 Nov 2024 11:01:40 GMT Subject: RFR: 8339162: [REDO] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC Message-ID: This PR consists of two commits, the original and bug-fix. The original patch calculates the dest-count for the preceding live words incorrectly -- `preceding_destination` can be on region-boundary. Test: TEST=gc/TestSoftReferencesBehaviorOnOOME.java fails ~4/100 without the fix but passes with the fix. ------------- Commit messages: - fix - original Changes: https://git.openjdk.org/jdk/pull/21872/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21872&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339162 Stats: 568 lines in 2 files changed: 209 ins; 143 del; 216 mod Patch: https://git.openjdk.org/jdk/pull/21872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21872/head:pull/21872 PR: https://git.openjdk.org/jdk/pull/21872 From ayang at openjdk.org Mon Nov 4 11:01:40 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 4 Nov 2024 11:01:40 GMT Subject: RFR: 8339162: [REDO] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 10:55:45 GMT, Albert Mingkun Yang wrote: > This PR consists of two commits, the original and bug-fix. > > The original patch calculates the dest-count for the preceding live words incorrectly -- `preceding_destination` can be on region-boundary. > > Test: TEST=gc/TestSoftReferencesBehaviorOnOOME.java fails ~4/100 without the fix but passes with the fix. @lgxbslgx @zhengyu123 @walulyai Could you take a look? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21872#issuecomment-2454400845 From simonis at openjdk.org Mon Nov 4 11:03:29 2024 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 4 Nov 2024 11:03:29 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 10:01:39 GMT, Albert Mingkun Yang wrote: > > However, the block_start() functionality is not fully implemented for all GCs (e.g. the young generation of ParallelScavengeHeap) and for these cases block_start() returns NULL. > > Can we implement it properly for all gcs, instead of working around the issue in the caller? Everything is possible :) but I think it is not trivial. The problem is that we can crash at any time. In order to implement it reliably, we would have to make the heap walkable but I don't think we want to do this in the crash handler. Any suggestions? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21870#issuecomment-2454412489 From ayang at openjdk.org Mon Nov 4 11:25:28 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 4 Nov 2024 11:25:28 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 11:01:21 GMT, Volker Simonis wrote: > but I think it is not trivial. I was thinking copying the Serial impl into `ParallelScavengeHeap::block_start`; nothing sophisticated. I suspect the following oddly looking code is used to workaround the unimplemented branch of block_start. 
```
if (DebuggingContext::is_enabled() || VMError::is_error_reported()) {
  return nullptr;
}
```

------------- PR Comment: https://git.openjdk.org/jdk/pull/21870#issuecomment-2454455414 From shade at openjdk.org Mon Nov 4 12:04:29 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Nov 2024 12:04:29 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers [v2] In-Reply-To: References: Message-ID: <18YEfgfqK5a_YG8Noc_NKRFMfRYBhU8vcspBrli63CM=.d46c87e1-1e4f-4fce-90a0-4281a773bcda@github.com> On Mon, 4 Nov 2024 10:46:02 GMT, Volker Simonis wrote: >> Currently `BlockLocationPrinter::print_location()` checks for a pointer if it points into the heap and if that's true, it either prints it as an oop if `is_valid_obj()` is true or it tries to find the start address of an oop for that pointer by calling `CollectedHeapT::heap()->block_start()`. >> >> However, the `block_start()` functionality is not fully implemented for all GCs (e.g. the young generation of `ParallelScavengeHeap`) and for these cases `block_start()` returns NULL. Because of this NULL return value `os::print_location()` will finally qualify the corresponding pointer as pointing "into unknown readable memory" although we already know that it actually points into an invalid heap area. >> >> In such cases, print at least that the pointer is pointing into an unknown part of the heap instead of just saying that it points into unknown memory. >> >> I've manually tested the new functionality in GDB. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Small refactoring based on tschatzl's review I think this is a fine papercut fix, and implementing block information for all GCs could be tackled separately.
I do have a question, though: src/hotspot/share/gc/shared/locationPrinter.inline.hpp line 90: > 88: if (in_heap) { > 89: st->print_cr(PTR_FORMAT " is an unknown heap location", p2i(addr)); > 90: return true; So why not put this block as `else` branch in `base_oop_or_null` check at L67? This would also remove any ambiguity whether the in-heap pointer would look like a compressed pointer to object, which would be accidentally handled by the block at L64..L86? ------------- PR Review: https://git.openjdk.org/jdk/pull/21870#pullrequestreview-2412881446 PR Review Comment: https://git.openjdk.org/jdk/pull/21870#discussion_r1827621215 From simonis at openjdk.org Mon Nov 4 14:37:34 2024 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 4 Nov 2024 14:37:34 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers [v2] In-Reply-To: <18YEfgfqK5a_YG8Noc_NKRFMfRYBhU8vcspBrli63CM=.d46c87e1-1e4f-4fce-90a0-4281a773bcda@github.com> References: <18YEfgfqK5a_YG8Noc_NKRFMfRYBhU8vcspBrli63CM=.d46c87e1-1e4f-4fce-90a0-4281a773bcda@github.com> Message-ID: On Mon, 4 Nov 2024 12:00:38 GMT, Aleksey Shipilev wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Small refactoring based on tschatzl's review > > src/hotspot/share/gc/shared/locationPrinter.inline.hpp line 90: > >> 88: if (in_heap) { >> 89: st->print_cr(PTR_FORMAT " is an unknown heap location", p2i(addr)); >> 90: return true; > > So why not put this block as `else` branch in `base_oop_or_null` check at L67? This would also remove any ambiguity whether the in-heap pointer would look like a compressed pointer to object, which would be accidentally handled by the block at L64..L86? That was actually the first thing I did. But then I thought that (especially with zero-based compressed oops) we might get quite some valid compressed oops pointers unnecessarily printed as "unknown heap location". 
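With zero-based compressed oops, decoding a narrow oop is just a left shift by the object-alignment shift, so the same bit pattern can plausibly be read either as a raw address or as a compressed oop. A minimal sketch, assuming a heap base of zero; `decode_zero_based()`, `Readings` and `both_readings()` are hypothetical helpers for illustration, not HotSpot APIs.

```cpp
// Hypothetical helpers illustrating zero-based compressed-oop decoding.
#include <cassert>
#include <cstdint>

// With a zero heap base, decode(narrow) == narrow << shift.
uint64_t decode_zero_based(uint32_t narrow, unsigned shift) {
  assert(shift < 32);
  return static_cast<uint64_t>(narrow) << shift;
}

// The two readings a location printer would have to consider for one value.
struct Readings {
  uint64_t as_raw_pointer;  // value used directly as an address
  uint64_t as_narrow_oop;   // value decoded as a compressed oop
};

Readings both_readings(uint64_t value, unsigned shift) {
  assert(value <= UINT32_MAX);  // only 32-bit values can be narrow oops
  return {value, decode_zero_based(static_cast<uint32_t>(value), shift)};
}
```

With a 2-bit shift, as in the example discussed later in this thread, `0x1000000` reads either as itself or as `0x4000000`; it is then up to the checks in `is_valid_obj()` to rule one of the readings out.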
On the other hand, I don't think that there's a high probability for a real invalid heap pointer to be classified as a compressed oop pointer because the compressed oops detection code uses `is_valid_obj()` anyway. So this change is conservative in the sense that it doesn't change any behavior except that pointers which have been printed as pointing "into unknown readable memory" can now be detected as "invalid heap pointers". If you still think we should prioritize the detection steps differently, please let me know. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21870#discussion_r1827834445 From simonis at openjdk.org Mon Nov 4 16:28:29 2024 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 4 Nov 2024 16:28:29 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers [v2] In-Reply-To: <1qZ2t5EIMp-DpFdvnOJwd5o5D4g74fbgGG4VBy5frq4=.aff55256-6f89-4fd5-9eda-3f5ed1540fa8@github.com> References: <18YEfgfqK5a_YG8Noc_NKRFMfRYBhU8vcspBrli63CM=.d46c87e1-1e4f-4fce-90a0-4281a773bcda@github.com> <1qZ2t5EIMp-DpFdvnOJwd5o5D4g74fbgGG4VBy5frq4=.aff55256-6f89-4fd5-9eda-3f5ed1540fa8@github.com> Message-ID: On Mon, 4 Nov 2024 15:52:59 GMT, Aleksey Shipilev wrote: > Oh, OK. Compressed pointers make this whole thing a bit messy. I think current code is not handling the case of compressed interior pointers all that well; IDK if we even have those in Hotspot. > Yes, that's true. I first thought about calling `BlockLocationPrinter::print_location()` recursively for the compressed oops case to avoid code duplication and get the same handling for regular and compressed oops, but that would have been a much larger change. I think we can have compressed oops in registers, e.g. when GC iterates the heap or when compiled code loads a field but the cases are probably rarer than regular oops. > I think there is an ambiguity between compressed pointers and regular pointers at this level, which we cannot reasonably resolve. E.g.
if we have zero-based compressed oops with 2-bit shift and 16 GB heap, passing `0x1000000` as the `addr` here cannot distinguish between cases of "regular pointer, points to `0x1000000`" and "compressed pointer, decodes as `0x4000000`". I guess we would like to print both interpretations. But this is way beyond the scope for this PR. That's also true, but remember that `is_valid_obj()` does quite a few checks. So in your example, in order to make it really ambiguous, it would require that at address `0x1000000 + 8` as well as at address `0x4000000 + 8` we have properly aligned, valid (possibly compressed) pointers into MetaSpace pointing to a valid `Klass` object (which is probably not so common for most addresses). > This version would do meanwhile. Thanks for the review. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21870#discussion_r1828017494 From shade at openjdk.org Mon Nov 4 15:55:29 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Nov 2024 15:55:29 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers [v2] In-Reply-To: References: Message-ID: <0tmSW_c-jMzTApXLMSo06DCBrjZFBLjGQAAxOYx-rS8=.1ec378eb-11e4-44fb-a34a-185c74724631@github.com> On Mon, 4 Nov 2024 10:46:02 GMT, Volker Simonis wrote: >> Currently `BlockLocationPrinter::print_location()` checks for a pointer if it points into the heap and if that's true, it either prints it as an oop if `is_valid_obj()` is true or it tries to find the start address of an oop for that pointer by calling `CollectedHeapT::heap()->block_start()`. >> >> However, the `block_start()` functionality is not fully implemented for all GCs (e.g. the young generation of `ParallelScavengeHeap`) and for these cases `block_start()` returns NULL. Because of this NULL return value `os::print_location()` will finally qualify the corresponding pointer as pointing "into unknown readable memory" although we already know that it actually points into an invalid heap area.
>> >> In such cases, print at least that the pointer is pointing into an unknown part of the heap instead of just saying that it points into unknown memory. >> >> I've manually tested the new functionality in GDB. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Small refactoring based on tschatzl's review Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21870#pullrequestreview-2413449596 From shade at openjdk.org Mon Nov 4 15:55:30 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 4 Nov 2024 15:55:30 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers [v2] In-Reply-To: References: <18YEfgfqK5a_YG8Noc_NKRFMfRYBhU8vcspBrli63CM=.d46c87e1-1e4f-4fce-90a0-4281a773bcda@github.com> Message-ID: <1qZ2t5EIMp-DpFdvnOJwd5o5D4g74fbgGG4VBy5frq4=.aff55256-6f89-4fd5-9eda-3f5ed1540fa8@github.com> On Mon, 4 Nov 2024 14:34:28 GMT, Volker Simonis wrote: >> src/hotspot/share/gc/shared/locationPrinter.inline.hpp line 90: >> >>> 88: if (in_heap) { >>> 89: st->print_cr(PTR_FORMAT " is an unknown heap location", p2i(addr)); >>> 90: return true; >> >> So why not put this block as `else` branch in `base_oop_or_null` check at L67? This would also remove any ambiguity whether the in-heap pointer would look like a compressed pointer to object, which would be accidentally handled by the block at L64..L86? > > That was actually the first thing I did. But then I thought that (especially with zero-based compressed oops) we might get quite some valid compressed oops pointers unnecessarily printed as "unknown heap location". > On the other hand, I don't think that there's a high probability for a real invalid heap pointer to be classified as compressed oops pointer because the compressed oops detection code uses `is_valid_obj()` anyway. 
> So this change is conservative in the sense that it doesn't change any behavior except that pointers which have been printed as pointing "into unknown readable memory" can now be detect as "invalid heap pointers". > > If you still think we should prioritize the detection steps differently, please let me know. Oh, OK. Compressed pointers make this whole thing a bit messy. I think current code is not handling the case of compressed interior pointers all that well; IDK if we even have those in Hotspot. I think there is an ambiguity between compressed pointers and regular pointers at this level, which we cannot reasonably resolve. E.g. if we have zero-based compressed oops with 2-bit shift and 16 GB heap, passing `0x1000000` as the `addr` here cannot distinguish between cases of "regular pointer, points to `0x1000000`" and "compressed pointer, decodes as `0x4000000`". I guess we would like to print both interpretations. But this is way beyond the scope for this PR. This version would do meanwhile. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21870#discussion_r1827965996 From tschatzl at openjdk.org Mon Nov 4 15:11:02 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 4 Nov 2024 15:11:02 GMT Subject: RFR: 8343189: [REDO] JDK-8295269 G1: Improve slow startup due to predictor initialization Message-ID: Hi all, please review this redo of [JDK-8295269](https://bugs.openjdk.org/browse/JDK-8295269) G1: Improve slow startup due to predictor initialization. The cause are issues with the `runtime/cds/DeterministicDump.java` test, that is currently being fixed in #21871. There has been no change in these changes. 
Testing: running a few thousand times with the fixed `runtime/cds/DeterministicDump.java` test Thanks, Thomas ------------- Depends on: https://git.openjdk.org/jdk/pull/21871 Commit messages: - Revert "8343086: [BACKOUT] JDK-8295269 G1: Improve slow startup due to predictor initialization" Changes: https://git.openjdk.org/jdk/pull/21876/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21876&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343189 Stats: 7 lines in 2 files changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21876.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21876/head:pull/21876 PR: https://git.openjdk.org/jdk/pull/21876 From simonis at openjdk.org Mon Nov 4 15:10:28 2024 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 4 Nov 2024 15:10:28 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 11:22:47 GMT, Albert Mingkun Yang wrote: > > but I think it is not trivial. > > I was thinking copying the Serial impl into `ParallelScavengeHeap::block_start`; nothing sophisticated. > Unfortunately, the Serial implementation doesn't really work reliably if running with `-XX:+UseTLAB` (which is the default). 
If called with a pointer which points into an unallocated TLAB buffer, `ContiguousSpace::block_start_const()` will just crash with a SIGSEGV (or a secondary crash during error reporting when called from `VMError`):

#0 0x00007ffff57d78ce in oopDesc::size_given_klass (this=0x7ffde5616c70, klass=0x7ffda2000000) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/oops/oop.inline.hpp:196
#1 0x00007ffff57d7756 in oopDesc::size (this=0x7ffde5616c70) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/oops/oop.inline.hpp:153
#2 0x00007ffff689a421 in ContiguousSpace::block_start_const (this=0x7ffff004c880, p=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/space.cpp:565
#3 0x00007ffff689b7ba in Space::block_start (this=0x7ffff004c880, p=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/space.inline.hpp:43
#4 0x00007ffff60f4144 in GenerationBlockStartClosure::do_space (this=0x7ffff530ef30, s=0x7ffff004c880) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/generation.cpp:191
#5 0x00007ffff5e560c5 in DefNewGeneration::space_iterate (this=0x7ffff004b9c0, blk=0x7ffff530ef30, usedOnly=false) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/serial/defNewGeneration.cpp:674
#6 0x00007ffff60f3527 in Generation::block_start (this=0x7ffff004b9c0, p=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/generation.cpp:200
#7 0x00007ffff60e36e9 in GenCollectedHeap::block_start (this=0x7ffff0038450, addr=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/genCollectedHeap.cpp:884
#8 0x00007ffff60e5b97 in BlockLocationPrinter::base_oop_or_null (addr=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/locationPrinter.inline.hpp:41
#9 0x00007ffff60e592b in BlockLocationPrinter::print_location (st=0x7ffff0000b60, addr=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/locationPrinter.inline.hpp:56
#10 0x00007ffff60e43bd in GenCollectedHeap::print_location (this=0x7ffff0038450, st=0x7ffff0000b60, addr=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/genCollectedHeap.cpp:1046
#11 0x00007ffff66acb22 in os::print_location (st=0x7ffff0000b60, x=140728451820704, verbose=false) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/runtime/os.cpp:1190

And that's again because the heap is in general not *walkable* when we call this function. Making it walkable will fill the remaining TLAB space with a dummy int array, but without that, we will just try to interpret random memory (or NULL if running with `-XX:+ZeroTLAB`) as a `Klass` pointer, which is seldom successful :)

> I suspect the following oddly looking code is used to work around the unimplemented branch of block_start.
>
> ```
> if (DebuggingContext::is_enabled() || VMError::is_error_reported()) {
>   return nullptr;
> }
> ```

That "oddly looking code" is actually the proof that `block_start()` only gets called from `VMError` or manually, when natively debugging the VM. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21870#issuecomment-2454964142 From syan at openjdk.org Tue Nov 5 01:51:37 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 5 Nov 2024 01:51:37 GMT Subject: RFR: 8343490: Update copyright year for JDK-8341692 Message-ID: <2BwWuKdm5FwggsXPwo3P2xRD6CGr5QDdn3gVG5x5fo0=.41d944e6-6737-4d7d-8654-986149b41c9d@github.com> Hi all, The copyright years of some files changed by [JDK-8341692](https://bugs.openjdk.org/browse/JDK-8341692) were not updated correctly. This PR updates the copyright years of the files changed by [JDK-8341692](https://bugs.openjdk.org/browse/JDK-8341692). Trivial fix, no risk.
------------- Commit messages: - delete tail whitespace of test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java - 8343490: Update copyright year for JDK-8341692 Changes: https://git.openjdk.org/jdk/pull/21891/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21891&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343490 Stats: 66 lines in 66 files changed: 2 ins; 0 del; 64 mod Patch: https://git.openjdk.org/jdk/pull/21891.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21891/head:pull/21891 PR: https://git.openjdk.org/jdk/pull/21891 From ayang at openjdk.org Tue Nov 5 05:38:27 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 5 Nov 2024 05:38:27 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers In-Reply-To: References: Message-ID: <9WJcuKHuAyqcP1vaCwtvsJqBWLptNyG2kFHFqp_Xl04=.bf12af6b-ad13-4a21-8259-1b19d770ec71@github.com> On Mon, 4 Nov 2024 15:08:00 GMT, Volker Simonis wrote: > And that's again because the heap is in general not walkable when we call this function. It depends on exactly when this function can be called, and with what argument. I wonder whether it can be called with a pointer to an object that has not been properly initialized (with a klass); if so, the heap is almost never walkable, since allocation is not atomic. > the Serial implementation doesn't really work reliably I am curious whether other GCs' implementations work (more) reliably with regard to the TLAB example. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21870#issuecomment-2456274894 From tschatzl at openjdk.org Tue Nov 5 09:50:15 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 5 Nov 2024 09:50:15 GMT Subject: RFR: 8343189: [REDO] JDK-8295269 G1: Improve slow startup due to predictor initialization [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review this redo of [JDK-8295269](https://bugs.openjdk.org/browse/JDK-8295269) G1: Improve slow startup due to predictor initialization.
The failures were caused by issues with the `runtime/cds/DeterministicDump.java` test, which is currently being fixed in #21871. > > The patch itself has not changed. > > Testing: running a few thousand times with the fixed `runtime/cds/DeterministicDump.java` test > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into 8343189-redo-slow-startup - Revert "8343086: [BACKOUT] JDK-8295269 G1: Improve slow startup due to predictor initialization" This reverts commit f1cc890ddfe2e472cf786856dc7d01645f61b054. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21876/files - new: https://git.openjdk.org/jdk/pull/21876/files/4ee7c256..f728668c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21876&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21876&range=00-01 Stats: 116898 lines in 366 files changed: 93329 ins; 7618 del; 15951 mod Patch: https://git.openjdk.org/jdk/pull/21876.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21876/head:pull/21876 PR: https://git.openjdk.org/jdk/pull/21876 From iwalulya at openjdk.org Tue Nov 5 10:09:34 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 5 Nov 2024 10:09:34 GMT Subject: RFR: 8343189: [REDO] JDK-8295269 G1: Improve slow startup due to predictor initialization [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 09:50:15 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this redo of [JDK-8295269](https://bugs.openjdk.org/browse/JDK-8295269) G1: Improve slow startup due to predictor initialization. The failures were caused by issues with the `runtime/cds/DeterministicDump.java` test, which is currently being fixed in #21871. >> >> The patch itself has not changed.
>> >> Testing: running a few thousand times with the fixed `runtime/cds/DeterministicDump.java` test >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8343189-redo-slow-startup > - Revert "8343086: [BACKOUT] JDK-8295269 G1: Improve slow startup due to predictor initialization" > > This reverts commit f1cc890ddfe2e472cf786856dc7d01645f61b054. Still good! ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21876#pullrequestreview-2415173439 From aboldtch at openjdk.org Tue Nov 5 14:18:52 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 5 Nov 2024 14:18:52 GMT Subject: RFR: 8343460: ZGC: Crash in ZRemembered::scan_page_and_clear_remset Message-ID: `free_page` may concurrently delete the remset while `scan_page_and_clear_remset` is scanning the page. Move the deletion to after `_safe_recycle.register_and_clone_if_activated`. The deletion will not occur on the newly cloned page, as it is not old. And the registered page's remset will be deleted by the destructor when the `_safe_recycle` scope queues up the `safe_destroy`. To be able to push the deletion all the way into `prepare_to_recycle`, the unnecessary use of this mechanism had to be removed. `free_pages_alloc_failed` does not need to protect the pages, as they are not yet present in the PageTable. We have simply taken them out of the cache, but failed to commit or map some memory, so we are putting these pages back into the cache. See bed9c260bbc9bd208b03d7eedd4e2cfa151b58f2 The fix works without this last commit. So we must be careful to check that these pages cannot be reached by some other means.
The FoundOld bitmap iteration goes through the PageTable, so even if an old page was registered, we would not find these pages. There is a scary lack of a fence between the removal of the page from the PageTable and the lock in `register_and_clone_if_activated`. The stress test will deterministically crash with this modified code 0756e0056b44ee16bee81256f556c8df981ceaf9 when using these options `-XX:+UseZGC -XX:+UseNewCode -XX:ZCollectionIntervalMinor=0.1 -XX:ZCollectionIntervalMajor=1 -XX:ZFragmentationLimit=0 -XX:-CreateCoredumpOnCrash`, and no longer does with this patch. ------------- Commit messages: - Tie remset deletion to recycle - 8343460: ZGC: Crash in ZRemembered::scan_page_and_clear_remset Changes: https://git.openjdk.org/jdk/pull/21905/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21905&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343460 Stats: 26 lines in 2 files changed: 5 ins; 19 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21905.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21905/head:pull/21905 PR: https://git.openjdk.org/jdk/pull/21905 From jsikstro at openjdk.org Tue Nov 5 15:19:31 2024 From: jsikstro at openjdk.org (Joel Sikström) Date: Tue, 5 Nov 2024 15:19:31 GMT Subject: RFR: 8343460: ZGC: Crash in ZRemembered::scan_page_and_clear_remset In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 14:10:47 GMT, Axel Boldt-Christmas wrote: > `free_page` may concurrently delete the remset while `scan_page_and_clear_remset` is scanning the page. Move the deletion to after `_safe_recycle.register_and_clone_if_activated`. The deletion will not occur on the newly cloned page, as it is not old. And the registered page's remset will be deleted by the destructor when the `_safe_recycle` scope queues up the `safe_destroy`. > > To be able to push the deletion all the way into `prepare_to_recycle`, the unnecessary use of this mechanism had to be removed.
`free_pages_alloc_failed` does not need to protect the pages, as they are not yet present in the PageTable. We have simply taken them out of the cache, but failed to commit or map some memory, so we are putting these pages back into the cache. See bed9c260bbc9bd208b03d7eedd4e2cfa151b58f2 > > The fix works without this last commit. So we must be careful to check that these pages cannot be reached by some other means. The FoundOld bitmap iteration goes through the PageTable, so even if an old page was registered, we would not find these pages. > > There is a scary lack of a fence between the removal of the page from the PageTable and the lock in `register_and_clone_if_activated`. > > The stress test will deterministically crash with this modified code 0756e0056b44ee16bee81256f556c8df981ceaf9 when using these options `-XX:+UseZGC -XX:+UseNewCode -XX:ZCollectionIntervalMinor=0.1 -XX:ZCollectionIntervalMajor=1 -XX:ZFragmentationLimit=0 -XX:-CreateCoredumpOnCrash`, and no longer does with this patch. Looks good! ------------- Marked as reviewed by jsikstro (Committer). PR Review: https://git.openjdk.org/jdk/pull/21905#pullrequestreview-2415938963 From iwalulya at openjdk.org Tue Nov 5 16:28:27 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 5 Nov 2024 16:28:27 GMT Subject: RFR: 8339162: [REDO] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 10:55:45 GMT, Albert Mingkun Yang wrote: > This PR consists of two commits, the original and a bug fix. > > The original patch calculates the dest-count for the preceding live words incorrectly -- `preceding_destination` can be on a region boundary. > > Test: TEST=gc/TestSoftReferencesBehaviorOnOOME.java fails ~4/100 without the fix but passes with the fix. LGTM! Shouldn't the contributors in the original be added to this redo? ------------- Marked as reviewed by iwalulya (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21872#pullrequestreview-2416131096 From stuefe at openjdk.org Tue Nov 5 16:40:54 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 5 Nov 2024 16:40:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v53] In-Reply-To: References: Message-ID: <5EgL-mJp75JLOxEccrrGVxbfS6QdUywRSfsOcgx4zl8=.3c283bf3-3e2e-4fe2-bce5-c30d7d4e2da4@github.com> On Thu, 24 Oct 2024 21:04:51 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Enable riscv in CompressedClassPointersEncodingScheme test I went through all the changes again, with a focus on runtime code. Still good. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2416155892 From amitkumar at openjdk.org Tue Nov 5 16:49:01 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 5 Nov 2024 16:49:01 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 16:43:35 GMT, Roman Kennke wrote: >Hi Amit, sorry I am only now getting to reply to this; I have been traveling. What does the change do? Is it critical? Would it be possible to fix it after I have integrated the JEP?
Because any change that I make now invalidates existing reviews and might delay integration, and we're already running pretty close to RDP1. If at all possible, I would prefer to take it after I have integrated the JEP - we can have fixes well after RDP1, but not new features. If you agree, then please file a follow-up issue. That's perfectly fine. I will do it in a separate RFE :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2457680086 From rkennke at openjdk.org Tue Nov 5 16:49:01 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 5 Nov 2024 16:49:01 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 16:22:20 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update copyright >> - Avoid assert/endless-loop in JFR code > > @egahlin / @mgronlun could you please review the JFR parts of this PR? One change is for getting the right prototype header, the other is for avoiding an endless loop/assert in a corner case.
> @rkennke can you include this small update for s390x as well:
>
> ```diff
> diff --git a/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp b/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp
> index 0f7e5c9f457..476e3d5daa4 100644
> --- a/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp
> +++ b/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp
> @@ -174,8 +174,11 @@ void C1_MacroAssembler::try_allocate(
>  void C1_MacroAssembler::initialize_header(Register obj, Register klass, Register len, Register Rzero, Register t1) {
>    assert_different_registers(obj, klass, len, t1, Rzero);
>    if (UseCompactObjectHeaders) {
> -    z_lg(t1, Address(klass, in_bytes(Klass::prototype_header_offset())));
> -    z_stg(t1, Address(obj, oopDesc::mark_offset_in_bytes()));
> +    z_mvc(
> +      Address(obj, oopDesc::mark_offset_in_bytes()), /* move to */
> +      Address(klass, in_bytes(Klass::prototype_header_offset())), /* move from */
> +      sizeof(markWord) /* how much to move */
> +    );
>    } else {
>      load_const_optimized(t1, (intx)markWord::prototype().value());
>      z_stg(t1, Address(obj, oopDesc::mark_offset_in_bytes()));
> diff --git a/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp b/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp
> index 378d5e4cfe1..c5713161bf9 100644
> --- a/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp
> +++ b/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp
> @@ -46,7 +46,7 @@ void C2_MacroAssembler::load_narrow_klass_compact_c2(Register dst, Address src)
>    // The incoming address is pointing into obj-start + klass_offset_in_bytes. We need to extract
>    // obj-start, so that we can load from the object's mark-word instead.
>    z_lg(dst, src.plus_disp(-oopDesc::klass_offset_in_bytes()));
> -  z_srlg(dst, dst, markWord::klass_shift); // TODO: could be z_sra
> +  z_srlg(dst, dst, markWord::klass_shift);
>  }
>
>  //------------------------------------------------------
> diff --git a/src/hotspot/cpu/s390/templateTable_s390.cpp b/src/hotspot/cpu/s390/templateTable_s390.cpp
> index 3cb1aba810d..5b8f7a20478 100644
> --- a/src/hotspot/cpu/s390/templateTable_s390.cpp
> +++ b/src/hotspot/cpu/s390/templateTable_s390.cpp
> @@ -3980,8 +3980,11 @@ void TemplateTable::_new() {
>    // Initialize object header only.
>    __ bind(initialize_header);
>    if (UseCompactObjectHeaders) {
> -    __ z_lg(tmp, Address(iklass, in_bytes(Klass::prototype_header_offset())));
> -    __ z_stg(tmp, Address(RallocatedObject, oopDesc::mark_offset_in_bytes()));
> +    __ z_mvc(
> +      Address(RallocatedObject, oopDesc::mark_offset_in_bytes()), // move to
> +      Address(iklass, in_bytes(Klass::prototype_header_offset())), // move from
> +      sizeof(markWord) // how much to move
> +    );
>    } else {
>      __ store_const(Address(RallocatedObject, oopDesc::mark_offset_in_bytes()),
>                     (long) markWord::prototype().value());
> ```

Hi Amit, sorry I am only now getting to reply to this; I have been traveling. What does the change do? Is it critical? Would it be possible to fix it after I have integrated the JEP? Because any change that I make now invalidates existing reviews and might delay integration, and we're already running pretty close to RDP1. If at all possible, I would prefer to take it after I have integrated the JEP - we can have fixes well after RDP1, but not new features. If you agree, then please file a follow-up issue.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2457674486 From kvn at openjdk.org Tue Nov 5 18:28:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 18:28:33 GMT Subject: RFR: 8343173: Remove ZGC-specific non-JVMCI test groups [v2] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 16:13:53 GMT, Leonid Mesnik wrote: >> JVMCI should be supported by all GCs, so the specific >> hotspot_compiler_all_gcs >> group is no longer needed. >> >> A few JVMCI tests have failed with ZGC; the bug >> https://bugs.openjdk.org/browse/JDK-8343233 >> has been filed and the corresponding tests are problem-listed. > > Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - typo fixed > - Merge branch 'master' of https://github.com/openjdk/jdk into 8343173 > - 8343173: Remove ZGC-specific non-JVMCI test groups Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21774#pullrequestreview-2416399409 From rkennke at openjdk.org Tue Nov 5 20:00:31 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 5 Nov 2024 20:00:31 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v54] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation.
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders.
Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 104 commits: - Merge tag 'jdk-24+22' into JDK-8305895-v4 Added tag jdk-24+22 for changeset 388d44fb - Enable riscv in CompressedClassPointersEncodingScheme test - s390 port - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test - Update copyright - Avoid assert/endless-loop in JFR code - Update copyright headers - Merge tag 'jdk-24+20' into JDK-8305895-v4 Added tag jdk-24+20 for changeset 7a64fbbb - Fix needle copying in indexOf intrinsic for smaller headers - Compact header riscv (#3) Implement compact headers on RISCV --------- Co-authored-by: hamlin - ... and 94 more: https://git.openjdk.org/jdk/compare/388d44fb...b945822a ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=53 Stats: 5214 lines in 218 files changed: 3587 ins; 864 del; 763 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From lmesnik at openjdk.org Tue Nov 5 20:55:35 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 5 Nov 2024 20:55:35 GMT Subject: Integrated: 8343173: Remove ZGC-specific non-JVMCI test groups In-Reply-To: References: Message-ID: <-1bZpI933zmujmTibsiiOkDdxnlxnKEGVGAPlqfvYik=.a0981eca-c8da-466c-a209-b266afea8513@github.com> On Tue, 29 Oct 2024 22:01:08 GMT, Leonid Mesnik wrote: > JVMCI should be supported by all GCs, so the specific > hotspot_compiler_all_gcs > group is no longer needed. > > A few JVMCI tests have failed with ZGC; the bug > https://bugs.openjdk.org/browse/JDK-8343233 > has been filed and the corresponding tests are problem-listed. This pull request has now been integrated.
Changeset: 847cc5eb Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/847cc5ebac43b83746d8f238c5f9ecf2972a2796 Stats: 12 lines in 2 files changed: 8 ins; 4 del; 0 mod 8343173: Remove ZGC-specific non-JVMCI test groups Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/21774 From gli at openjdk.org Wed Nov 6 03:54:28 2024 From: gli at openjdk.org (Guoxiong Li) Date: Wed, 6 Nov 2024 03:54:28 GMT Subject: RFR: 8339162: [REDO] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 10:55:45 GMT, Albert Mingkun Yang wrote: > This PR consists of two commits, the original and a bug fix. > > The original patch calculates the dest-count for the preceding live words incorrectly -- `preceding_destination` can be on a region boundary. > > Test: TEST=gc/TestSoftReferencesBehaviorOnOOME.java fails ~4/100 without the fix but passes with the fix. Looks good. Nice find. `region_align_up` is not the same as `next_region_start_address`. ------------- Marked as reviewed by gli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21872#pullrequestreview-2417193301 From gli at openjdk.org Wed Nov 6 04:01:33 2024 From: gli at openjdk.org (Guoxiong Li) Date: Wed, 6 Nov 2024 04:01:33 GMT Subject: RFR: 8339162: [REDO] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 03:51:55 GMT, Guoxiong Li wrote: > `region_align_up` is not the same as `next_region_start_address`. Since even an experienced developer can misuse `region_align_up`, it may be good to add a comment (in another PR?) to `region_align_up` to clarify its usage.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21872#issuecomment-2458679167 From ayang at openjdk.org Wed Nov 6 08:10:06 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 6 Nov 2024 08:10:06 GMT Subject: RFR: 8343658: Parallel: Implement block_start for Young generation Message-ID: Simple block_start implementation for Parallel young-gen. Related to https://github.com/openjdk/jdk/pull/21870 Test: tier1-3 ------------- Commit messages: - pgc-block-start Changes: https://git.openjdk.org/jdk/pull/21919/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21919&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343658 Stats: 28 lines in 3 files changed: 24 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21919.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21919/head:pull/21919 PR: https://git.openjdk.org/jdk/pull/21919 From rkennke at openjdk.org Wed Nov 6 09:13:46 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 6 Nov 2024 09:13:46 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v55] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix gen-ZGC removal ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/b945822a..1ea4de16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=54 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=53-54 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stuefe at openjdk.org Wed Nov 6 09:13:47 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 6 Nov 2024 09:13:47 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 16:43:35 GMT, Roman Kennke wrote: >> @egahlin / @mgronlun could you please review the JFR parts of this PR? One change is for getting the right prototype header, the other is for avoiding an endless loop/assert in a corner case. 
> >> @rkennke can you include this small update for s390x as well: >> >> ```diff >> diff --git a/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp b/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp >> index 0f7e5c9f457..476e3d5daa4 100644 >> --- a/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp >> +++ b/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp >> @@ -174,8 +174,11 @@ void C1_MacroAssembler::try_allocate( >> void C1_MacroAssembler::initialize_header(Register obj, Register klass, Register len, Register Rzero, Register t1) { >> assert_different_registers(obj, klass, len, t1, Rzero); >> if (UseCompactObjectHeaders) { >> - z_lg(t1, Address(klass, in_bytes(Klass::prototype_header_offset()))); >> - z_stg(t1, Address(obj, oopDesc::mark_offset_in_bytes())); >> + z_mvc( >> + Address(obj, oopDesc::mark_offset_in_bytes()), /* move to */ >> + Address(klass, in_bytes(Klass::prototype_header_offset())), /* move from */ >> + sizeof(markWord) /* how much to move */ >> + ); >> } else { >> load_const_optimized(t1, (intx)markWord::prototype().value()); >> z_stg(t1, Address(obj, oopDesc::mark_offset_in_bytes())); >> diff --git a/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp b/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp >> index 378d5e4cfe1..c5713161bf9 100644 >> --- a/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp >> +++ b/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp >> @@ -46,7 +46,7 @@ void C2_MacroAssembler::load_narrow_klass_compact_c2(Register dst, Address src) >> // The incoming address is pointing into obj-start + klass_offset_in_bytes. We need to extract >> // obj-start, so that we can load from the object's mark-word instead. 
>> z_lg(dst, src.plus_disp(-oopDesc::klass_offset_in_bytes())); >> - z_srlg(dst, dst, markWord::klass_shift); // TODO: could be z_sra >> + z_srlg(dst, dst, markWord::klass_shift); >> } >> >> //------------------------------------------------------ >> diff --git a/src/hotspot/cpu/s390/templateTable_s390.cpp b/src/hotspot/cpu/s390/templateTable_s390.cpp >> index 3cb1aba810d..5b8f7a20478 100644 >> --- a/src/hotspot/cpu/s390/templateTable_s390.cpp >> +++ b/src/hotspot/cpu/s390/templateTable_s390.cpp >> @@ -3980,8 +3980,11 @@ void TemplateTable::_new() { >> // Initialize object header only. >> __ bind(initialize_header); >> if (UseCompactObjectHeaders) { >> - __ z_lg(tmp, Address(iklass, in_bytes(Klass::prototype_header_offset()))); >> - __ z_stg(tmp, Address(RallocatedObject, oo... Merge is good. @rkennke patch for the new test errors due to removal of non-generational ZGC: https://gist.github.com/tstuefe/321b769d3b281198b767b68e18bb7271 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2459069232 From simonis at openjdk.org Wed Nov 6 14:17:32 2024 From: simonis at openjdk.org (Volker Simonis) Date: Wed, 6 Nov 2024 14:17:32 GMT Subject: RFR: 8343658: Parallel: Implement block_start for Young generation In-Reply-To: References: Message-ID: <0JdxCwM0B41HAGRr1A5ch4WdP8Gmi2uAd2mgjqW5gNM=.9a485a86-bce3-4a41-b5fb-e17f48992182@github.com> On Wed, 6 Nov 2024 08:04:17 GMT, Albert Mingkun Yang wrote: > Simple block_start implementation for Parallel young-gen. Related to https://github.com/openjdk/jdk/pull/21870 > > Test: tier1-3 src/hotspot/share/gc/parallel/mutableSpace.cpp line 239: > 237: > 238: HeapWord* cur_addr = bottom(); > 239: while (cur_addr <= addr) { As already described in https://github.com/openjdk/jdk/pull/21870#issuecomment-2454964142, this will not work in the general case, if the heap is not walkable. 
In a debug build you'll run into the assertion once you arrive in the unallocated TLAB area (which you don't want during error reporting). Even worse, in the product build you can crash or run this loop infinitely, depending on what data `obj->size()` will find in the unallocated TLAB space. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21919#discussion_r1831091387 From simonis at openjdk.org Wed Nov 6 14:24:31 2024 From: simonis at openjdk.org (Volker Simonis) Date: Wed, 6 Nov 2024 14:24:31 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers In-Reply-To: <9WJcuKHuAyqcP1vaCwtvsJqBWLptNyG2kFHFqp_Xl04=.bf12af6b-ad13-4a21-8259-1b19d770ec71@github.com> References: <9WJcuKHuAyqcP1vaCwtvsJqBWLptNyG2kFHFqp_Xl04=.bf12af6b-ad13-4a21-8259-1b19d770ec71@github.com> Message-ID: On Tue, 5 Nov 2024 05:35:57 GMT, Albert Mingkun Yang wrote: >>> > but I think it is not trivial. >>> >>> I was thinking copying the Serial impl into `ParallelScavengeHeap::block_start`; nothing sophisticated. >>> >> >> Unfortunately, the Serial implementation doesn't really work reliably if running with `-XX:+UseTLAB` (which is the default).
If called with a pointer which points into unallocated TLAB buffer, `ContiguousSpace::block_start_const()` will just crash with a SIGSEGV (or a secondary crash during error reporting when called from `VMError`): >> >> #0 0x00007ffff57d78ce in oopDesc::size_given_klass (this=0x7ffde5616c70, klass=0x7ffda2000000) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/oops/oop.inline.hpp:196 >> #1 0x00007ffff57d7756 in oopDesc::size (this=0x7ffde5616c70) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/oops/oop.inline.hpp:153 >> #2 0x00007ffff689a421 in ContiguousSpace::block_start_const (this=0x7ffff004c880, p=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/space.cpp:565 >> #3 0x00007ffff689b7ba in Space::block_start (this=0x7ffff004c880, p=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/space.inline.hpp:43 >> #4 0x00007ffff60f4144 in GenerationBlockStartClosure::do_space (this=0x7ffff530ef30, s=0x7ffff004c880) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/generation.cpp:191 >> #5 0x00007ffff5e560c5 in DefNewGeneration::space_iterate (this=0x7ffff004b9c0, blk=0x7ffff530ef30, usedOnly=false) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/serial/defNewGeneration.cpp:674 >> #6 0x00007ffff60f3527 in Generation::block_start (this=0x7ffff004b9c0, p=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/generation.cpp:200 >> #7 0x00007ffff60e36e9 in GenCollectedHeap::block_start (this=0x7ffff0038450, addr=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/genCollectedHeap.cpp:884 >> #8 0x00007ffff60e5b97 in BlockLocationPrinter::base_oop_or_null (addr=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/locationPrinter.inline.hpp:41 >> #9 0x00007ffff60e592b in BlockLocationPrinter::print_location (st=0x7ffff0000b60, addr=0x7ffde5616ca0) at 
/priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/locationPrinter.inline.hpp:56 >> #10 0x00007ffff60e43bd in GenCollectedHeap::print_location (this=0x7ffff0038450, st=0x7ffff0000b60, a... > >> And that's again because the heap is in general not walkable when we call this function. > > It depends on exactly when this function can be called, and with what arg. I wonder whether it can be called with a pointer to an obj that has not been properly initialized (with klass); if so, the heap is almost never walkable, since allocation is not atomic. > >> the Serial implementation doesn't really work reliably > > I am curious if other GCs' impl work (more) reliably, with regard to the tlab example. @albertnetymk, I'm fine with further improving error reporting by implementing a more sophisticated version of `block_start()` (and I saw you already started an attempt in #21919). But I still think this patch is a good improvement over the current situation and it will still be valuable for the corner cases which an improved `block_start()` won't be able to handle. So if you don't have any objections, I plan to push this in a day or so. @tschatzl are you fine with the latest version of this patch (I think I've addressed your suggestions)? If you don't have any additional objections, I'll push this PR in a day or so.
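The walkability problem discussed here covers two cases: an unused TLAB tail, and an object whose header has not been written yet, since allocation is not atomic. A toy C++ model illustrates it; this is not HotSpot code. In the model, each object's first word holds a narrow klass id, and a side table gives each known klass's size in words. The walk validates the id before trusting a size, stops at `top`, and returns nothing rather than guessing. `ToySpace`, `safe_size_or_zero` and `safe_block_start` are hypothetical names. The validation step is only loosely analogous to the `safe_klass_or_null()` / `size_given_klass_safe_or_0()` idea raised in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model, not HotSpot code. Each element of `words` is one heap word; an
// object's first word holds a narrow klass id. Memory in [0, top) has been
// handed out, but may contain an unused TLAB tail or an uninitialized object.
struct ToySpace {
  std::vector<uint32_t> words;
  size_t top;                       // end of the handed-out part of the space
  std::vector<size_t> klass_sizes;  // index = narrow klass id; 0 = no such klass
};

// Validate the header before trusting it; report 0 instead of a garbage size.
size_t safe_size_or_zero(const ToySpace& s, size_t obj) {
  uint32_t nk = s.words[obj];
  if (nk == 0 || nk >= s.klass_sizes.size()) return 0;  // unknown klass id
  return s.klass_sizes[nk];
}

// Best-effort block_start: bounded by top, never spins on a zero size, and
// returns std::nullopt rather than guessing when a header looks invalid.
std::optional<size_t> safe_block_start(const ToySpace& s, size_t addr) {
  if (addr >= s.top || s.top > s.words.size()) return std::nullopt;
  size_t cur = 0;
  while (cur <= addr) {
    size_t size = safe_size_or_zero(s, cur);
    if (size == 0 || cur + size > s.top) return std::nullopt;
    if (addr < cur + size) return cur;  // addr lies inside [cur, cur + size)
    cur += size;
  }
  return std::nullopt;  // not reached: the loop only exits via a return
}
```

A plain `obj->size()`-style loop would never advance past a zero word in the unused tail. That is the "run this loop infinitely" case mentioned in the review. The sketch returns no answer instead.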
------------- PR Comment: https://git.openjdk.org/jdk/pull/21870#issuecomment-2459884161 From tschatzl at openjdk.org Wed Nov 6 14:35:36 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 6 Nov 2024 14:35:36 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers [v2] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 10:46:02 GMT, Volker Simonis wrote: >> Currently `BlockLocationPrinter::print_location()` checks for a pointer if it points into the heap and if that's true, it either prints it as an oop if `is_valid_obj()` is true or it tries to find the start address of an oop for that pointer by calling `CollectedHeapT::heap()->block_start()`. >> >> However, the `block_start()` functionality is not fully implemented for all GCs (e.g. the young generation of `ParallelScavengeHeap`) and for these cases `block_start()` returns NULL. Because of this NULL return value `os::print_location()` will finally qualify the corresponding pointer as pointing "into unknown readable memory" although we already know that it actually points into an invalid heap area. >> >> In such cases, print at least that the pointer is pointing into an unknown part of the heap instead of just saying that it points into unknown memory. >> >> I've manually tested the new functionality in GDB. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Small refactoring based on tschatzl's review Marked as reviewed by tschatzl (Reviewer).
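The fallback described in this PR can be modelled as a small decision function. This C++ sketch simplifies things under stated assumptions. The heap queries are passed in as hypothetical function pointers that stand in for the heap's containment check, `is_valid_obj()` and `block_start()`. `block_start` returns 0 where a GC does not implement it, as in the Parallel young-generation case from the description. The returned strings are placeholders, not HotSpot's actual output.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the heap queries; not HotSpot's interfaces.
struct HeapQueries {
  bool (*is_in)(uintptr_t addr);
  bool (*is_valid_obj)(uintptr_t addr);
  uintptr_t (*block_start)(uintptr_t addr);  // 0 when not implemented
};

// Follows the order described in the PR: oop check, then block_start(),
// then the new "unknown part of the heap" fallback instead of giving up.
std::string describe_location(const HeapQueries& h, uintptr_t addr) {
  if (!h.is_in(addr)) {
    return "not in heap";  // the caller would go on to other memory checks
  }
  if (h.is_valid_obj(addr)) {
    return "oop";
  }
  uintptr_t start = h.block_start(addr);
  if (start != 0 && h.is_valid_obj(start)) {
    return "pointer into oop";
  }
  // Previously this case ended up reported as "unknown readable memory".
  return "unknown heap location";
}
```

The key design choice is the final branch. Once `is_in` has succeeded, the sketch never throws away the knowledge that the address is inside the heap, even when `block_start` cannot help.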
------------- PR Review: https://git.openjdk.org/jdk/pull/21870#pullrequestreview-2418445867 From ayang at openjdk.org Wed Nov 6 18:10:34 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 6 Nov 2024 18:10:34 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers [v2] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 10:46:02 GMT, Volker Simonis wrote: >> Currently `BlockLocationPrinter::print_location()` checks for a pointer if it points into the heap and if that's true, it either prints it as an oop if `is_valid_obj()` is true or it tries to find the start address of an oop for that pointer by calling `CollectedHeapT::heap()->block_start()`. >> >> However, the `block_start()` functionality is not fully implemented for all GCs (e.g. the young generation of `ParallelScavengeHeap`) and for these cases `block_start()` returns NULL. Because of this NULL return value `os::print_location()` will finally qualify the corresponding pointer as pointing "into unknown readable memory" although we already know that it actually points into an invalid heap area. >> >> In such cases, print at least that the pointer is pointing into an unknown part of the heap instead of just saying that it points into unknown memory. >> >> I've manually tested the new functionality in GDB. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Small refactoring based on tschatzl's review I think this is good on its own. ------------- Marked as reviewed by ayang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21870#pullrequestreview-2419079944 From ayang at openjdk.org Wed Nov 6 18:11:29 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 6 Nov 2024 18:11:29 GMT Subject: RFR: 8343658: Parallel: Implement block_start for Young generation In-Reply-To: <0JdxCwM0B41HAGRr1A5ch4WdP8Gmi2uAd2mgjqW5gNM=.9a485a86-bce3-4a41-b5fb-e17f48992182@github.com> References: <0JdxCwM0B41HAGRr1A5ch4WdP8Gmi2uAd2mgjqW5gNM=.9a485a86-bce3-4a41-b5fb-e17f48992182@github.com> Message-ID: <2DQOnQWFzESExHcKdsf2xdbDw46GDuyYLEQ_OTEHpc8=.e96b5363-65f2-4f73-afe3-85f13d928c62@github.com> On Wed, 6 Nov 2024 14:14:14 GMT, Volker Simonis wrote: > this will not work in the general case, if the heap is not walkable. True, but this is the best-effort approach used in other GCs, as far as I can tell. Is there a real use case that warrants a more sophisticated variant? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21919#discussion_r1831500873 From stuefe at openjdk.org Thu Nov 7 10:50:48 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 7 Nov 2024 10:50:48 GMT Subject: RFR: 8343658: Parallel: Implement block_start for Young generation In-Reply-To: <2DQOnQWFzESExHcKdsf2xdbDw46GDuyYLEQ_OTEHpc8=.e96b5363-65f2-4f73-afe3-85f13d928c62@github.com> References: <0JdxCwM0B41HAGRr1A5ch4WdP8Gmi2uAd2mgjqW5gNM=.9a485a86-bce3-4a41-b5fb-e17f48992182@github.com> <2DQOnQWFzESExHcKdsf2xdbDw46GDuyYLEQ_OTEHpc8=.e96b5363-65f2-4f73-afe3-85f13d928c62@github.com> Message-ID: On Wed, 6 Nov 2024 18:08:43 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/parallel/mutableSpace.cpp line 239: >> >>> 237: >>> 238: HeapWord* cur_addr = bottom(); >>> 239: while (cur_addr <= addr) { >> >> As already described in https://github.com/openjdk/jdk/pull/21870#issuecomment-2454964142, this will not work in the general case, if the heap is not walkable. 
In a debug build you'll run into the assertion once you arrive in the unallocated TLAB area (which you don't want during error reporting). >> Even worse, in the product build you can crash or run this loop infinitely, depending on what data `obj->size()` will find in the unallocated TLAB space. > >> this will not work in the general case, if the heap is not walkable. > > True, but this is the best-effort approach used in other GCs, as far as I can tell. Is there a real use case that warrants a more sophisticated variant? What would be nice would be something like `oopDesc::safe_klass_or_null()` or similar, feeding into a corresponding `oopDesc::size_given_klass_safe_or_0()`. The former would check the klass word for validity before dereferencing - `CompressedKlassPointers::is_encodable(p)` and then the load of layouthelper etc should happen with SafeFetch. Alternatively (and a bit more unsafe), check the readability of Klass* with SafeFetch beforehand, then call normal size_given_klass. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21919#discussion_r1832464222 From simonis at openjdk.org Thu Nov 7 12:13:47 2024 From: simonis at openjdk.org (Volker Simonis) Date: Thu, 7 Nov 2024 12:13:47 GMT Subject: Integrated: 8343531: Improve print_location for invalid heap pointers In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 09:43:18 GMT, Volker Simonis wrote: > Currently `BlockLocationPrinter::print_location()` checks for a pointer if it points into the heap and if that's true, it either prints it as an oop if `is_valid_obj()` is true or it tries to find the start address of an oop for that pointer by calling `CollectedHeapT::heap()->block_start()`. > > However, the `block_start()` functionality is not fully implemented for all GCs (e.g. the young generation of `ParallelScavengeHeap`) and for these cases `block_start()` returns NULL.
Because of this NULL return value `os::print_location()` will finally qualify the corresponding pointer as pointing "into unknown readable memory" although we already know that it actually points into an invalid heap area. > > In such cases, print at least that the pointer is pointing into an unknown part of the heap instead of just saying that it points into unknown memory. > > I've manually tested the new functionality in GDB. This pull request has now been integrated. Changeset: f0b251d7 Author: Volker Simonis URL: https://git.openjdk.org/jdk/commit/f0b251d76078e8d5b47e967b0449c4cbdcb5a005 Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod 8343531: Improve print_location for invalid heap pointers Reviewed-by: shade, tschatzl, ayang ------------- PR: https://git.openjdk.org/jdk/pull/21870 From simonis at openjdk.org Thu Nov 7 12:13:47 2024 From: simonis at openjdk.org (Volker Simonis) Date: Thu, 7 Nov 2024 12:13:47 GMT Subject: RFR: 8343531: Improve print_location for invalid heap pointers In-Reply-To: <9WJcuKHuAyqcP1vaCwtvsJqBWLptNyG2kFHFqp_Xl04=.bf12af6b-ad13-4a21-8259-1b19d770ec71@github.com> References: <9WJcuKHuAyqcP1vaCwtvsJqBWLptNyG2kFHFqp_Xl04=.bf12af6b-ad13-4a21-8259-1b19d770ec71@github.com> Message-ID: On Tue, 5 Nov 2024 05:35:57 GMT, Albert Mingkun Yang wrote: >>> > but I think it is not trivial. >>> >>> I was thinking copying the Serial impl into `ParallelScavengeHeap::block_start`; nothing sophisticated. >>> >> >> Unfortunately, the Serial implementation doesn't really work reliably if running with `-XX:+UseTLAB` (which is the default). 
If called with a pointer which points into unallocated TLAB buffer, `ContiguousSpace::block_start_const()` will just crash with a SIGSEGV (or a secondary crash during error reporting when called from `VMError`): >> >> #0 0x00007ffff57d78ce in oopDesc::size_given_klass (this=0x7ffde5616c70, klass=0x7ffda2000000) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/oops/oop.inline.hpp:196 >> #1 0x00007ffff57d7756 in oopDesc::size (this=0x7ffde5616c70) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/oops/oop.inline.hpp:153 >> #2 0x00007ffff689a421 in ContiguousSpace::block_start_const (this=0x7ffff004c880, p=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/space.cpp:565 >> #3 0x00007ffff689b7ba in Space::block_start (this=0x7ffff004c880, p=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/space.inline.hpp:43 >> #4 0x00007ffff60f4144 in GenerationBlockStartClosure::do_space (this=0x7ffff530ef30, s=0x7ffff004c880) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/generation.cpp:191 >> #5 0x00007ffff5e560c5 in DefNewGeneration::space_iterate (this=0x7ffff004b9c0, blk=0x7ffff530ef30, usedOnly=false) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/serial/defNewGeneration.cpp:674 >> #6 0x00007ffff60f3527 in Generation::block_start (this=0x7ffff004b9c0, p=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/generation.cpp:200 >> #7 0x00007ffff60e36e9 in GenCollectedHeap::block_start (this=0x7ffff0038450, addr=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/genCollectedHeap.cpp:884 >> #8 0x00007ffff60e5b97 in BlockLocationPrinter::base_oop_or_null (addr=0x7ffde5616ca0) at /priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/locationPrinter.inline.hpp:41 >> #9 0x00007ffff60e592b in BlockLocationPrinter::print_location (st=0x7ffff0000b60, addr=0x7ffde5616ca0) at 
/priv/simonisv/OpenJDK/Git/jdk21u-dev/src/hotspot/share/gc/shared/locationPrinter.inline.hpp:56 >> #10 0x00007ffff60e43bd in GenCollectedHeap::print_location (this=0x7ffff0038450, st=0x7ffff0000b60, a... > >> And that's again because the heap is in general not walkable when we call this function. > > It depends on exactly when this function can be called, and with what arg. I wonder whether it can be called with a pointer to an obj that has not been properly initialized (with klass); if so, the heap is almost never walkable, since allocation is not atomic. > >> the Serial implementation doesn't really work reliably > > I am curious if other GCs' impl work (more) reliably, with regard to the tlab example. Thanks @albertnetymk, @tschatzl and @shipilev for your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21870#issuecomment-2462078214 From simonis at openjdk.org Thu Nov 7 12:40:43 2024 From: simonis at openjdk.org (Volker Simonis) Date: Thu, 7 Nov 2024 12:40:43 GMT Subject: RFR: 8343658: Parallel: Implement block_start for Young generation In-Reply-To: References: <0JdxCwM0B41HAGRr1A5ch4WdP8Gmi2uAd2mgjqW5gNM=.9a485a86-bce3-4a41-b5fb-e17f48992182@github.com> <2DQOnQWFzESExHcKdsf2xdbDw46GDuyYLEQ_OTEHpc8=.e96b5363-65f2-4f73-afe3-85f13d928c62@github.com> Message-ID: <-3P95fiUgh8CavaWr6qd1uMHTGqq4KrzbyO7YfIkCZc=.cf1aad7c-77b0-4151-9a91-7676216dbecd@github.com> On Thu, 7 Nov 2024 10:48:25 GMT, Thomas Stuefe wrote: >>> this will not work in the general case, if the heap is not walkable. >> >> True, but this is the best-effort approach used in other GCs, as far as I can tell. Is there a real use case that warrants a more sophisticated variant? > > What would be nice would be something like `oopDesc::safe_klass_or_null()` or similar, feeding into a corresponding `oopDesc::size_given_klass_safe_or_0()`. The former would check the klass word for validity before dereferencing - `CompressedKlassPointers::is_encodable(p)` and then the load of layouthelper etc should happen with SafeFetch.
The former would check the klass word for validity before dereferencing - `CompressedKlassPointers::is_encodable(p)` and then the load of layouthelper etc should happen with SafeFetch. Alternatively (and a bit more unsafe), check the readability of Klass* with SafeFetch beforehand, then call normal size_given_klass. > > this will not work in the general case, if the heap is not walkable. > > True, but this is the best-effort approach used in other GCs, as far as I can tell. Is there a real use case that warrants a more sophisticated variant? The **only** use case for this code during hs_err reporting for heap-addresses not pointing at the beginning of an oop. I think we should be conservative here, because a secondary crash will cut the information available in the hs_err file and will therefor do more harm then being helpful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21919#discussion_r1832612499 From simonis at openjdk.org Thu Nov 7 12:48:43 2024 From: simonis at openjdk.org (Volker Simonis) Date: Thu, 7 Nov 2024 12:48:43 GMT Subject: RFR: 8343658: Parallel: Implement block_start for Young generation In-Reply-To: <-3P95fiUgh8CavaWr6qd1uMHTGqq4KrzbyO7YfIkCZc=.cf1aad7c-77b0-4151-9a91-7676216dbecd@github.com> References: <0JdxCwM0B41HAGRr1A5ch4WdP8Gmi2uAd2mgjqW5gNM=.9a485a86-bce3-4a41-b5fb-e17f48992182@github.com> <2DQOnQWFzESExHcKdsf2xdbDw46GDuyYLEQ_OTEHpc8=.e96b5363-65f2-4f73-afe3-85f13d928c62@github.com> <-3P95fiUgh8CavaWr6qd1uMHTGqq4KrzbyO7YfIkCZc=.cf1aad7c-77b0-4151-9a91-7676216dbecd@github.com> Message-ID: On Thu, 7 Nov 2024 12:38:03 GMT, Volker Simonis wrote: >> What would be nice would be something like `oopDesc::safe_klass_or_null()` or similar, feeding into a corresponding `oopDesc::size_given_klass_safe_or_0()`. The former would check the klass word for validity before dereferencing - `CompressedKlassPointers::is_encodable(p)` and then the load of layouthelper etc should happen with SafeFetch. 
Alternatively (and a bit more unsafe), check the readability of Klass* with SafeFetch beforehand, then call normal size_given_klass. > >> > this will not work in the general case, if the heap is not walkable. >> >> True, but this is the best-effort approach used in other GCs, as far as I can tell. Is there a real use case that warrants a more sophisticated variant? > > The **only** use case for this code is during hs_err reporting for heap addresses not pointing at the beginning of an oop. I think we should be conservative here, because a secondary crash will cut the information available in the hs_err file and will therefore do more harm than good. > What would be nice would be something like `oopDesc::safe_klass_or_null()` or similar, feeding into a corresponding `oopDesc::size_given_klass_safe_or_0()`. The former would check the klass word for validity before dereferencing - `CompressedKlassPointers::is_encodable(p)` and then the load of layouthelper etc should happen with SafeFetch. Alternatively (and a bit more unsafe), check the readability of Klass* with SafeFetch beforehand, then call normal size_given_klass. We already have [LocationPrinter::is_valid_obj()](https://github.com/openjdk/jdk/blob/ac82a8f89c7066fb1d379b12bcfd68053cb39ba4/src/hotspot/share/gc/shared/locationPrinter.cpp#L33) which uses [Klass::is_valid()](https://github.com/openjdk/jdk/blob/ac82a8f89c7066fb1d379b12bcfd68053cb39ba4/src/hotspot/share/oops/klass.cpp#L1038) to check the validity of an oop. I don't think we need `SafeFetch` here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21919#discussion_r1832623326 From thomas.stuefe at gmail.com Thu Nov 7 13:51:40 2024 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 7 Nov 2024 14:51:40 +0100 Subject: ParallelGC, large old generation when optimizing for footprint goal Message-ID: Hi, I have a question about some odd behavior I observe when ParallelGC optimizes for footprint.
If I omit giving a pause time goal and relax the throughput goal enough, the JVM should optimize for the footprint goal. But if the JVM was started with a small young gen (e.g. because the initial heap size was small), it seems to go into a tailspin where the young gen stays tiny or even shrinks more and more, resulting in lots of promotions: old gen grows until it hits the ceiling, a Full GC follows, then the cycle repeats. That maximizes RAM use and thus runs counter to the footprint goal. Example: I run heapothesys/hyperalloc [1] with JDK 21. I run with an allocation pressure of 512MB/sec and a live set size of 128MB. `java -Xlog:gc* -Xmx8g -Xms512m -XX:+UseParallelGC -XX:GCTimeRatio=1 -jar ./target/HyperAlloc-1.0.jar -h 8192 -a 512 -s 128` One can observe how young gen starts at ~150MB, shrinks to ~60MB, and old gen grows until it hits the ceiling at ~5.5GB. Increasing the initial heap size mitigates the problem: Eden still shrinks but settles at a larger size. We still get very frequent young GCs, though. Ironically, the problem is more likely on containers with little RAM. Eden size depends on initial heap size, which depends on total RAM (even if -Xmx was set). Little RAM -> tiny Eden. Therefore, less RAM can cause the JVM to use more memory. That behavior can easily be observed with different values for MaxRAM: calling the above program with -XX:MaxRAM=10g will cause the JVM to enter the tailspin immediately; the process peaks at >5GB RSS. The same program with -XX:MaxRAM=128g causes the process to use just ~1.2GB RSS since the young gen stays sensibly large and thus total heap size never grows that much. I looked into the tuning guide [2] but did not find information about how exactly the footprint goal is reached. For ParallelGC, it just states: "Footprint: The maximum heap footprint is specified using the option -Xmx. In addition, the collector has an implicit goal of minimizing the size of the heap as long as the other goals are being met."
which looks to me like it should work with default settings, out of the box. Am I making a thinking error somewhere? Is this a bug or is this behavior expected? Thank you, Thomas [1] https://github.com/corretto/heapothesys/tree/master/HyperAlloc [2] https://docs.oracle.com/en/java/javase/11/gctuning/parallel-collector1.html#GUID-DCDD6E46-0406-41D1-AB49-FB96A50EB9CE From rkennke at openjdk.org Thu Nov 7 16:58:36 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 7 Nov 2024 16:58:36 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v56] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 106 commits: - Merge tag 'jdk-25+23' into JDK-8305895-v4 Added tag jdk-24+23 for changeset c0e6c3b9 - Fix gen-ZGC removal - Merge tag 'jdk-24+22' into JDK-8305895-v4 Added tag jdk-24+22 for changeset 388d44fb - Enable riscv in CompressedClassPointersEncodingScheme test - s390 port - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test - Update copyright - Avoid assert/endless-loop in JFR code - Update copyright headers - Merge tag 'jdk-24+20' into JDK-8305895-v4 Added tag jdk-24+20 for changeset 7a64fbbb - ... 
and 96 more: https://git.openjdk.org/jdk/compare/c0e6c3b9...4d282247 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=55 Stats: 5212 lines in 218 files changed: 3585 ins; 864 del; 763 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Nov 7 17:25:40 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 7 Nov 2024 17:25:40 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: - Merge branch 'master' into JDK-8305895-v4 - Merge tag 'jdk-24+23' into JDK-8305895-v4 Added tag jdk-24+23 for changeset c0e6c3b9 - Fix gen-ZGC removal - Merge tag 'jdk-24+22' into JDK-8305895-v4 Added tag jdk-24+22 for changeset 388d44fb - Enable riscv in CompressedClassPointersEncodingScheme test - s390 port - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test - Update copyright - Avoid assert/endless-loop in JFR code - Update copyright headers - ...
and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=56 Stats: 5212 lines in 218 files changed: 3585 ins; 864 del; 763 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Nov 7 17:33:11 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 7 Nov 2024 17:33:11 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word.
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ...
and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b I'd like to prepare for integration now. I merged from master and resolved some conflicts. I am now running at least tier1 on aarch64 x x86_64 x -UCOH x +UCOH, possibly tier2 .. 4, too (time permitting). In the meantime, could you please re-approve the PR? I hope it doesn't catch any more conflicts until we're ready for integration. As soon as the JEP is targeted (sometime today, I think), tests are clean and approvals are there, I would like to integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2462834035 From coleenp at openjdk.org Thu Nov 7 17:46:11 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 7 Nov 2024 17:46:11 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects.
In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Reapproving. Please wait for GHA to complete, when JEP is targeted to integrate. Thanks! ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2421741026 From stefank at openjdk.org Thu Nov 7 17:53:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 7 Nov 2024 17:53:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Marked as reviewed by stefank (Reviewer). Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2417620293 PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2421753879 From rkennke at openjdk.org Thu Nov 7 21:27:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 7 Nov 2024 21:27:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: <2xoAD2r5G_6IHT9gt8-uSkN_hPiRmIkJ6VhkB1GarfI=.4e3c65db-3aab-4926-b1fc-fc78599b2885@github.com> On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b GHA failures look like one unrelated timeout and one unrelated infra problem. Please confirm. I also ran tier1 on x86_64 x aarch64 x -UCOH x +UCOH, with nothing sticking out (same timeout observed, though). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2463245179 From stuefe at openjdk.org Fri Nov 8 07:02:51 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 8 Nov 2024 07:02:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Merge looks good. Build errors on MacOS unrelated. ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2422830379 From albert.m.yang at oracle.com Fri Nov 8 09:08:51 2024 From: albert.m.yang at oracle.com (Albert Yang) Date: Fri, 8 Nov 2024 09:08:51 +0000 Subject: ParallelGC, large old generation when optimizing for footprint goal In-Reply-To: References: Message-ID: > I run with an allocation pressure of 512MB/sec If the alloc-rate is 512M/s and init-heap-size is 512M, it's indeed expected that young-gc is frequent -- the default eden-size is ~150M. > One can observe how young gen starts at ~150MB, shrinks to ~60MB, and old gen grows till it hits the ceiling at ~5.5GB. This is definitely undesirable, and as you put it, "runs counter to the footprint goal". I have been working on JDK-8338977, and the current prototype maintains heap-capacity under ~600M. Thank you for providing this bm (and the config); I will include the result for this bm when I send out the PR. /Albert ________________________________________ From: hotspot-gc-dev on behalf of Thomas Stüfe Sent: Thursday, November 7, 2024 14:51 To: hotspot-gc-dev at openjdk.java.net Subject: ParallelGC, large old generation when optimizing for footprint goal Hi, I have a question about some odd behavior I observe when ParallelGC optimizes for footprint.
If I omit giving a pause time goal and relax the throughput goal enough, the JVM should optimize for the footprint goal. But if the JVM was started with a small young gen (e.g. because the initial heap size was small), it seems to go into a tailspin where the young gen stays tiny or even shrinks more and more, resulting in lots of promotions, old gen grows until it hits the ceiling, Full GC, then the cycle repeats. That maximizes RAM use and thus runs counter to the footprint goal. Example: I run heapothesys/hyperalloc [1] with JDK 21. I run with an allocation pressure of 512MB/sec and a live set size of 128MB. `java -Xlog:gc* -Xmx8g -Xms512m -XX:+UseParallelGC -XX:GCTimeRatio=1 -jar ./target/HyperAlloc-1.0.jar -h 8192 -a 512 -s 128` One can observe how young gen starts at ~150MB, shrinks to ~60MB, and old gen grows till it hits the ceiling at ~5.5GB. Increasing the initial heap size mitigates the problem: Eden still shrinks but settles at a larger size. We still get very frequent young GCs, though. Ironically, the problem is more likely on containers with little RAM. Eden size depends on initial heap size, which depends on total RAM (even if -Xmx was set). Little RAM -> tiny Eden. Therefore, less RAM can cause the JVM to use more memory. That behavior can easily be observed with different values for MaxRAM: calling the above program with -XX:MaxRAM=10g will cause the JVM to enter the tailspin immediately, the process peaks at >5GB RSS. The same program with -XX:MaxRAM=128g causes the process to use just ~1.2GB RSS since the young gen stays sensibly large and thus total heap size never grows that much. I looked into the tuning guide [2] but did not find information about how exactly the footprint goal is reached. For ParallelGC, it just states: "Footprint: The maximum heap footprint is specified using the option -Xmx. In addition, the collector has an implicit goal of minimizing the size of the heap as long as the other goals are being met."
which looks to me like it should work with default settings, out of the box. Am I making a thinking error somewhere? Is this a bug or is this behavior expected? Thank you, Thomas [1] https://github.com/corretto/heapothesys/tree/master/HyperAlloc [2] https://docs.oracle.com/en/java/javase/11/gctuning/parallel-collector1.html#GUID-DCDD6E46-0406-41D1-AB49-FB96A50EB9CE From sjohanss at openjdk.org Fri Nov 8 09:22:41 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 8 Nov 2024 09:22:41 GMT Subject: RFR: 8343189: [REDO] JDK-8295269 G1: Improve slow startup due to predictor initialization [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 09:50:15 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this redo of [JDK-8295269](https://bugs.openjdk.org/browse/JDK-8295269) G1: Improve slow startup due to predictor initialization. The cause are issues with the `runtime/cds/DeterministicDump.java` test, that is currently being fixed in #21871. >> >> There has been no change in these changes. >> >> Testing: running a few thousand times with the fixed `runtime/cds/DeterministicDump.java` test >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8343189-redo-slow-startup > - Revert "8343086: [BACKOUT] JDK-8295269 G1: Improve slow startup due to predictor initialization" > > This reverts commit f1cc890ddfe2e472cf786856dc7d01645f61b054. Marked as reviewed by sjohanss (Reviewer). 
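Regarding the ParallelGC footprint question above: the command line there uses -XX:GCTimeRatio=1. The Parallel collector tuning guide describes GCTimeRatio=N as a goal of at most 1/(1+N) of total time spent in GC, so N=1 tolerates up to 50% (the default of 99 targets 1%). Albert's remark that frequent young GCs are expected is similarly simple arithmetic: eden is refilled at the allocation rate. A small C++ sketch of both calculations, where the function names are made up for illustration and the eden sizes are the approximate figures quoted in the thread:

```cpp
// Parallel GC throughput goal: -XX:GCTimeRatio=N targets at most 1/(1+N)
// of total time spent in garbage collection.
constexpr double max_gc_time_fraction(int gc_time_ratio) {
  return 1.0 / (1.0 + gc_time_ratio);
}

// Rough time between young GCs: eden fills up at the allocation rate.
constexpr double young_gc_interval_s(double eden_mb, double alloc_mb_per_s) {
  return eden_mb / alloc_mb_per_s;
}

// GCTimeRatio=1 (as in the HyperAlloc run) allows up to half the time in GC.
static_assert(max_gc_time_fraction(1) == 0.5, "GCTimeRatio=1 -> 50%");

// ~150 MB eden at 512 MB/s: a young GC roughly every 0.29 s.
static_assert(young_gc_interval_s(150.0, 512.0) > 0.29 &&
              young_gc_interval_s(150.0, 512.0) < 0.30, "~0.29 s");
```

With eden shrunk to ~60 MB the same formula gives roughly 0.12 s between young GCs, which matches the "very frequent young GCs" observation; it says nothing, however, about why the footprint goal lets the old generation grow.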
------------- PR Review: https://git.openjdk.org/jdk/pull/21876#pullrequestreview-2423166378 From tschatzl at openjdk.org Fri Nov 8 09:46:39 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Nov 2024 09:46:39 GMT Subject: RFR: 8343189: [REDO] JDK-8295269 G1: Improve slow startup due to predictor initialization [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:06:34 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into 8343189-redo-slow-startup >> - Revert "8343086: [BACKOUT] JDK-8295269 G1: Improve slow startup due to predictor initialization" >> >> This reverts commit f1cc890ddfe2e472cf786856dc7d01645f61b054. > > Still good! Thanks @walulyai @kstefanj for your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21876#issuecomment-2464252609 From tschatzl at openjdk.org Fri Nov 8 09:46:39 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Nov 2024 09:46:39 GMT Subject: Integrated: 8343189: [REDO] JDK-8295269 G1: Improve slow startup due to predictor initialization In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 15:06:04 GMT, Thomas Schatzl wrote: > Hi all, > > please review this redo of [JDK-8295269](https://bugs.openjdk.org/browse/JDK-8295269) G1: Improve slow startup due to predictor initialization. The cause are issues with the `runtime/cds/DeterministicDump.java` test, that is currently being fixed in #21871. > > There has been no change in these changes. > > Testing: running a few thousand times with the fixed `runtime/cds/DeterministicDump.java` test > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: c7f071cf Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/c7f071cf36a6f064e293e82e7e5bb0abcc76ad70 Stats: 7 lines in 2 files changed: 6 ins; 0 del; 1 mod 8343189: [REDO] JDK-8295269 G1: Improve slow startup due to predictor initialization Reviewed-by: iwalulya, sjohanss ------------- PR: https://git.openjdk.org/jdk/pull/21876 From stuefe at openjdk.org Fri Nov 8 16:10:56 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 8 Nov 2024 16:10:56 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Marked as reviewed by stuefe (Reviewer).
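On the Full GC forwarding bullet in the quoted description: the 8TB limit follows from storing the forwardee as a word offset from the heap base, since 2^40 words x 8 bytes = 2^43 bytes = 8 TiB. Below is a minimal C++ sketch of such an encoding. It assumes a 64-bit mark word with 2 low lock bits, a 40-bit offset field in bits 2..41, and the narrow Klass* in the upper 22 bits. The exact bit positions, the 0b11 "forwarded" pattern and all names are illustrative assumptions, not HotSpot's actual forwarding code.

```cpp
#include <cstdint>

// Illustrative mark-word layout (assumption, not HotSpot's definition):
//   bits  0..1   lock bits (0b11 = "forwarded" in this sketch)
//   bits  2..41  forwardee offset from heap base, in 8-byte words (40 bits)
//   bits 42..63  narrow Klass* (upper 22 bits), which must be preserved
constexpr int kLockBits = 2;
constexpr int kOffsetBits = 40;
constexpr int kKlassShift = kLockBits + kOffsetBits;  // 42, leaving 22 bits
constexpr std::uint64_t kWordSize = 8;
constexpr std::uint64_t kForwardedPattern = 0b11;
constexpr std::uint64_t kLockMask = (std::uint64_t{1} << kLockBits) - 1;
constexpr std::uint64_t kOffsetMask =
    ((std::uint64_t{1} << kOffsetBits) - 1) << kLockBits;
constexpr std::uint64_t kKlassMask = ~std::uint64_t{0} << kKlassShift;

// Largest heap whose every word offset fits in 40 bits: 2^40 * 8 B = 8 TiB.
constexpr std::uint64_t kMaxEncodableHeap =
    (std::uint64_t{1} << kOffsetBits) * kWordSize;
static_assert(kMaxEncodableHeap == (std::uint64_t{8} << 40), "8 TiB");

// Larger heaps cannot use this encoding; per the PR the VM then falls back.
constexpr bool heap_is_encodable(std::uint64_t heap_bytes) {
  return heap_bytes <= kMaxEncodableHeap;
}

// Store the forwardee as a word offset while keeping the Klass* bits intact.
// Preconditions (not checked here): heap_base <= forwardee, forwardee is
// word-aligned, and the heap is encodable.
constexpr std::uint64_t encode_forwarding(std::uint64_t mark,
                                          std::uint64_t heap_base,
                                          std::uint64_t forwardee) {
  return (mark & kKlassMask) |
         (((forwardee - heap_base) / kWordSize) << kLockBits) |
         kForwardedPattern;
}

constexpr std::uint64_t decode_forwardee(std::uint64_t mark,
                                         std::uint64_t heap_base) {
  return heap_base + ((mark & kOffsetMask) >> kLockBits) * kWordSize;
}

constexpr bool is_forwarded(std::uint64_t mark) {
  return (mark & kLockMask) == kForwardedPattern;
}
```

The point of the design, as the description explains, is that the upper Klass* bits survive forwarding, so the object's class (and thus its size) can still be read from a forwarded object's header.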
------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2424199289 From phh at openjdk.org Fri Nov 8 16:15:14 2024 From: phh at openjdk.org (Paul Hohensee) Date: Fri, 8 Nov 2024 16:15:14 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we make some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'.
This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Marked as reviewed by phh (Reviewer).
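The 8TB limit quoted for full-GC forwarding follows from simple arithmetic: 40 bits of 8-byte word offsets cover 2^40 × 8 B = 2^43 B = 8 TiB. Below is a hedged C++ sketch of such a compressed-oops-like encoding. It assumes the offset field sits directly above two lock bits (pattern `0b11` meaning "forwarded") and below the klass bits, and that pointers are 64 bits wide. The layout and the names `encode_forwarding`, `decode_forwarding` and `is_forwarded` are hypothetical, not the actual full-GC forwarding code.

```cpp
#include <cassert>
#include <cstdint>

constexpr int      kLockBits    = 2;
constexpr uint64_t kLockMask    = (uint64_t{1} << kLockBits) - 1;
constexpr uint64_t kMarkedValue = 0b11;  // assumed "forwarded" lock pattern
constexpr int      kOffsetBits  = 40;    // word-offset field, bits 2..41
constexpr uint64_t kOffsetMask  = ((uint64_t{1} << kOffsetBits) - 1) << kLockBits;
constexpr int      kLogWordSize = 3;     // 8-byte heap words

// Largest heap the offset field can span: 2^40 words * 8 bytes = 8 TiB.
constexpr uint64_t kMaxEncodableHeapBytes = (uint64_t{1} << kOffsetBits) << kLogWordSize;
static_assert(kMaxEncodableHeapBytes == (uint64_t{8} << 40), "40-bit word offsets cover 8 TiB");

// Encode the forwardee as a word offset from the heap base; bits above the
// offset field (e.g. the klass bits) are kept unchanged.
inline uint64_t encode_forwarding(uint64_t mark, uintptr_t heap_base, uintptr_t forwardee) {
  assert(forwardee >= heap_base && (forwardee & ((1u << kLogWordSize) - 1)) == 0);
  uint64_t word_offset = (forwardee - heap_base) >> kLogWordSize;
  assert(word_offset < (uint64_t{1} << kOffsetBits) && "heap larger than 8 TiB");
  uint64_t keep = mark & ~(kOffsetMask | kLockMask);
  return keep | (word_offset << kLockBits) | kMarkedValue;
}

inline bool is_forwarded(uint64_t mark) { return (mark & kLockMask) == kMarkedValue; }

// Recover the forwardee address from the stored word offset.
inline uintptr_t decode_forwarding(uint64_t mark, uintptr_t heap_base) {
  assert(is_forwarded(mark));
  uint64_t word_offset = (mark & kOffsetMask) >> kLockBits;
  return heap_base + (word_offset << kLogWordSize);
}
```

Storing word offsets rather than byte offsets is what buys the extra factor of 8; a larger heap simply has no room in this field, which matches the description's fallback of disabling the compact encoding above 8TB.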
------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2424210008 From stefank at openjdk.org Fri Nov 8 16:26:28 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 8 Nov 2024 16:26:28 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we make some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Marked as reviewed by stefank (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2424260100 From coleenp at openjdk.org Fri Nov 8 16:26:28 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 8 Nov 2024 16:26:28 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we make some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Still looks good. Nice work! ------------- Marked as reviewed by coleenp (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2424274474 From tschatzl at openjdk.org Fri Nov 8 16:56:05 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Nov 2024 16:56:05 GMT Subject: RFR: 8297692: Avoid sending per-region GCPhaseParallel JFR events in G1ScanCollectionSetRegionClosure Message-ID: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> Hi all, please review this change that significantly reduces the amount of "Code Roots" and "Optional Roots" JFR events to reduce default recording sizes significantly. E.g. a 10min BigRamTester run creates a 23MB recording without this change, with like hundreds of thousands of these events (#gcs * #gc threads * #regions in collection set). With this change, the recording is reduced to 4MB (#gcs * #gc threads) Testing: gha, tier1-3 Thanks, Thomas ------------- Commit messages: - Update src/hotspot/share/gc/g1/g1RemSet.cpp - 8297692 Changes: https://git.openjdk.org/jdk/pull/21984/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21984&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297692 Stats: 162 lines in 3 files changed: 84 ins; 52 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/21984.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21984/head:pull/21984 PR: https://git.openjdk.org/jdk/pull/21984 From tschatzl at openjdk.org Fri Nov 8 16:56:05 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Nov 2024 16:56:05 GMT Subject: RFR: 8297692: Avoid sending per-region GCPhaseParallel JFR events in G1ScanCollectionSetRegionClosure In-Reply-To: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> References: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> Message-ID: On Fri, 8 Nov 2024 15:20:21 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that significantly reduces the amount of "Code Roots" and 
"Optional Roots" JFR events to reduce default recording sizes significantly. > > E.g. a 10min BigRamTester run creates a 23MB recording without this change, with like hundreds of thousands of these events (#gcs * #gc threads * #regions in collection set). With this change, the recording is reduced to 4MB (#gcs * #gc threads) > > Testing: gha, tier1-3 > > Thanks, > Thomas src/hotspot/share/gc/g1/g1RemSet.cpp line 842: > 840: _opt_refs_scanned(0), > 841: _opt_refs_memory_used(0) { } > 842: Suggestion: _opt_refs_memory_used(0) { } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21984#discussion_r1834729833 From rkennke at openjdk.org Fri Nov 8 17:24:05 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 8 Nov 2024 17:24:05 GMT Subject: Integrated: 8305895: Implement JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 13:35:08 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we make some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... This pull request has now been integrated.
Changeset: 44ec501a Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/44ec501a41f4794259dd03cd168838e79334890e Stats: 5212 lines in 218 files changed: 3585 ins; 864 del; 763 mod 8305895: Implement JEP 450: Compact Object Headers (Experimental) Co-authored-by: Sandhya Viswanathan Co-authored-by: Martin Doerr Co-authored-by: Hamlin Li Co-authored-by: Thomas Stuefe Co-authored-by: Amit Kumar Co-authored-by: Stefan Karlsson Co-authored-by: Coleen Phillimore Co-authored-by: Axel Boldt-Christmas Reviewed-by: coleenp, stefank, stuefe, phh, ihse, lmesnik, tschatzl, matsaave, rcastanedalo, vpaprotski, yzheng, egahlin ------------- PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Fri Nov 8 17:45:40 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 8 Nov 2024 17:45:40 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:22:34 GMT, Yudi Zheng wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - CompressedKlassPointers::is_encodable shall be callable with -UseCCP >> - Johan review feedback > > Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support? @mur47x111 it's now integrated in jdk24. do your magic in Graal ;-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2465413222 From yzheng at openjdk.org Fri Nov 8 17:52:05 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 8 Nov 2024 17:52:05 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 17:42:24 GMT, Roman Kennke wrote: >> Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support? > > @mur47x111 it's now integrated in jdk24.
do your magic in Graal ;-) @rkennke It is in the merge queue ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2465423342 From tschatzl at openjdk.org Fri Nov 8 20:01:16 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Nov 2024 20:01:16 GMT Subject: RFR: 8297692: Avoid sending per-region GCPhaseParallel JFR events in G1ScanCollectionSetRegionClosure In-Reply-To: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> References: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> Message-ID: On Fri, 8 Nov 2024 15:20:21 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that significantly reduces the amount of "Code Roots" and "Optional Roots" JFR events to reduce default recording sizes significantly. > > E.g. a 10min BigRamTester run creates a 23MB recording without this change, with like hundreds of thousands of these events (#gcs * #gc threads * #regions in collection set). With this change, the recording is reduced to 4MB (#gcs * #gc threads) > > Testing: gha, tier1-3 > > Thanks, > Thomas Fwiw, I went with splitting code root scan and optional root scan into two iterations that are each bracketed by a single per-thread JFR event now. This also allowed a minor optimization: in the initial evacuation there can be no optional roots, so that iteration over all regions can be skipped. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21984#issuecomment-2465653093 From ayang at openjdk.org Sat Nov 9 10:51:47 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sat, 9 Nov 2024 10:51:47 GMT Subject: RFR: 8339162: [REDO] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC [v2] In-Reply-To: References: Message-ID: > This PR consists of two commits, the original and bug-fix. 
> > The original patch calculates the dest-count for the preceding live words incorrectly -- `preceding_destination` can be on region-boundary. > > Test: TEST=gc/TestSoftReferencesBehaviorOnOOME.java fails ~4/100 without the fix but passes with the fix. Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge branch 'master' into pgc-redo - fix - original ------------- Changes: https://git.openjdk.org/jdk/pull/21872/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21872&range=01 Stats: 568 lines in 2 files changed: 209 ins; 143 del; 216 mod Patch: https://git.openjdk.org/jdk/pull/21872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21872/head:pull/21872 PR: https://git.openjdk.org/jdk/pull/21872 From zgu at openjdk.org Sat Nov 9 15:56:21 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Sat, 9 Nov 2024 15:56:21 GMT Subject: RFR: 8339162: [REDO] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC [v2] In-Reply-To: References: Message-ID: On Sat, 9 Nov 2024 10:51:47 GMT, Albert Mingkun Yang wrote: >> This PR consists of two commits, the original and bug-fix. >> >> The original patch calculates the dest-count for the preceding live words incorrectly -- `preceding_destination` can be on region-boundary. >> >> Test: TEST=gc/TestSoftReferencesBehaviorOnOOME.java fails ~4/100 without the fix but passes with the fix. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge branch 'master' into pgc-redo > - fix > - original LGTM ------------- Marked as reviewed by zgu (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21872#pullrequestreview-2425467810 From ayang at openjdk.org Sun Nov 10 11:06:36 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sun, 10 Nov 2024 11:06:36 GMT Subject: Integrated: 8339162: [REDO] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 10:55:45 GMT, Albert Mingkun Yang wrote: > This PR consists of two commits, the original and bug-fix. > > The original patch calculates the dest-count for the preceding live words incorrectly -- `preceding_destination` can be on region-boundary. > > Test: TEST=gc/TestSoftReferencesBehaviorOnOOME.java fails ~4/100 without the fix but passes with the fix. This pull request has now been integrated. Changeset: 423e8e09 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/423e8e0999f53aa0bf95a7505a771dab3dd5c8d6 Stats: 568 lines in 2 files changed: 209 ins; 143 del; 216 mod 8339162: [REDO] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC Co-authored-by: Guoxiong Li Reviewed-by: zgu, iwalulya, gli ------------- PR: https://git.openjdk.org/jdk/pull/21872 From ayang at openjdk.org Sun Nov 10 11:06:35 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sun, 10 Nov 2024 11:06:35 GMT Subject: RFR: 8339162: [REDO] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC [v2] In-Reply-To: References: Message-ID: On Sat, 9 Nov 2024 10:51:47 GMT, Albert Mingkun Yang wrote: >> This PR consists of two commits, the original and bug-fix. >> >> The original patch calculates the dest-count for the preceding live words incorrectly -- `preceding_destination` can be on region-boundary. >> >> Test: TEST=gc/TestSoftReferencesBehaviorOnOOME.java fails ~4/100 without the fix but passes with the fix. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: > > - Merge branch 'master' into pgc-redo > - fix > - original Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21872#issuecomment-2466688932 From tschatzl at openjdk.org Mon Nov 11 10:06:04 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 11 Nov 2024 10:06:04 GMT Subject: RFR: 8343929: Remove PreservedMarksSet::createTask() after JDK-8305895 Message-ID: Hi all, please review this trivial cleanup after pushing the Compact Object Header JEP (JDK-8305895). The method mentioned is unused. Testing: gha, local compilation Thanks, Thomas ------------- Commit messages: - 8343929 Changes: https://git.openjdk.org/jdk/pull/22006/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22006&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343929 Stats: 6 lines in 2 files changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22006/head:pull/22006 PR: https://git.openjdk.org/jdk/pull/22006 From ayang at openjdk.org Mon Nov 11 10:18:41 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 11 Nov 2024 10:18:41 GMT Subject: RFR: 8343929: Remove PreservedMarksSet::createTask() after JDK-8305895 In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 10:01:56 GMT, Thomas Schatzl wrote: > Hi all, > > please review this trivial cleanup after pushing the Compact Object Header JEP (JDK-8305895). > > The method mentioned is unused. > > Testing: gha, local compilation > > Thanks, > Thomas Trivial. ------------- Marked as reviewed by ayang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22006#pullrequestreview-2426727355 From ayang at openjdk.org Mon Nov 11 10:43:28 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 11 Nov 2024 10:43:28 GMT Subject: RFR: 8343658: Parallel: Implement block_start for Young generation In-Reply-To: References: <0JdxCwM0B41HAGRr1A5ch4WdP8Gmi2uAd2mgjqW5gNM=.9a485a86-bce3-4a41-b5fb-e17f48992182@github.com> <2DQOnQWFzESExHcKdsf2xdbDw46GDuyYLEQ_OTEHpc8=.e96b5363-65f2-4f73-afe3-85f13d928c62@github.com> <-3P95fiUgh8CavaWr6qd1uMHTGqq4KrzbyO7YfIkCZc=.cf1aad7c-77b0-4151-9a91-7676216dbecd@github.com> Message-ID: <7TCNU0SJQumJksp5tdVaXFpz6sGNkgH3_O-j2Q3TRpI=.cdcbcdd3-8d25-4d52-8ac9-f2de55b0f99f@github.com> On Thu, 7 Nov 2024 12:46:25 GMT, Volker Simonis wrote: >>> > this will not work in the general case, if the heap is not walkable. >>> >>> True, but this is the best-effort approach used in other GCs, as far as I can tell. Is there a real use case that warrants a more sophisticated variant? >> >> The **only** use case for this code is during hs_err reporting for heap-addresses not pointing at the beginning of an oop. I think we should be conservative here, because a secondary crash will cut the information available in the hs_err file and will therefore do more harm than good. > >> What would be nice would be something like `oopDesc::safe_klass_or_null()` or similar, feeding into a corresponding `oopDesc::size_given_klass_safe_or_0()`. The former would check the klass word for validity before dereferencing - `CompressedKlassPointers::is_encodable(p)` and then the load of layouthelper etc should happen with SafeFetch. Alternatively (and a bit more unsafe), check the readability of Klass* with SafeFetch beforehand, then call normal size_given_klass.
> > We already have [LocationPrinter::is_valid_obj()](https://github.com/openjdk/jdk/blob/ac82a8f89c7066fb1d379b12bcfd68053cb39ba4/src/hotspot/share/gc/shared/locationPrinter.cpp#L33) which uses [Klass::is_valid()](https://github.com/openjdk/jdk/blob/ac82a8f89c7066fb1d379b12bcfd68053cb39ba4/src/hotspot/share/oops/klass.cpp#L1038) to check the validity of an oop. I don't think we need `SafeFetch` here. > I think we should be conservative here, because a secondary crash will cut the information available in the hs_err file Have you ever seen "a secondary crash" in practice (for other GCs)? I am a bit concerned that we might be adding complex code that is never exercised. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21919#discussion_r1836428995 From shade at openjdk.org Mon Nov 11 10:50:17 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 11 Nov 2024 10:50:17 GMT Subject: RFR: 8343929: Remove PreservedMarksSet::createTask() after JDK-8305895 In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 10:01:56 GMT, Thomas Schatzl wrote: > Hi all, > > please review this trivial cleanup after pushing the Compact Object Header JEP (JDK-8305895). > > The method mentioned is unused. > > Testing: gha, local compilation > > Thanks, > Thomas Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22006#pullrequestreview-2426929910 From tschatzl at openjdk.org Mon Nov 11 11:34:55 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 11 Nov 2024 11:34:55 GMT Subject: RFR: 8343929: Remove PreservedMarksSet::createTask() after JDK-8305895 In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 10:16:33 GMT, Albert Mingkun Yang wrote: >> Hi all, >> >> please review this trivial cleanup after pushing the Compact Object Header JEP (JDK-8305895). >> >> The method mentioned is unused. >> >> Testing: gha, local compilation >> >> Thanks, >> Thomas > > Trivial. 
Thanks @albertnetymk @shipilev for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/22006#issuecomment-2467951109 From tschatzl at openjdk.org Mon Nov 11 11:34:56 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 11 Nov 2024 11:34:56 GMT Subject: Integrated: 8343929: Remove PreservedMarksSet::createTask() after JDK-8305895 In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 10:01:56 GMT, Thomas Schatzl wrote: > Hi all, > > please review this trivial cleanup after pushing the Compact Object Header JEP (JDK-8305895). > > The method mentioned is unused. > > Testing: gha, local compilation > > Thanks, > Thomas This pull request has now been integrated. Changeset: 36e12955 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/36e12955b2129f2075a203a0b39198f256083a24 Stats: 6 lines in 2 files changed: 0 ins; 6 del; 0 mod 8343929: Remove PreservedMarksSet::createTask() after JDK-8305895 Reviewed-by: ayang, shade ------------- PR: https://git.openjdk.org/jdk/pull/22006 From iwalulya at openjdk.org Mon Nov 11 15:35:30 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 11 Nov 2024 15:35:30 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions Message-ID: Hi all, Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. 
This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. 
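A rough C++ model of the grouping idea described above: candidate regions, taken in selection order, are cut into groups; every region in a group shares one card set, and references between regions of the same group are not recorded because those regions are evacuated together. The fixed `group_size`, the `std::set` standing in for a card set, and the names `CollectionGroup` and `GroupedRemSet` are assumptions for illustration; this is not G1's `G1CardSet` or collection-set code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model: one shared "card set" per group of candidate regions.
struct CollectionGroup {
  std::set<uint64_t> cards;  // cards that may hold references into the group
};

class GroupedRemSet {
 public:
  // `sorted_regions` is the candidate selection order (e.g. by reclaimable
  // bytes); consecutive runs of `group_size` regions form one group.
  GroupedRemSet(const std::vector<int>& sorted_regions, int num_regions, std::size_t group_size)
      : group_of_(static_cast<std::size_t>(num_regions), -1) {
    for (std::size_t i = 0; i < sorted_regions.size(); ++i) {
      std::size_t g = i / group_size;
      if (g == groups_.size()) groups_.emplace_back();
      group_of_[sorted_regions[i]] = static_cast<int>(g);
    }
  }

  // Record that `card` (located in region `from`) references region `to`.
  void add_reference(int from, int to, uint64_t card) {
    int gto = group_of_[to];
    if (gto < 0) return;                 // target is not a candidate: no card set
    if (group_of_[from] == gto) return;  // same group: evacuated together, skip
    groups_[gto].cards.insert(card);     // shared set deduplicates per group
  }

  std::size_t num_groups() const { return groups_.size(); }
  std::size_t cards_in_group(std::size_t g) const { return groups_[g].cards.size(); }

 private:
  std::vector<int> group_of_;            // region index -> group index (-1 if none)
  std::vector<CollectionGroup> groups_;
};
```

In this model the memory saving comes from two places: intra-group references are dropped entirely, and a card pointing into several regions of the same group is stored once. The flip side, as the PR notes, is that a group can only be evacuated as a whole.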
Testing: Mach5 Tier1-6 ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) ------------- Commit messages: - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - add logging - more cleanups - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - cleanup - remove MarkingSkipEvents - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - revamp with retained regions added to groups Changes: https://git.openjdk.org/jdk/pull/22015/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343782 Stats: 969 lines in 19 files changed: 438 ins; 269 del; 262 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From aboldtch at openjdk.org Tue Nov 12 12:57:27 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 12 Nov 2024 12:57:27 GMT Subject: RFR: 8343460: ZGC: Crash in ZRemembered::scan_page_and_clear_remset [v2] In-Reply-To: References: Message-ID: > `free_page` may concurrently delete the remset while `scan_page_and_clear_remset` is scanning the page. Move it to after the `_safe_recycle.register_and_clone_if_activated`. Doing the deletion on the new cloned page will not occur as it not old. And the registered page's remset will be deleted by the destructor when the `_safe_recycle` scope quest up the `safe_destroy`. > > To be able to push the deletion all the way into `prepare_to_recycle` the unnecessary use of this mechanism had to be removed. 
`free_pages_alloc_failed` does not need to protect the pages, as they are not yet present in the PageTable. We have simply taken them out of the cache, but failed to commit or map some memory, so we are putting these pages back into the cache. See bed9c260bbc9bd208b03d7eedd4e2cfa151b58f2 > > The fix works without this last commit. So we must be careful to check that these pages cannot be reached by some other means. The FoundOld bitmap iteration goes through the PageTable so even if an old page was registered, we would not find these pages. > > There is a scary lack of a fence between the removal of the page from the PageTable and the lock in `register_and_clone_if_activated`. > > The stress test will deterministically crash with this modified code 0756e0056b44ee16bee81256f556c8df981ceaf9 and using these options `-XX:+UseZGC -XX:+UseNewCode -XX:ZCollectionIntervalMinor=0.1 -XX:ZCollectionIntervalMajor=1 -XX:ZFragmentationLimit=0 -XX:-CreateCoredumpOnCrash`, and no longer does after with this patch. 
Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: - Add comment about prepare_to_recycle - Revert recycle_page call, still update last_used ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21905/files - new: https://git.openjdk.org/jdk/pull/21905/files/bed9c260..5e1042dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21905&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21905&range=00-01 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21905.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21905/head:pull/21905 PR: https://git.openjdk.org/jdk/pull/21905 From eosterlund at openjdk.org Tue Nov 12 13:04:32 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 12 Nov 2024 13:04:32 GMT Subject: RFR: 8343460: ZGC: Crash in ZRemembered::scan_page_and_clear_remset [v2] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 12:57:27 GMT, Axel Boldt-Christmas wrote: >> `free_page` may concurrently delete the remset while `scan_page_and_clear_remset` is scanning the page. Move it to after the `_safe_recycle.register_and_clone_if_activated`. Doing the deletion on the new cloned page will not occur as it not old. And the registered page's remset will be deleted by the destructor when the `_safe_recycle` scope quest up the `safe_destroy`. >> >> To be able to push the deletion all the way into `prepare_to_recycle` the unnecessary use of this mechanism had to be removed. `free_pages_alloc_failed` does not need to protect the pages, as they are not yet present in the PageTable. We have simply taken them out of the cache, but failed to commit or map some memory, so we are putting these pages back into the cache. See bed9c260bbc9bd208b03d7eedd4e2cfa151b58f2 >> >> The fix works without this last commit. So we must be careful to check that these pages cannot be reached by some other means.
The FoundOld bitmap iteration goes through the PageTable so even if an old page was registered, we would not find these pages. >> >> There is a scary lack of a fence between the removal of the page from the PageTable and the lock in `register_and_clone_if_activated`. >> >> The stress test will deterministically crash with this modified code 0756e0056b44ee16bee81256f556c8df981ceaf9 and using these options `-XX:+UseZGC -XX:+UseNewCode -XX:ZCollectionIntervalMinor=0.1 -XX:ZCollectionIntervalMajor=1 -XX:ZFragmentationLimit=0 -XX:-CreateCoredumpOnCrash`, and no longer does after with this patch. > > Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: > > - Add comment about prepare_to_recycle > - Revert recycle_page call, still update last_used Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21905#pullrequestreview-2429533628 From stefank at openjdk.org Tue Nov 12 13:33:14 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 12 Nov 2024 13:33:14 GMT Subject: RFR: 8343460: ZGC: Crash in ZRemembered::scan_page_and_clear_remset [v2] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 12:57:27 GMT, Axel Boldt-Christmas wrote: >> `free_page` may concurrently delete the remset while `scan_page_and_clear_remset` is scanning the page. Move it to after the `_safe_recycle.register_and_clone_if_activated`. Doing the deletion on the new cloned page will not occur as it not old. And the registered page's remset will be deleted by the destructor when the `_safe_recycle` scope quest up the `safe_destroy`. >> >> To be able to push the deletion all the way into `prepare_to_recycle` the unnecessary use of this mechanism had to be removed. `free_pages_alloc_failed` does not need to protect the pages, as they are not yet present in the PageTable. 
We have simply taken them out of the cache, but failed to commit or map some memory, so we are putting these pages back into the cache. See bed9c260bbc9bd208b03d7eedd4e2cfa151b58f2 >> >> The fix works without this last commit. So we must be careful to check that these pages cannot be reached by some other means. The FoundOld bitmap iteration goes through the PageTable so even if an old page was registered, we would not find these pages. >> >> There is a scary lack of a fence between the removal of the page from the PageTable and the lock in `register_and_clone_if_activated`. >> >> The stress test will deterministically crash with this modified code 0756e0056b44ee16bee81256f556c8df981ceaf9 and using these options `-XX:+UseZGC -XX:+UseNewCode -XX:ZCollectionIntervalMinor=0.1 -XX:ZCollectionIntervalMajor=1 -XX:ZFragmentationLimit=0 -XX:-CreateCoredumpOnCrash`, and no longer does after with this patch. > > Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: > > - Add comment about prepare_to_recycle > - Revert recycle_page call, still update last_used Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21905#pullrequestreview-2429602801 From wkemper at openjdk.org Tue Nov 12 17:32:00 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 12 Nov 2024 17:32:00 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread Message-ID: Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. 
------------- Commit messages: - Check for safepoint when stopping (stopping thread is java thread) - Fix ridiculous typo - Merge remote-tracking branch 'jdk/master' into shen-uncommit-thread - Fix shutdown protocol - Take heap lock when uncommitting bitmaps, uncommit thread joins STS. - Little bit of cleanup - WIP: checkpoint before sync up - WIP: checkpoint Changes: https://git.openjdk.org/jdk/pull/22019/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342444 Stats: 319 lines in 6 files changed: 229 ins; 74 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/22019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019 PR: https://git.openjdk.org/jdk/pull/22019 From wkemper at openjdk.org Tue Nov 12 17:32:00 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 12 Nov 2024 17:32:00 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 17:31:58 GMT, William Kemper wrote: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. I modified the testing pipelines to set `-Xms4g -Xmx10g -XX:+ShenandoahUncommit`. All performance and stress tests completed successfully on x86 and aarch64. Marking this as ready for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22019#issuecomment-2471152157 From kdnilsen at openjdk.org Tue Nov 12 17:32:00 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 12 Nov 2024 17:32:00 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 17:31:58 GMT, William Kemper wrote: > Currently, Shenandoah uncommits regions from its control thread. 
The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 555: > 553: while (region != nullptr) { > 554: { > 555: ShenandoahHeapLocker locker(heap->lock()); Was it a bug that previous version of this code did not acquire the heap lock? Is the lock required for the entirety of time that we are clearing the bitmap? Or is it just required to get a trustworthy check on is_bitmap_slice_committed()? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1837047649 From wkemper at openjdk.org Tue Nov 12 17:32:01 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 12 Nov 2024 17:32:01 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 18:26:29 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 555: >> >>> 553: while (region != nullptr) { >>> 554: { >>> 555: ShenandoahHeapLocker locker(heap->lock()); >> >> Was it a bug that previous version of this code did not acquire the heap lock? >> >> Is the lock required for the entirety of time that we are clearing the bitmap? Or is it just required to get a trustworthy check on is_bitmap_slice_committed()? > > After reading more of this PR, I believe we need the heap lock to get a reliable signal of bitmap_slice_committed(). But I believe we do not need the heap lock for ctx->clear_bitmap(region) so would prefer to move that outside the lock, unless I am misunderstanding. Hmm, I'm not sure we can do that. Prior to this change, the control thread performed both clearing the bitmap and uncommitting the region's bitmap, so they could never happen concurrently. With this change, a separate thread could perform the uncommit. Consider: 1. 
Control thread takes heap lock, observes that bitmap slice for region A is committed 2. Control thread releases heap lock, begins clearing bitmap (writing zeros to bitmap slice) 3. Uncommit thread takes heap lock, believes it must uncommit region A 4. Uncommit thread uncommits bitmap slice for region A 5. Segfault in Control Thread I do _believe_ if we had a per region lock, it would be useful here. Holding a lock over the entire heap for this feels like overkill. Or, we could schedule the uncommit so that it does not occur during a GC cycle. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1837215468 From kdnilsen at openjdk.org Tue Nov 12 17:32:01 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 12 Nov 2024 17:32:01 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 18:19:33 GMT, Kelvin Nilsen wrote: >> Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 555: > >> 553: while (region != nullptr) { >> 554: { >> 555: ShenandoahHeapLocker locker(heap->lock()); > > Was it a bug that previous version of this code did not acquire the heap lock? > > Is the lock required for the entirety of time that we are clearing the bitmap? Or is it just required to get a trustworthy check on is_bitmap_slice_committed()? After reading more of this PR, I believe we need the heap lock to get a reliable signal of bitmap_slice_committed(). But I believe we do not need the heap lock for ctx->clear_bitmap(region) so would prefer to move that outside the lock, unless I am misunderstanding. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1837053704 From shade at openjdk.org Tue Nov 12 18:20:59 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 12 Nov 2024 18:20:59 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 17:31:58 GMT, William Kemper wrote: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. Cursory review: src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 91: > 89: #include "runtime/stackWatermarkSet.hpp" > 90: #include "runtime/vmThread.hpp" > 91: #include "gc/shenandoah/shenandoahUncommitThread.hpp" Includes are usually in alpbabetical order :) src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1345: > 1343: if (_uncommit_thread != nullptr) { > 1344: tcl->do_thread(_uncommit_thread); > 1345: } New line after new block? src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1932: > 1930: if (_uncommit_thread != nullptr) { > 1931: _uncommit_thread->stop(); > 1932: } Are there limits on proper sequencing here? Can we shutdown uncommit thread before cancelling the GC and waiting for control thread to exit? This would save end-to-end time for short commands, as we would hide the uncommit thread shutdown in the shadow of control thread exiting. src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.cpp line 55: > 53: > 54: if (soft_max_changed || explicit_gc_requested || current - last_shrink_time > shrink_period) { > 55: double shrink_before = (soft_max_changed || explicit_gc_requested) ? current : current - ((double) ShenandoahUncommitDelay / 1000.0); Suggestion: double shrink_before = (soft_max_changed || explicit_gc_requested) ? 
current : current - ((double) ShenandoahUncommitDelay / 1000.0); src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.cpp line 69: > 67: MonitorLocker locker(&_lock, Mutex::_no_safepoint_check_flag); > 68: if (!_stop_requested.is_set()) { > 69: locker.wait((int64_t )shrink_period); Suggestion: locker.wait((int64_t)shrink_period); src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.cpp line 139: > 137: _heap->notify_heap_changed(); > 138: double elapsed = os::elapsedTime() - start; > 139: log_info(gc)("Uncommitted " SIZE_FORMAT " regions, in %.3fs", count, elapsed); If we can, can we match the current log format? E.g. print `Concurrent uncommit`, with appropriately formatted timestamp? I think we also want `log_info(gc,start)` at the beginning of the method. I think `ShenandoahConcurrentPhase` helper did all that, can we still use it? src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.hpp line 37: > 35: ShenandoahSharedFlag _explicit_gc_requested; > 36: ShenandoahSharedFlag _stop_requested; > 37: Monitor _lock; Which one of these can be `const`? 
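The uncommit decision quoted in the review (line 55 of `shenandoahUncommitThread.cpp` in this PR) can be modelled in isolation. This is a sketch, not the PR's code: `Region`, `compute_shrink_before` and `count_uncommit_candidates` are hypothetical names, and it assumes, as a simplification, that a region is an uncommit candidate when it is empty, still committed, and became empty before `shrink_before`. The delay is taken in milliseconds, matching the `/ 1000.0` applied to `ShenandoahUncommitDelay` in the quoted code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-region state for this sketch only (not ShenandoahHeapRegion).
struct Region {
  bool empty_committed;     // region is empty but its memory is still committed
  double empty_since_secs;  // time (in seconds) when the region became empty
};

// Same shape as the quoted line 55: a SoftMaxHeapSize change or an explicit
// GC request makes every empty region eligible right away; otherwise a
// region must have stayed empty for at least uncommit_delay_ms.
double compute_shrink_before(double now_secs, bool soft_max_changed,
                             bool explicit_gc_requested,
                             double uncommit_delay_ms) {
  assert(uncommit_delay_ms >= 0.0);  // a negative delay makes no sense
  return (soft_max_changed || explicit_gc_requested)
             ? now_secs
             : now_secs - uncommit_delay_ms / 1000.0;
}

// Counts regions that would be uncommitted under the simplified rule above.
std::size_t count_uncommit_candidates(const std::vector<Region>& regions,
                                      double shrink_before) {
  std::size_t count = 0;
  for (const Region& r : regions) {
    if (r.empty_committed && r.empty_since_secs < shrink_before) {
      ++count;
    }
  }
  return count;
}
```

With `now_secs = 100.0` and a 10000 ms delay, `shrink_before` is 90.0, so only regions that have been empty since before t = 90 s qualify; after a SoftMaxHeapSize change or an explicit GC request, every empty committed region qualifies immediately.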
------------- PR Review: https://git.openjdk.org/jdk/pull/22019#pullrequestreview-2430298661 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1838524990 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1838527316 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1838530829 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1838568788 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1838569201 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1838537328 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1838569638 From shade at openjdk.org Tue Nov 12 18:20:59 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 12 Nov 2024 18:20:59 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 21:18:41 GMT, William Kemper wrote: >> After reading more of this PR, I believe we need the heap lock to get a reliable signal of bitmap_slice_committed(). But I believe we do not need the heap lock for ctx->clear_bitmap(region) so would prefer to move that outside the lock, unless I am misunderstanding. > > Hmm, I'm not sure we can do that. Prior to this change, the control thread performed both clearing the bitmap and uncommitting the region's bitmap, so they could never happen concurrently. With this change, a separate thread could perform the uncommit. Consider: > > 1. Control thread takes heap lock, observes that bitmap slice for region A is committed > 2. Control thread releases heap lock, begins clearing bitmap (writing zeros to bitmap slice) > 3. Uncommit thread takes heap lock, believes it must uncommit region A > 4. Uncommit thread uncommits bitmap slice for region A > 5. Segfault in Control Thread > > I do _believe_ if we had a per region lock, it would be useful here. Holding a lock over the entire heap for this feels like overkill. 
Or, we could schedule the uncommit so that it does not occur during a GC cycle. So, wait a sec. This code is in `ShenandoahResetBitmapTask`, so it can run in parallel. Putting a lock here inhibits parallelism. I understand the failure mode, but I think we should really be optimizing for the case when `ShenandoahUncommit` is not enabled (e.g. `-Xmx` == `-Xms`). Sounds like there is a hassle in allowing concurrent uncommit to overlap with the GC cycle. In addition to this particular problem, we might be stealing cycles from the GC threads and take additional TTSP lag to park the uncommitter for the in-cycle GC pauses. I have no clear solution for this yet, but I think we need to explore if we can suspend the uncommit before going into GC cycle... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1838568215 From lmesnik at openjdk.org Tue Nov 12 18:44:07 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 12 Nov 2024 18:44:07 GMT Subject: RFR: 8344051: Problemlist jdk/jfr/event/runtime/TestNativeMemoryUsageEvents.java with ZGC until JDK-8343893 is fixed Message-ID: See summary and main bug for description. ------------- Commit messages: - 8344051 Changes: https://git.openjdk.org/jdk/pull/22046/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22046&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344051 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22046.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22046/head:pull/22046 PR: https://git.openjdk.org/jdk/pull/22046 From wkemper at openjdk.org Tue Nov 12 19:02:37 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 12 Nov 2024 19:02:37 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v2] In-Reply-To: References: Message-ID: > Currently, Shenandoah uncommits regions from its control thread. 
The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Style and formatting fixes - Alphabetize includes in shenandoahHeap.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22019/files - new: https://git.openjdk.org/jdk/pull/22019/files/e6684365..7301871e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=00-01 Stats: 23 lines in 3 files changed: 10 ins; 6 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/22019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019 PR: https://git.openjdk.org/jdk/pull/22019 From Monica.Beckwith at microsoft.com Tue Nov 12 19:11:58 2024 From: Monica.Beckwith at microsoft.com (Monica Beckwith) Date: Tue, 12 Nov 2024 19:11:58 +0000 Subject: Further discussion on Adaptable Heap Sizing with G1 In-Reply-To: References: <93ecaaaa-600e-4b91-a98b-f417f5505037@oracle.com> <7d07385e-bc04-4251-a5db-81ede0b90465@oracle.com> Message-ID: Hi everyone, Thank you all for the valuable and detailed discussion around AHS and heap management for G1. I wanted to share some thoughts that align with Thomas's comments and clarify the best path forward, especially given the distinctions between AHS (Automatic Heap Sizing) and Google's Adaptive Heap Sizing (AHS-Google). I've included simple diagrams to illustrate the technical flow and interactions of each approach. 1. Consolidating Around SoftMaxHeapSize for Dynamic, Adaptive Sizing Thomas's suggestion to prioritize SoftMaxHeapSize as the main dynamic driver aligns with my understanding of an effective AHS model.
Using SoftMaxHeapSize in this way allows us to minimize the CPU overhead associated with frequent uncommit/commit cycles, which would be a potential risk with a more rigid setting like ProposedHeapSize. Here's a basic illustration of how Automatic Heap Sizing (AHS) with SoftMaxHeapSize would work dynamically:
+-----------------------------+
| External Inputs             |
|-----------------------------|
| - Global Memory Pressure    |
| - GCTimeRatio policy        |
| - Heap tunables via         |
|   commandline               |
+-----------------------------+
               |
               v
+-----------------------------+
| Automatic Heap              |
| Sizing (AHS)                |
+-----------------------------+
               |
               v
+-----------------------------+
| SoftMaxHeapSize (Dynamic)   |
| - Guides heap size          |
| - Shrinks under pressure    |
| - Uses target heuristics    |
+-----------------------------+
               |
               v
+-----------------------------+
| JVM Heap Management         |
| - Adjusts committed memory  |
| - Controls expansions &     |
|   contractions smoothly     |
+-----------------------------+
By consolidating around SoftMaxHeapSize as the primary "target" flag, we create a more straightforward, adaptive, and consistent experience. 2. The AHS-Google Approach and Its Challenges Google's current Adaptive Heap Sizing (AHS-Google) approach uses ProposedHeapSize as a fixed committed size target. While this allows for setting a specific target for memory use, it introduces some challenges, particularly with forced uncommit/commit cycles that might ignore dynamic inputs.
Here's how this approach typically functions:
+-----------------------------+
| AHS-Google Logic            |
|-----------------------------|
| - Periodic GC with target   |
| - Uses ProposedHeapSize as  |
|   "optimal" committed size  |
+-----------------------------+
               |
               v
+-----------------------------+
| ProposedHeapSize (Fixed)    |
| - Forced committed memory   |
| - Overrides dynamic inputs  |
| - Can cause frequent        |
|   uncommit/commit cycles    |
+-----------------------------+
               |
               v
+-----------------------------+
| JVM Heap Management         |
| - Follows set memory level  |
| - May ignore external       |
|   pressure signals          |
+-----------------------------+
A purely AHS-based approach would allow SoftMaxHeapSize to adjust dynamically in response to real-time signals without forcing committed memory levels. This avoids unnecessary CPU cycles and provides a more adaptive response to environmental changes, such as fluctuating memory demands in containerized and cloud environments. 3. Key Differences Between AHS and AHS-Google In my understanding: • AHS (Automatic Heap Sizing): Focuses on finding a reasonable heap size based on external memory pressure and dynamically adjusts according to environmental inputs. This aligns with Thomas's point that AHS should allow for minimal user intervention and let dynamic factors guide heap behavior. • AHS-Google: Treats ProposedHeapSize as a fixed input, overriding dynamic adjustments. While this gives more explicit control, it limits adaptability and could introduce inefficiencies, as mentioned earlier. 4. Moving Forward with a Balanced, Dynamic AHS for G1 Based on the discussion, I suggest we focus on developing an AHS model that leverages SoftMaxHeapSize as the adaptable target, allowing the JVM to adjust based on real-time memory pressures and CPU usage. Integrating multiple inputs dynamically will create a robust model for managing "noisy neighbor"
challenges, a very real need in today's cloud and container scenarios and one that AHS is well-suited to manage, as highlighted in Erik's recent JVMLS presentation. Thank you all again for the insightful conversation and technical contributions. I believe these steps will help us build a technically sound and stable AHS for G1. Please feel free to correct any misunderstandings or clarify any points where further alignment is needed. Regards, Monica From: hotspot-gc-dev On Behalf Of Jonathan Joo Sent: Thursday, October 17, 2024 7:11 PM To: Thomas Schatzl Cc: hotspot-gc-dev at openjdk.org Subject: Re: Further discussion on Adaptable Heap Sizing with G1 Hi Thomas, The points you mentioned make sense to me! There are some nuances that I'd like to dig into further to make sure that we are aligned. I think to summarize - I'm not sure exactly how SoftMaxHeapSize is intended to work, whereas we have experimented with ProposedHeapSize at Google already, so I want to bridge my gap in understanding there. I appreciate you offering to meet and discuss! As far as meeting time - I'm currently in US Pacific time, but flexible in terms of when we meet. (I am generally awake from 9am-1am PT, so I am good to meet any time in that time period -- please let me know what time works best for you.) Tuesday and Thursday of the coming week I have the most availability, but if you have any other dates/times in mind, I can let you know whether that works for me. Best, ~ Jonathan On Mon, Oct 14, 2024 at 2:52 AM Thomas Schatzl > wrote: Hi, On 11.10.24 09:16, Jonathan Joo wrote: > Hi Thomas, > > I think what this suggestion overlooks is that a SoftMaxHeapSize that > guides used heap size will automatically guide committed size: i.e. if > G1 shrinks the used heap, G1 will automatically shrink (and keep) the > committed size.
> > So ProposedHeapSize seems to be very similar to SoftMaxHeapSize. > > > If I'm understanding this correctly - both ProposedHeapSize and (the > proposed version of) SoftMaxHeapSize have similar semantics, but > actually modify the heap in different ways. SoftMaxHeapSize helps us > determine when to start a concurrent mark, whereas ProposedHeapSize > doesn't actually trigger any GC directly, but affects the size of the > heap after a GC. Is that correct? Would it make sense then to have both > flags, where one helps set a trigger point for a GC, and one helps us > determine the heap size we are targeting after the GC? I might also be > missing some nuances here. I think SoftMaxHeapSize (or actually either) will result in approximately the same behavior. The difference is in intrusiveness. ProposedHeapSize forcefully attempts to decrease the committed heap size and then the rest of the "heap sizing machinery" follows, while SoftMaxHeapSize gives a target for the "heap sizing machinery" and committed heap size follows. ProposedHeapSize has the following disadvantages (as implemented): - since it forces committed heap size, I expect that in case you are close or above that target, you can get frequent uncommits/attempts to uncommit which waste cpu cycles. Hopefully, by giving the heap sizing machinery a goal, it will itself determine a sustainable committed memory level without too frequent commits and uncommits. - for some reason it does not allow less memory to be committed than proposed (but still larger than MinHeapSize). This can be inefficient wrt to memory usage. I.e. it basically disables other heap sizing afaict. - (that's more a nit) the use of "0" as special marker for SoftMaxHeapSize is unexpected. This mechanism kind of feels like a very blunt tool to get the desired effect (a certain committed heap) without caring about other goals. 
It may be necessary to pull out the immediately un/commit hammer in some situations, and imho, let's not give that hammer to users as the first option to screw themselves. > > I.e. if I understand this correctly: allowing a higher GC overhead, > automatically shrinks the heap. > > > Exactly - in practice, tuning this one parameter up (the target gc cpu > overhead) correlates with decreasing both the average as well as maximum > heap usage for a java program. > > I noticed the same with the patch attached to the SoftMaxHeapSize CR > (https://bugs.openjdk.org/browse/JDK-8236073 > >) discounting effects of > Min/MaxHeapFreeRatio (i.e. if you remove it, > https://bugs.openjdk.org/browse/JDK-8238686 > > explains the issue). > In practice, these two flags prohibit G1 from adjusting the heap unless > the SoftMaxHeapSize change is very large. > > > So I would prefer to only think of an alternative to SoftMaxHeapSize if > it has been shown that it does not work. > > > Given that you have a much stronger mental model than I do of how all > these flags fit together in the context of G1 GC, perhaps it would be > helpful to schedule some time to chat in person! I think that would help > clarify things much more quickly than email. To be clear - I have no > reason to doubt that SoftMaxHeapSize does not work. On the other hand, > could we possibly make use of both flags? For example, could > SoftMaxHeapSize potentially be a good replacement for our periodic GC? Not sure what periodic GC has to do with SoftMaxHeapSize. > > There is the nit that unlike in this implementation of ProposedHeapSize, > SoftMaxHeapSize will not cause uncommit below MinHeapSize. This is > another discussion on what to do about this issue - in a comment in > https://bugs.openjdk.org/browse/JDK-8236073 > > it is proposed to make > MinHeapSize manageable. > > > How useful is MinHeapSize in practice? Do we need it, or can we just set > it to zero to avoid having to deal with it at all? 
I think you are mixing AHS (providing decent heap sizing in the presence of external memory pressure) and getting "optimal" heap sizing (in other words, "steering the heap size" externally). AHS is mostly about the user not setting any heap sizes; in this case, just having a very low minimum heap size is fine, as suggested in the JEP. SoftMaxHeapSize (and ProposedHeapSize) is about the user setting a particular goal according to their own wishes. It can still be useful to set -Xms==-Xmx, e.g. for fast startup or during heavy activity; however, if an external system decides that it is useful to intermittently save memory up to a certain level, the VM should follow that guidance. The mechanism to internally follow that guidance can be used by AHS. > > I (still) believe that AHS and SoftMaxHeapSize/ProposedHeapSize are > somewhat orthogonal. > > AHS (https://openjdk.org/jeps/8329758) is about finding a reasonable > heap size, and adjusting it on external "pressure". SoftMax/ProposedHeapSize > are manual external tunings. > > > Wdyt? > > > I agree with the general idea - for us, we used a manual external flag > like ProposedHeapSize because we did not implement any of the AHS logic > in the JVM. (We had a separate AHS thread reading in container > information and then doing the calculations, then setting > ProposedHeapSize as a manageable flag.) The way I see it is that > SoftMax/ProposedHeapSize is the "output" of AHS, and then > SoftMax/ProposedHeapSize is the "input" for the JVM, after which the JVM > uses this input to adjust its behavior accordingly. Does that align with > how you see things? As mentioned in the other thread, SoftMaxHeapSize can be used by AHS to get the heap to a certain level (based on memory pressure), but there is also the external entity that can modify SoftMaxHeapSize to adjust VM behavior.
So ultimately there will be multiple inputs for the target heap size (and I'm probably forgetting one or the other):
* External memory pressure (AHS) (*)
* CurrentMaxHeapSize
* SoftMaxHeapSize
* CPU usage (existing GCTimeRatio-based policy)
* other *HeapSize flags
These need to be merged into some target heap level using some policy. After knowing that level, the VM needs to decide on a proper reaction, which might be anything from just setting the internal IHOP goal, to (un-)committing memory directly, to doing the appropriate garbage collection in a "timely" fashion (which is where the regular periodic GC/marking comes in), or anything in between. (*) I am aware that the AHS JEP not only includes reacting to external memory pressure but also the merging of goals from different sources; some of them are ZGC-specific, and some of them are already implemented in G1. So for this discussion it is imo useful to limit "AHS" in the G1 context to things that G1 does not do, i.e. "return another goal based on external memory pressure", "min/max heap size defaults(!)", and "adjust adaptive sizing". > If we do indeed implement AHS logic fully within the JVM, then we could > internally manage the sizing of the heap without exposing a manageable > flag. That being said, it seems to me that exposing this as a manageable > flag brings the additional benefit that one could plug in their own AHS > implementation that calculates target heap sizes with whatever data they > want (and then passes it into the JVM via the manageable flag). > > Again, I wonder if meeting to discuss would be efficient, and then we > can update the mailing list with the results of our discussion. Let me > know your thoughts! It's fine with me to meet to recap and discuss the above; please suggest a time. Hth, Thomas
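To make "merged into some target heap level using some policy" concrete, here is a minimal C++ sketch. The merge rule is purely an assumption for illustration, not G1's policy: start from the size the GCTimeRatio-based policy would pick, let soft goals (SoftMaxHeapSize, an external memory-pressure goal) only pull it down, then clamp it between live data/MinHeapSize and CurrentMaxHeapSize. `SizingInputs` and `merge_target_heap` are hypothetical names, and how the VM then reacts to the target (IHOP, uncommit, a timely GC) is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical inputs (sizes in MB). Field names echo the flags discussed in
// the thread, but the merge rule below is an illustrative assumption only.
struct SizingInputs {
  std::size_t min_heap;                      // MinHeapSize
  std::size_t current_max_heap;              // CurrentMaxHeapSize: hard upper bound
  std::optional<std::size_t> soft_max_heap;  // SoftMaxHeapSize, if set
  std::optional<std::size_t> pressure_goal;  // goal derived from external memory pressure (AHS)
  std::size_t cpu_goal;                      // size the GCTimeRatio-based policy would pick
  std::size_t live_after_gc;                 // the heap can never be smaller than live data
};

// Start from the CPU-driven size, let soft goals only pull it down, then clamp
// to [max(live data, MinHeapSize), CurrentMaxHeapSize].
std::size_t merge_target_heap(const SizingInputs& in) {
  assert(in.min_heap <= in.current_max_heap);
  std::size_t target = in.cpu_goal;
  if (in.soft_max_heap) target = std::min(target, *in.soft_max_heap);
  if (in.pressure_goal) target = std::min(target, *in.pressure_goal);
  target = std::max({target, in.live_after_gc, in.min_heap});
  return std::min(target, in.current_max_heap);
}
```

For example, with a GCTimeRatio-derived size of 800 MB, SoftMaxHeapSize at 512 MB and 100 MB of live data, the merged target is 512 MB; adding a pressure goal of 300 MB lowers it to 300 MB, but the result never drops below live data.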
From wkemper at openjdk.org Tue Nov 12 19:27:18 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 12 Nov 2024 19:27:18 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v2] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 18:17:29 GMT, Aleksey Shipilev wrote: >> Hmm, I'm not sure we can do that. Prior to this change, the control thread performed both clearing the bitmap and uncommitting the region's bitmap, so they could never happen concurrently. With this change, a separate thread could perform the uncommit. Consider: >> >> 1. Control thread takes heap lock, observes that bitmap slice for region A is committed >> 2. Control thread releases heap lock, begins clearing bitmap (writing zeros to bitmap slice) >> 3. Uncommit thread takes heap lock, believes it must uncommit region A >> 4. Uncommit thread uncommits bitmap slice for region A >> 5. Segfault in Control Thread >> >> I do _believe_ if we had a per region lock, it would be useful here. Holding a lock over the entire heap for this feels like overkill. Or, we could schedule the uncommit so that it does not occur during a GC cycle. > > So, wait a sec. This code is in `ShenandoahResetBitmapTask`, so it can run in parallel. Putting a lock here inhibits parallelism. I understand the failure mode, but I think we should really be optimizing for the case when `ShenandoahUncommit` is not enabled (e.g. `-Xmx` == `-Xms`). > > Sounds like there is a hassle in allowing concurrent uncommit to overlap with the GC cycle. In addition to this particular problem, we might be stealing cycles from the GC threads and take additional TTSP lag to park the uncommitter for the in-cycle GC pauses. I have no clear solution for this yet, but I think we need to explore if we can suspend the uncommit before going into GC cycle... We could have the control and uncommit threads coordinate their efforts.
In the worst case, it could mean delaying concurrent reset while the control thread waits for the uncommit thread to yield. We could also try a more targeted lock only for the region's bitmap slice, but it doesn't seem right that one thread would be trying to clear a bitmap, while the other is trying to uncommit it. A lock could preserve technical correctness, but contention here would just mean that one thread would have wasted its time (either clearing a bitmap that is then uncommitted, or attempting to clear a bitmap that was first uncommitted (in this case, we would need the control thread to detect this and skip the region)). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1838641673 From wkemper at openjdk.org Tue Nov 12 19:27:19 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 12 Nov 2024 19:27:19 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v2] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 17:46:26 GMT, Aleksey Shipilev wrote: >> William Kemper has updated the pull request incrementally with two additional commits since the last revision: >> >> - Style and formatting fixes >> - Alphabetize includes in shenandoahHeap.cpp > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1932: > >> 1930: if (_uncommit_thread != nullptr) { >> 1931: _uncommit_thread->stop(); >> 1932: } > > Are there limits on proper sequencing here? Can we shutdown uncommit thread before cancelling the GC and waiting for control thread to exit? This would save end-to-end time for short commands, as we would hide the uncommit thread shutdown in the shadow of control thread exiting. I'm not sure the order matters here. `ConcurrentGCThread::stop` waits until the target thread sets `_has_terminated`. 
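Returning to the bitmap-reset race discussed earlier in this review: the coordination idea (the control thread waiting for the uncommit thread to yield before concurrent reset) can be sketched as a small handshake. This is a hedged illustration using only standard C++ primitives; `UncommitCoordinator`, `suspend_uncommit` and `try_uncommit` are hypothetical names, not Shenandoah code, and a single uncommit thread is assumed. The control thread takes the lock once per cycle and the uncommit thread once per region, so the parallel bitmap-reset workers in `ShenandoahResetBitmapTask` would never contend on it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical handshake between a GC control thread and ONE uncommit thread.
// Not Shenandoah code; it only illustrates "suspend uncommit before reset".
class UncommitCoordinator {
 public:
  explicit UncommitCoordinator(std::size_t regions) : committed_(regions, true) {}

  // Control thread, before concurrent reset: block new uncommits and wait for
  // an in-flight one to finish. Afterwards the commit state is stable.
  void suspend_uncommit() {
    std::unique_lock<std::mutex> lock(mutex_);
    suspended_ = true;
    idle_.wait(lock, [this] { return !uncommit_in_progress_; });
  }

  // Control thread, after the cycle: allow uncommitting again.
  void resume_uncommit() {
    std::lock_guard<std::mutex> lock(mutex_);
    suspended_ = false;
  }

  // Uncommit thread, once per candidate region. Returns false (yields) if a
  // GC cycle has suspended uncommitting or the region is already uncommitted.
  bool try_uncommit(std::size_t region) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(region < committed_.size());
      if (suspended_ || !committed_[region]) return false;
      uncommit_in_progress_ = true;
    }
    // The expensive OS-level uncommit would run here, outside the lock.
    {
      std::lock_guard<std::mutex> lock(mutex_);
      committed_[region] = false;
      uncommit_in_progress_ = false;
    }
    idle_.notify_all();
    return true;
  }

  bool is_committed(std::size_t region) {
    std::lock_guard<std::mutex> lock(mutex_);
    return committed_[region];
  }

 private:
  std::mutex mutex_;
  std::condition_variable idle_;
  std::vector<bool> committed_;
  bool suspended_ = false;
  bool uncommit_in_progress_ = false;
};
```

The cost noted in the thread still applies: `suspend_uncommit` may delay the start of concurrent reset by at most one in-flight region uncommit.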
> src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.cpp line 139: > >> 137: _heap->notify_heap_changed(); >> 138: double elapsed = os::elapsedTime() - start; >> 139: log_info(gc)("Uncommitted " SIZE_FORMAT " regions, in %.3fs", count, elapsed); > > If we can, can we match the current log format? E.g. print `Concurrent uncommit`, with appropriately formatted timestamp? I think we also want `log_info(gc,start)` at the beginning of the method. I think `ShenandoahConcurrentPhase` helper did all that, can we still use it? We can restore the log messages, but I don't think `ShenandoahConcurrentPhase` and friends will like being used outside of a cycle. I'll look into it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1838644895 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1838646433 From rkennke at openjdk.org Tue Nov 12 19:35:50 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 12 Nov 2024 19:35:50 GMT Subject: RFR: 8344051: Problemlist jdk/jfr/event/runtime/TestNativeMemoryUsageEvents.java with ZGC until JDK-8343893 is fixed In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 18:29:04 GMT, Leonid Mesnik wrote: > See summary and main bug for description. Looks good, thanks! ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22046#pullrequestreview-2430512325 From zgu at openjdk.org Wed Nov 13 00:01:50 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 13 Nov 2024 00:01:50 GMT Subject: RFR: 8343508: Parallel: Use ordinary klass accessor in verify_filler_in_dense_prefix In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 06:16:15 GMT, Albert Mingkun Yang wrote: > One line change to use the common API to make the caller logic less obtrusive. > > Test: tier1-3 LGTM ------------- Marked as reviewed by zgu (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21866#pullrequestreview-2431116484 From lmesnik at openjdk.org Wed Nov 13 00:39:27 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 13 Nov 2024 00:39:27 GMT Subject: RFR: 8343953: Test jdk/jfr/threading/TestDeepVirtualStackTrace.java fails with Parallel/Serial GC Message-ID: Test fails because it doesn't always trigger the jdk.ObjectAllocationOutsideTLAB event. Test tries to trigger jdk.ObjectAllocationOutsideTLAB by allocating a new Object[10_000_000]; array. However, the TLAB is not limited for Parallel/Serial/Z GCs, so the VM might just enlarge the TLAB and allocate the array in a new TLAB. The fix limits the young generation size to ensure that a TLAB of the required size can't be created and the jdk.ObjectAllocationOutsideTLAB event is always generated. Verified by running 100 times with Parallel/Serial/ZGC on different platforms. Using jdk.ObjectAllocationOutsideTLAB is not significant for the test. A better fix would be to trigger some other event with a 100% guarantee, assuming that this event is not triggered outside of a virtual thread. But I am not sure which event is a good candidate.
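A back-of-the-envelope check of why capping the young generation forces the outside-TLAB path for the Serial and Parallel collectors, where a TLAB is carved out of eden. This is a sketch under stated assumptions, not VM code: a 64-bit VM with compressed oops (16-byte array header, 4-byte references) and eden sized by the default SurvivorRatio=8, so eden gets 8/10 of the young generation. Parallel GC's adaptive sizing can shift the eden/survivor split at run time, so treat the numbers as illustrative. The 40M value matches the `-XX:MaxNewSize=40M` option added to the test's `@run` line in this review; the function names are hypothetical.

```cpp
#include <cstdint>

// Illustrative assumptions, not values queried from a running VM: 64-bit
// HotSpot with compressed oops, so an Object[] has a 16-byte header and
// 4-byte element references.
constexpr std::uint64_t kArrayHeaderBytes = 16;
constexpr std::uint64_t kCompressedRefBytes = 4;

// Size in bytes of an Object[] with `length` elements under the assumptions above.
constexpr std::uint64_t object_array_bytes(std::uint64_t length) {
  return kArrayHeaderBytes + length * kCompressedRefBytes;
}

// Young gen = eden + two survivor spaces with eden : survivor = ratio : 1,
// so eden receives ratio / (ratio + 2) of the young generation.
constexpr std::uint64_t eden_bytes(std::uint64_t max_new_size, std::uint64_t survivor_ratio) {
  return max_new_size * survivor_ratio / (survivor_ratio + 2);
}

// With Serial/Parallel a TLAB is carved out of eden, so an object larger than
// eden can never be allocated inside a TLAB.
constexpr bool may_fit_in_tlab(std::uint64_t object_bytes, std::uint64_t eden) {
  return object_bytes <= eden;
}

constexpr std::uint64_t kMaxNewSize = 40ull * 1024 * 1024;  // -XX:MaxNewSize=40M
```

Under these assumptions the array needs 16 + 4 × 10,000,000 = 40,000,016 bytes, while eden is at most 41,943,040 × 8 / 10 = 33,554,432 bytes (32 MiB), so the allocation cannot be served from a TLAB. Without compressed oops the array is roughly twice as large, so the conclusion still holds.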
------------- Commit messages: - Test jdk/jfr/threading/TestDeepVirtualStackTrace.java fails with Parallel/Serial GC Changes: https://git.openjdk.org/jdk/pull/22052/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22052&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343953 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22052.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22052/head:pull/22052 PR: https://git.openjdk.org/jdk/pull/22052 From ayang at openjdk.org Wed Nov 13 08:36:00 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 13 Nov 2024 08:36:00 GMT Subject: RFR: 8343508: Parallel: Use ordinary klass accessor in verify_filler_in_dense_prefix In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 06:16:15 GMT, Albert Mingkun Yang wrote: > One line change to use the common API to make the caller logic less obtrusive. > > Test: tier1-3 Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21866#issuecomment-2472829126 From ayang at openjdk.org Wed Nov 13 08:36:01 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 13 Nov 2024 08:36:01 GMT Subject: Integrated: 8343508: Parallel: Use ordinary klass accessor in verify_filler_in_dense_prefix In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 06:16:15 GMT, Albert Mingkun Yang wrote: > One line change to use the common API to make the caller logic less obtrusive. > > Test: tier1-3 This pull request has now been integrated. 
Changeset: e9ede464 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/e9ede464b2be84af676dc64bd3595b304bfe818d Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8343508: Parallel: Use ordinary klass accessor in verify_filler_in_dense_prefix Reviewed-by: tschatzl, zgu ------------- PR: https://git.openjdk.org/jdk/pull/21866 From mli at openjdk.org Wed Nov 13 09:33:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 13 Nov 2024 09:33:01 GMT Subject: RFR: 8343953: Test jdk/jfr/threading/TestDeepVirtualStackTrace.java fails with Parallel/Serial GC In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 00:33:49 GMT, Leonid Mesnik wrote: > Test fails because it doesn't always trigger the jdk.ObjectAllocationOutsideTLAB event. > > Test tries to trigger jdk.ObjectAllocationOutsideTLAB by allocating a > new Object[10_000_000]; > array. > > However, the TLAB is not limited for Parallel/Serial/Z GCs, so the VM might just enlarge the TLAB and allocate the array in a new TLAB. The fix limits the young generation size to ensure that a TLAB of the required size can't be created and the > jdk.ObjectAllocationOutsideTLAB event is always generated. > Verified by running 100 times with Parallel/Serial/ZGC on different platforms. > > Using jdk.ObjectAllocationOutsideTLAB is not significant for the test. A better fix would be to trigger some other event with a 100% guarantee, assuming that this event is not triggered outside of a virtual thread. But I am not sure which event is a good candidate. I think the fix should be fine. Just some minor comments. I know this is not related to your fix: just a question about `testNativeEvent`, should the `stackMethod` parameter be "deepsleep" rather than "sleep"?
test/jdk/jdk/jfr/threading/TestDeepVirtualStackTrace.java line 45: > 43: * @library /test/lib /test/jdk > 44: * @modules jdk.jfr/jdk.jfr.internal > 45: * @run main/othervm -XX:MaxNewSize=40M -XX:FlightRecorderOptions:stackdepth=2048 It would be good to add some comment about why this extra option is necessary. ------------- PR Review: https://git.openjdk.org/jdk/pull/22052#pullrequestreview-2432419165 PR Comment: https://git.openjdk.org/jdk/pull/22052#issuecomment-2472971330 PR Review Comment: https://git.openjdk.org/jdk/pull/22052#discussion_r1839823953 From mli at openjdk.org Wed Nov 13 09:34:00 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 13 Nov 2024 09:34:00 GMT Subject: RFR: 8344051: Problemlist jdk/jfr/event/runtime/TestNativeMemoryUsageEvents.java with ZGC until JDK-8343893 is fixed In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 18:29:04 GMT, Leonid Mesnik wrote: > See summary and main bug for description. Looks good. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22046#pullrequestreview-2432434290 From iwalulya at openjdk.org Wed Nov 13 09:59:05 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 13 Nov 2024 09:59:05 GMT Subject: RFR: 8297692: Avoid sending per-region GCPhaseParallel JFR events in G1ScanCollectionSetRegionClosure In-Reply-To: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> References: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> Message-ID: On Fri, 8 Nov 2024 15:20:21 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that significantly reduces the amount of "Code Roots" and "Optional Roots" JFR events to reduce default recording sizes significantly. > > E.g. a 10min BigRamTester run creates a 23MB recording without this change, with like hundreds of thousands of these events (#gcs * #gc threads * #regions in collection set).
With this change, the recording is reduced to 4MB (#gcs * #gc threads) > > Testing: gha, tier1-3 > > Thanks, > Thomas LGTM! ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21984#pullrequestreview-2432499396 From ayang at openjdk.org Wed Nov 13 10:23:39 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 13 Nov 2024 10:23:39 GMT Subject: RFR: 8297692: Avoid sending per-region GCPhaseParallel JFR events in G1ScanCollectionSetRegionClosure In-Reply-To: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> References: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> Message-ID: On Fri, 8 Nov 2024 15:20:21 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that significantly reduces the amount of "Code Roots" and "Optional Roots" JFR events to reduce default recording sizes significantly. > > E.g. a 10min BigRamTester run creates a 23MB recording without this change, with like hundreds of thousands of these events (#gcs * #gc threads * #regions in collection set). With this change, the recording is reduced to 4MB (#gcs * #gc threads) > > Testing: gha, tier1-3 > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21984#pullrequestreview-2432568295 From jsikstro at openjdk.org Wed Nov 13 13:42:48 2024 From: jsikstro at openjdk.org (Joel Sikström) Date: Wed, 13 Nov 2024 13:42:48 GMT Subject: RFR: 8343460: ZGC: Crash in ZRemembered::scan_page_and_clear_remset [v2] In-Reply-To: References: Message-ID: <9sJkaTUnAw4RUDd2i98viIeVS4C68xPZivMuzDsQu3E=.9b9b30e0-db5b-4ed0-ae52-a93476286cbb@github.com> On Tue, 12 Nov 2024 12:57:27 GMT, Axel Boldt-Christmas wrote: >> `free_page` may concurrently delete the remset while `scan_page_and_clear_remset` is scanning the page. Move it to after the `_safe_recycle.register_and_clone_if_activated`. Doing the deletion on the new cloned page will not occur as it is not old.
Doing the deletion on the new cloned page will not occur as it not old. And the registered page's remset will be deleted by the destructor when the `_safe_recycle` scope quest up the `safe_destroy`. >> >> To be able to push the deletion all the way into `prepare_to_recycle` the unnecessary use of this mechanism had to be removed. `free_pages_alloc_failed` does not need to protect the pages, as they are not yet present in the PageTable. We have simply taken them out of the cache, but failed to commit or map some memory, so we are putting these pages back into the cache. See bed9c260bbc9bd208b03d7eedd4e2cfa151b58f2 >> >> The fix works without this last commit. So we must be careful to check that these pages cannot be reached by some other means. The FoundOld bitmap iteration goes through the PageTable so even if an old page was registered, we would not find these pages. >> >> There is a scary lack of a fence between the removal of the page from the PageTable and the lock in `register_and_clone_if_activated`. >> >> The stress test will deterministically crash with this modified code 0756e0056b44ee16bee81256f556c8df981ceaf9 and using these options `-XX:+UseZGC -XX:+UseNewCode -XX:ZCollectionIntervalMinor=0.1 -XX:ZCollectionIntervalMajor=1 -XX:ZFragmentationLimit=0 -XX:-CreateCoredumpOnCrash`, and no longer does after with this patch. > > Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: > > - Add comment about prepare_to_recycle > - Revert recycle_page call, still update last_used Marked as reviewed by jsikstro (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21905#pullrequestreview-2433200832 From tschatzl at openjdk.org Wed Nov 13 14:48:54 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 13 Nov 2024 14:48:54 GMT Subject: RFR: 8344051: Problemlist jdk/jfr/event/runtime/TestNativeMemoryUsageEvents.java with ZGC until JDK-8343893 is fixed In-Reply-To: References: Message-ID: <297iLMI0jXorsMAQr_QccKDFZfyWnsU30E6AcE-9de8=.e7d9da4c-8658-4d09-aae6-d24249c8b2fd@github.com> On Tue, 12 Nov 2024 18:29:04 GMT, Leonid Mesnik wrote: > See summary and main bug for description. Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22046#pullrequestreview-2433471769 From lmesnik at openjdk.org Wed Nov 13 16:10:10 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 13 Nov 2024 16:10:10 GMT Subject: Integrated: 8344051: Problemlist jdk/jfr/event/runtime/TestNativeMemoryUsageEvents.java with ZGC until JDK-8343893 is fixed In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 18:29:04 GMT, Leonid Mesnik wrote: > See summary and main bug for description. This pull request has now been integrated. 
Changeset: eb240a7d Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/eb240a7df9a029bb762def86b805bdfdfa3e4625 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8344051: Problemlist jdk/jfr/event/runtime/TestNativeMemoryUsageEvents.java with ZGC until JDK-8343893 is fixed Reviewed-by: rkennke, mli, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/22046 From lmesnik at openjdk.org Wed Nov 13 17:12:30 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 13 Nov 2024 17:12:30 GMT Subject: RFR: 8343953: Test jdk/jfr/threading/TestDeepVirtualStackTrace.java fails with Parallel/Serial GC [v2] In-Reply-To: References: Message-ID: <38ad6jiKng-Z2R0UUAzWT_0Yl4ww31vpz0wmrDw7VIY=.6572e787-8f58-4620-a4d0-125cb16f5872@github.com> > Test fails because it doesn't always trigger the jdk.ObjectAllocationOutsideTLAB event. > > Test tries to trigger jdk.ObjectAllocationOutsideTLAB by allocating a > new Object[10_000_000]; > array. > > However, the TLAB is not limited for Parallel/Serial/Z GCs, so the VM might just enlarge the TLAB and allocate the array in a new TLAB. The fix limits the young generation size to ensure that a TLAB of the required size can't be created and the > jdk.ObjectAllocationOutsideTLAB event is always generated. > Verified by running 100 times with Parallel/Serial/ZGC on different platforms. > > Using jdk.ObjectAllocationOutsideTLAB is not significant for the test. A better fix would be to trigger some other event with a 100% guarantee, assuming that this event is not triggered outside of a virtual thread. But I am not sure which event is a good candidate.
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: added comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22052/files - new: https://git.openjdk.org/jdk/pull/22052/files/ac0542d0..95562045 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22052&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22052&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22052.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22052/head:pull/22052 PR: https://git.openjdk.org/jdk/pull/22052 From mli at openjdk.org Wed Nov 13 18:55:11 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 13 Nov 2024 18:55:11 GMT Subject: RFR: 8343953: Test jdk/jfr/threading/TestDeepVirtualStackTrace.java fails with Parallel/Serial GC [v2] In-Reply-To: <38ad6jiKng-Z2R0UUAzWT_0Yl4ww31vpz0wmrDw7VIY=.6572e787-8f58-4620-a4d0-125cb16f5872@github.com> References: <38ad6jiKng-Z2R0UUAzWT_0Yl4ww31vpz0wmrDw7VIY=.6572e787-8f58-4620-a4d0-125cb16f5872@github.com> Message-ID: On Wed, 13 Nov 2024 17:12:30 GMT, Leonid Mesnik wrote: >> Test fails because it doesn't always trigger the jdk.ObjectAllocationOutsideTLAB event. >> >> Test tries to trigger jdk.ObjectAllocationOutsideTLAB by allocating a >> new Object[10_000_000]; >> array. >> >> However, the TLAB is not limited for Parallel/Serial/Z GCs, so the VM might just enlarge the TLAB and allocate the array in a new TLAB. The fix limits the young generation size to ensure that a TLAB of the required size can't be created and the >> jdk.ObjectAllocationOutsideTLAB event is always generated. >> Verified by running 100 times with Parallel/Serial/ZGC on different platforms. >> >> Using jdk.ObjectAllocationOutsideTLAB is not significant for the test. A better fix would be to trigger some other event with a 100% guarantee, assuming that this event is not triggered outside of a virtual thread. But I am not sure which event is a good candidate.
> > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > added comment Looks good, Thanks! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22052#pullrequestreview-2434158372 From duke at openjdk.org Wed Nov 13 23:36:32 2024 From: duke at openjdk.org (duke) Date: Wed, 13 Nov 2024 23:36:32 GMT Subject: Withdrawn: 8340381: Shenandoah: Class mirrors verification should check forwarded objects In-Reply-To: <9vV2xnuP2lgRCLLbB5LWnIg26HtPjS7BOIyt0qaLkwg=.d7975d49-c70b-43e5-89cb-ef1b4f86ac52@github.com> References: <9vV2xnuP2lgRCLLbB5LWnIg26HtPjS7BOIyt0qaLkwg=.d7975d49-c70b-43e5-89cb-ef1b4f86ac52@github.com> Message-ID: <9mAsbnJFW9K4S36cpIm1a4tKbrVHtBLRgsOIYipLQL4=.da564665-6894-465b-ac65-fe548c8807ed@github.com> On Wed, 18 Sep 2024 13:48:43 GMT, Aleksey Shipilev wrote: > The from-space objects can be effectively dead, and their backlinks to `InstanceKlass*` not updated anymore. So they can point to garbage. > > Additional testing: > - [x] Some previously failing reproducers are not failing anymore > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/21064 From sangheki at openjdk.org Thu Nov 14 04:01:28 2024 From: sangheki at openjdk.org (Sangheon Kim) Date: Thu, 14 Nov 2024 04:01:28 GMT Subject: RFR: 8297692: Avoid sending per-region GCPhaseParallel JFR events in G1ScanCollectionSetRegionClosure In-Reply-To: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> References: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> Message-ID: <5_fWESl-K5zEdjGPOXjPtCDERZlI3auEG6BG8oYJ6rs=.7a987413-2d89-44e8-a14e-ae7deb0ab3c6@github.com> On Fri, 8 Nov 2024 15:20:21 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that significantly reduces the amount of "Code Roots" and "Optional Roots" JFR events to reduce default recording sizes significantly. > > E.g. a 10min BigRamTester run creates a 23MB recording without this change, with like hundreds of thousands of these events (#gcs * #gc threads * #regions in collection set). With this change, the recording is reduced to 4MB (#gcs * #gc threads) > > Testing: gha, tier1-3 > > Thanks, > Thomas LGTM ------------- Marked as reviewed by sangheki (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21984#pullrequestreview-2434992106 From prr at openjdk.org Thu Nov 14 04:37:13 2024 From: prr at openjdk.org (Phil Race) Date: Thu, 14 Nov 2024 04:37:13 GMT Subject: RFR: 8343490: Update copyright year for JDK-8341692 In-Reply-To: <2BwWuKdm5FwggsXPwo3P2xRD6CGr5QDdn3gVG5x5fo0=.41d944e6-6737-4d7d-8654-986149b41c9d@github.com> References: <2BwWuKdm5FwggsXPwo3P2xRD6CGr5QDdn3gVG5x5fo0=.41d944e6-6737-4d7d-8654-986149b41c9d@github.com> Message-ID: <9R5gF8dIqwO0TbBhICMDC7c2g4gvY0tXA82LcvmAEpI=.4159e63b-4899-48d9-9829-0f13c6307446@github.com> On Tue, 5 Nov 2024 01:41:00 GMT, SendaoYan wrote: > Hi all, > The copyright year of some files which have been changed by [JDK-8341692](https://bugs.openjdk.org/browse/JDK-8341692) wasn't updated correctly. This PR updates the copyright year of [JDK-8341692](https://bugs.openjdk.org/browse/JDK-8341692). Trivial fix, no risk. FWIW this whole PR seems like a waste of a bug id. Copyright year is implied anyway. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21891#issuecomment-2475394959 From aboldtch at openjdk.org Thu Nov 14 06:15:57 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 14 Nov 2024 06:15:57 GMT Subject: RFR: 8343460: ZGC: Crash in ZRemembered::scan_page_and_clear_remset [v2] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 12:57:27 GMT, Axel Boldt-Christmas wrote: >> `free_page` may concurrently delete the remset while `scan_page_and_clear_remset` is scanning the page. Move it to after the `_safe_recycle.register_and_clone_if_activated`. Doing the deletion on the new cloned page will not occur as it is not old. And the registered page's remset will be deleted by the destructor when the `_safe_recycle` scope quest up the `safe_destroy`. >> >> To be able to push the deletion all the way into `prepare_to_recycle` the unnecessary use of this mechanism had to be removed.
`free_pages_alloc_failed` does not need to protect the pages, as they are not yet present in the PageTable. We have simply taken them out of the cache, but failed to commit or map some memory, so we are putting these pages back into the cache. See bed9c260bbc9bd208b03d7eedd4e2cfa151b58f2 >> >> The fix works without this last commit. So we must be careful to check that these pages cannot be reached by some other means. The FoundOld bitmap iteration goes through the PageTable so even if an old page was registered, we would not find these pages. >> >> There is a scary lack of a fence between the removal of the page from the PageTable and the lock in `register_and_clone_if_activated`. >> >> The stress test will deterministically crash with this modified code 0756e0056b44ee16bee81256f556c8df981ceaf9 and using these options `-XX:+UseZGC -XX:+UseNewCode -XX:ZCollectionIntervalMinor=0.1 -XX:ZCollectionIntervalMajor=1 -XX:ZFragmentationLimit=0 -XX:-CreateCoredumpOnCrash`, and no longer does with this patch. > > Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: > > - Add comment about prepare_to_recycle > - Revert recycle_page call, still update last_used Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21905#issuecomment-2475498546 From aboldtch at openjdk.org Thu Nov 14 06:15:58 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 14 Nov 2024 06:15:58 GMT Subject: Integrated: 8343460: ZGC: Crash in ZRemembered::scan_page_and_clear_remset In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 14:10:47 GMT, Axel Boldt-Christmas wrote: > `free_page` may concurrently delete the remset while `scan_page_and_clear_remset` is scanning the page. Move it to after the `_safe_recycle.register_and_clone_if_activated`. Doing the deletion on the new cloned page will not occur as it is not old.
And the registered page's remset will be deleted by the destructor when the `_safe_recycle` scope quest up the `safe_destroy`. > > To be able to push the deletion all the way into `prepare_to_recycle` the unnecessary use of this mechanism had to be removed. `free_pages_alloc_failed` does not need to protect the pages, as they are not yet present in the PageTable. We have simply taken them out of the cache, but failed to commit or map some memory, so we are putting these pages back into the cache. See bed9c260bbc9bd208b03d7eedd4e2cfa151b58f2 > > The fix works without this last commit. So we must be careful to check that these pages cannot be reached by some other means. The FoundOld bitmap iteration goes through the PageTable so even if an old page was registered, we would not find these pages. > > There is a scary lack of a fence between the removal of the page from the PageTable and the lock in `register_and_clone_if_activated`. > > The stress test will deterministically crash with this modified code 0756e0056b44ee16bee81256f556c8df981ceaf9 and using these options `-XX:+UseZGC -XX:+UseNewCode -XX:ZCollectionIntervalMinor=0.1 -XX:ZCollectionIntervalMajor=1 -XX:ZFragmentationLimit=0 -XX:-CreateCoredumpOnCrash`, and no longer does with this patch. This pull request has now been integrated.
Changeset: e7d90b94 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/e7d90b941fff095f4b1555020c09270d201c7402 Stats: 25 lines in 2 files changed: 5 ins; 16 del; 4 mod 8343460: ZGC: Crash in ZRemembered::scan_page_and_clear_remset Reviewed-by: jsikstro, eosterlund, stefank ------------- PR: https://git.openjdk.org/jdk/pull/21905 From syan at openjdk.org Thu Nov 14 06:16:22 2024 From: syan at openjdk.org (SendaoYan) Date: Thu, 14 Nov 2024 06:16:22 GMT Subject: Withdrawn: 8343490: Update copyright year for JDK-8341692 In-Reply-To: <2BwWuKdm5FwggsXPwo3P2xRD6CGr5QDdn3gVG5x5fo0=.41d944e6-6737-4d7d-8654-986149b41c9d@github.com> References: <2BwWuKdm5FwggsXPwo3P2xRD6CGr5QDdn3gVG5x5fo0=.41d944e6-6737-4d7d-8654-986149b41c9d@github.com> Message-ID: On Tue, 5 Nov 2024 01:41:00 GMT, SendaoYan wrote: > Hi all, > The copyright year of some files which have been changed by [JDK-8341692](https://bugs.openjdk.org/browse/JDK-8341692) wasn't updated correctly. This PR updates the copyright year of [JDK-8341692](https://bugs.openjdk.org/browse/JDK-8341692). Trivial fix, no risk. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21891 From thomas.schatzl at oracle.com Thu Nov 14 13:29:23 2024 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 14 Nov 2024 14:29:23 +0100 Subject: Further discussion on Adaptable Heap Sizing with G1 In-Reply-To: References: <93ecaaaa-600e-4b91-a98b-f417f5505037@oracle.com> <7d07385e-bc04-4251-a5db-81ede0b90465@oracle.com> Message-ID: <6cf0d157-e40d-4df9-a91e-c9c79afb1ac2@oracle.com> Hi Monica, as far as I understand, the suggested approach is good. Just two comments: what Google provided with ProposedHeapSize has only been part of the solution, so it obviously falls a bit short of what AHS intends to do, but that was intended I think.
One might also see SoftMaxHeapSize (the flag) as a knob for the user to set what he thinks is good, that is somehow merged with an internal SoftMaxHeapSize. I hope this is not too confusing, if it is, then we can ignore this comment for now. Thanks for driving this forward, Thomas On 12.11.24 20:11, Monica Beckwith wrote: > Hi everyone, > Thank you all for the valuable and detailed discussion around AHS and > heap management for G1. I wanted to share some thoughts that align with > Thomas's comments and clarify the best path forward, especially given > the distinctions between AHS (Automatic Heap Sizing) and Google's > Adaptive Heap Sizing (AHS-Google). I've included simple diagrams to > illustrate the technical flow and interactions of each approach. > *1. Consolidating Around SoftMaxHeapSize for Dynamic, Adaptive Sizing* > Thomas's suggestion to prioritize SoftMaxHeapSize as the main dynamic > driver aligns with my understanding of an effective AHS model. Using > SoftMaxHeapSize in this way allows us to minimize the CPU overhead > associated with frequent uncommit/commit cycles, which would be a > potential risk with a more rigid setting like ProposedHeapSize. Here's a > basic illustration of how Automatic Heap Sizing (AHS) with > SoftMaxHeapSize would work dynamically: >    +-----------------------------+ >    |        External Inputs      | >    |-----------------------------| >    | - Global Memory Pressure    | >    | - GCTimeRatio policy        | >    | - Heap tunables via         | >    |   commandline               | >    +-----------------------------+ >                | >                v >    +-----------------------------+ >    |       Automatic Heap        | >    |          Sizing (AHS)       | >    +-----------------------------+ >                | >                v >    +-----------------------------+ >    |  SoftMaxHeapSize (Dynamic)  | >    | - Guides heap size          | >    | - Shrinks under pressure    | >    | - Uses target heuristics
| >    +-----------------------------+ >                | >                v >    +-----------------------------+ >    |     JVM Heap Management     | >    | - Adjusts committed memory  | >    | - Controls expansions &     | >    |   contractions smoothly     | >    +-----------------------------+ > By consolidating around SoftMaxHeapSize as the primary "target" flag, we > create a more straightforward, adaptive, and consistent experience. > *2. The AHS-Google Approach and Its Challenges* > Google's current Adaptive Heap Sizing (AHS-Google) approach uses > ProposedHeapSize as a fixed committed size target. While this allows for > setting a specific target for memory use, it introduces some challenges, > particularly with forced uncommit/commit cycles that might ignore > dynamic inputs. Here's how this approach typically functions: >    +-----------------------------+ >    |        AHS-Google Logic     | >    |-----------------------------| >    | - Periodic GC with target   | >    | - Uses ProposedHeapSize as  | >    |   "optimal" committed size  | >    +-----------------------------+ >                | >                v >    +-----------------------------+ >    |    ProposedHeapSize (Fixed) | >    | - Forced committed memory   | >    | - Overrides dynamic inputs  | >    | - Can cause frequent        | >    |   uncommit/commit cycles    | >    +-----------------------------+ >                | >                v >    +-----------------------------+ >    |     JVM Heap Management     | >    | - Follows set memory level  | >    | - May ignore external       | >    |   pressure signals          | >    +-----------------------------+ > A purely AHS-based approach would allow SoftMaxHeapSize to adjust > dynamically in response to real-time signals without forcing committed > memory levels. This avoids unnecessary CPU cycles and provides a more > adaptive response to environmental changes, such as fluctuating memory > demands in containerized and cloud environments.
> *3. Key Differences Between AHS and AHS-Google* > In my understanding: > > * *AHS (Automatic Heap Sizing)*: Focuses on finding a reasonable heap > size based on external memory pressure and dynamically adjusts > according to environmental inputs. This aligns with Thomas's point > that AHS should allow for minimal user intervention and let dynamic > factors guide heap behavior. > * *AHS-Google*: Treats ProposedHeapSize as a fixed input, overriding > dynamic adjustments. While this gives more explicit control, it > limits adaptability and could introduce inefficiencies, as mentioned > earlier. > > *4. Moving Forward with a Balanced, Dynamic AHS for G1* > Based on the discussion, I suggest we focus on developing an AHS model > that leverages SoftMaxHeapSize as the adaptable target, allowing the JVM > to adjust based on real-time memory pressures and CPU usage. Integrating > multiple inputs dynamically will create a robust model for managing > "noisy neighbor" challenges, a very real need in today's cloud and > container scenarios and one that AHS is well-suited to manage, as > highlighted in Erik's recent JVMLS presentation. > Thank you all again for the insightful conversation and technical > contributions. I believe these steps will help us build a technically > sound and stable AHS for G1. > Please feel free to correct any misunderstandings or clarify any points > where further alignment is needed. > Regards, > Monica > > *From:* hotspot-gc-dev *On Behalf Of > *Jonathan Joo > *Sent:* Thursday, October 17, 2024 7:11 PM > *To:* Thomas Schatzl > *Cc:* hotspot-gc-dev at openjdk.org > *Subject:* Re: Further discussion on Adaptable Heap Sizing with G1 > Hi Thomas, > The points you mentioned make sense to me! There are some nuances that > I'd like to dig into further to make sure that we are aligned.
I think > to summarize - I'm not sure exactly how SoftMaxHeapSize is intended to > work, whereas we have experimented with ProposedHeapSize at Google > already, so I want to bridge my gap in understanding there. > I appreciate you offering to meet and discuss! As far as meeting time - > I'm currently in US Pacific time, but flexible in terms of when we meet. > (I am generally awake from 9am-1am PT, so I am good to meet any time in > that time period -- please let me know what time works best for you.) > Tuesday and Thursday of the coming week I have the most availability, > but if you have any other dates/times in mind, I can let you know > whether that works for me. > Best, > ~ Jonathan > On Mon, Oct 14, 2024 at 2:52 AM Thomas Schatzl > <_thomas.schatzl at oracle.com_ > wrote: > Hi, > > On 11.10.24 09:16, Jonathan Joo wrote: >> Hi Thomas, >> >>     I think what this suggestion overlooks is that a SoftMaxHeapSize that >>     guides used heap size will automatically guide committed size: i.e. if >>     G1 shrinks the used heap, G1 will automatically shrink (and keep) the >>     committed size. >> >>     So ProposedHeapSize seems to be very similar to SoftMaxHeapSize. >> >> >> If I'm understanding this correctly - both ProposedHeapSize and (the >> proposed version of) SoftMaxHeapSize have similar semantics, but >> actually modify the heap in different ways. SoftMaxHeapSize helps us >> determine when to start a concurrent mark, whereas ProposedHeapSize >> doesn't actually trigger any GC directly, but affects the size of the >> heap after a GC. Is that correct? Would it make sense then to have both >> flags, where one helps set a trigger point for a GC, and one helps us >> determine the heap size we are targeting after the GC? I might also be >> missing some nuances here. > > I think SoftMaxHeapSize (or actually either) will result in > approximately the same behavior. The difference is in intrusiveness.
> > ProposedHeapSize forcefully attempts to decrease the committed heap size > and then the rest of the "heap sizing machinery" follows, while > SoftMaxHeapSize gives a target for the "heap sizing machinery" and > committed heap size follows. > > ProposedHeapSize has the following disadvantages (as implemented): > > - since it forces committed heap size, I expect that in case you are > close to or above that target, you can get frequent uncommits/attempts to > uncommit which waste cpu cycles. > > Hopefully, by giving the heap sizing machinery a goal, it will itself > determine a sustainable committed memory level without too frequent > commits and uncommits. > > - for some reason it does not allow less memory to be committed than > proposed (but still larger than MinHeapSize). This can be inefficient > wrt memory usage. > I.e. it basically disables other heap sizing afaict. > > - (that's more a nit) the use of "0" as special marker for > SoftMaxHeapSize is unexpected. > > This mechanism kind of feels like a very blunt tool to get the desired > effect (a certain committed heap) without caring about other goals. It > may be necessary to pull out the immediate un/commit hammer in some > situations, and imho, let's not give that hammer to users as the first > option to screw themselves. > >> >>       I.e. if I understand this correctly: allowing a higher GC overhead, >>     automatically shrinks the heap. >> >> >> Exactly - in practice, tuning this one parameter up (the target gc cpu >> overhead) correlates with decreasing both the average as well as maximum >> heap usage for a java program. >> >>       I noticed the same with the patch attached to the SoftMaxHeapSize CR >>     (_https://bugs.openjdk.org/browse/JDK-8236073_ > >>     <_https://bugs.openjdk.org/browse/JDK-8236073_ > >) discounting effects of >>     Min/MaxHeapFreeRatio (i.e. if you remove it, >> _https://bugs.openjdk.org/browse/JDK-8238686_ > >>
<_https://bugs.openjdk.org/browse/JDK-8238686_ > > explains the issue). >>     In practice, these two flags prohibit G1 from adjusting the heap unless >>     the SoftMaxHeapSize change is very large. >> >> >>     So I would prefer to only think of an alternative to SoftMaxHeapSize if >>     it has been shown that it does not work. >> >> >> Given that you have a much stronger mental model than I do of how all >> these flags fit together in the context of G1 GC, perhaps it would be >> helpful to schedule some time to chat in person! I think that would help >> clarify things much more quickly than email. To be clear - I have no >> reason to believe that SoftMaxHeapSize does not work. On the other hand, >> could we possibly make use of both flags? For example, could >> SoftMaxHeapSize potentially be a good replacement for our periodic GC? > > Not sure what periodic GC has to do with SoftMaxHeapSize. > >> >>     There is the nit that unlike in this implementation of ProposedHeapSize, >>     SoftMaxHeapSize will not cause uncommit below MinHeapSize. This is >>     another discussion on what to do about this issue - in a comment in >> _https://bugs.openjdk.org/browse/JDK-8236073_ > >>     <_https://bugs.openjdk.org/browse/JDK-8236073_ > > it is proposed to make >>     MinHeapSize manageable. >> >> >> How useful is MinHeapSize in practice? Do we need it, or can we just set >> it to zero to avoid having to deal with it at all? > > I think you are mixing AHS (give decent heap sizing in presence of > external memory pressure) and getting "optimal" heap sizing (or iow > "steering heap size" externally). > > AHS is mostly about the user not doing/setting any heap sizes; in this > case just having min heap size very low is just fine just as suggested > in the JEP. > > SoftMaxHeapSize (and ProposedHeapSize) is about the user setting a > particular goal according to his whims. It is still interesting to set > -Xms==-Xmx for e.g.
fast startup or during heavy activity; however if an > external system decides that it is useful to intermittently save memory > up to a certain level, then follow that guidance. > > The mechanism to internally follow that guidance can be used by AHS. > > >> >>     I (still) believe that AHS and SoftMaxHeapSize/ProposedHeapSize are >>     somewhat orthogonal. >> >>     AHS (_https://openjdk.org/jeps/8329758_ > >>     <_https://openjdk.org/jeps/8329758_ > >) is about finding a reasonable >>     heap size, and adjust on external "pressure". SoftMax/ProposedHeapSize >>     are manual external tunings. >> >> >>     Wdyt? >> >> >> I agree with the general idea - for us, we used a manual external flag >> like ProposedHeapSize because we did not implement any of the AHS logic >> in the JVM. (We had a separate AHS thread reading in container >> information and then doing the calculations, then setting >> ProposedHeapSize as a manageable flag.) The way I see it is that >> SoftMax/ProposedHeapSize is the "output" of AHS, and then >> SoftMax/ProposedHeapSize is the "input" for the JVM, after which the JVM >> uses this input to adjust its behavior accordingly. Does that align with >> how you see things? > > As mentioned in the other thread, SoftMaxHeapSize can be used by AHS to > get heap to a certain level (based on memory pressure), but there is > also that external entity that can modify SoftMaxHeapSize to adjust VM > behavior. > > So ultimately there will be multiple inputs for target heap size (and > probably I'm forgetting one or the other): > > * External memory pressure (AHS) (*) > > * CurrentMaxHeapSize > > * SoftMaxHeapSize > > * CPU usage (existing GCTimeRatio based policy) > > * other *HeapSize flags > > that need to be merged into some target heap level using some policy.
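To make "merged into some target heap level using some policy" concrete, here is a minimal C++ sketch. The struct, the function, and the policy itself are hypothetical assumptions for illustration, not G1 code and not something agreed in this thread: it starts from the size the GCTimeRatio-based CPU policy wants, lets each soft goal (SoftMaxHeapSize, an external-pressure goal) only lower it, and finally clamps the result into [MinHeapSize, CurrentMaxHeapSize].

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical bundle of the heap-size inputs listed above (all values in bytes).
struct HeapSizingInputs {
  size_t min_heap;          // MinHeapSize: lower hard bound
  size_t current_max_heap;  // CurrentMaxHeapSize: upper hard bound
  size_t soft_max_heap;     // SoftMaxHeapSize: soft goal set by the user or an external agent
  size_t pressure_target;   // goal derived from external memory pressure (AHS)
  size_t cpu_target;        // size the GCTimeRatio-based policy would like to have
};

// One possible merge policy (an assumption, not G1's): start from what the CPU
// policy wants, let every soft goal only lower it, then enforce the hard bounds.
size_t merge_target(const HeapSizingInputs& in) {
  assert(in.min_heap <= in.current_max_heap);
  size_t target = in.cpu_target;
  target = std::min(target, in.soft_max_heap);
  target = std::min(target, in.pressure_target);
  target = std::min(target, in.current_max_heap);
  target = std::max(target, in.min_heap);  // the hard lower bound is applied last, so it wins
  return target;
}
```

Under this assumed ordering the hard bounds always win, so an external-pressure goal below MinHeapSize is simply ignored; that is the same MinHeapSize question raised earlier in the thread.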
> > After knowing that level, the VM needs to decide on a proper reaction, > which might be anything from just setting the internal IHOP goal, to > (un-)committing memory directly, to doing the appropriate garbage > collection in a "timely" fashion (which is where the regular periodic > gc/marking comes in) or anything in between. > > (*) I am aware that the AHS JEP not only includes reaction on external > memory pressure but also the merging of goals for different sources; > some of them are ZGC specific. Some of them are already implemented in > G1. So for this discussion it is imo useful to limit "AHS" in G1 context > to things that G1 does not do. I.e. "return another goal based on > external memory pressure", "min/max heap size defaults(!)", and "adjust > adaptive sizing". > >> If we do indeed implement AHS logic fully within the JVM, then we could >> internally manage the sizing of the heap without exposing a manageable >> flag. That being said, it seems to me that exposing this as a manageable >> flag brings the additional benefit that one could plug in their own AHS >> implementation that calculates target heap sizes with whatever data they >> want (and then passes it into the JVM via the manageable flag). >> >> Again, I wonder if meeting to discuss would be efficient, and then we >> can update the mailing list with the results of our discussion. Let me >> know your thoughts! > > It's fine with me to meet to recap and discuss above; please suggest > some time. > > Hth, >
Thomas From tschatzl at openjdk.org Thu Nov 14 13:34:23 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 14 Nov 2024 13:34:23 GMT Subject: RFR: 8297692: Avoid sending per-region GCPhaseParallel JFR events in G1ScanCollectionSetRegionClosure In-Reply-To: <5_fWESl-K5zEdjGPOXjPtCDERZlI3auEG6BG8oYJ6rs=.7a987413-2d89-44e8-a14e-ae7deb0ab3c6@github.com> References: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> <5_fWESl-K5zEdjGPOXjPtCDERZlI3auEG6BG8oYJ6rs=.7a987413-2d89-44e8-a14e-ae7deb0ab3c6@github.com> Message-ID: On Thu, 14 Nov 2024 03:56:05 GMT, Sangheon Kim wrote: >> Hi all, >> >> please review this change that significantly reduces the amount of "Code Roots" and "Optional Roots" JFR events to reduce default recording sizes significantly. >> >> E.g. a 10min BigRamTester run creates a 23MB recording without this change, with like hundreds of thousands of these events (#gcs * #gc threads * #regions in collection set). With this change, the recording is reduced to 4MB (#gcs * #gc threads) >> >> Testing: gha, tier1-3 >> >> Thanks, >> Thomas > > LGTM Thanks @sangheon @walulyai @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/21984#issuecomment-2476365240 From tschatzl at openjdk.org Thu Nov 14 13:34:24 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 14 Nov 2024 13:34:24 GMT Subject: Integrated: 8297692: Avoid sending per-region GCPhaseParallel JFR events in G1ScanCollectionSetRegionClosure In-Reply-To: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> References: <0vG-VYZ2aKoCjTB6bxD8aTbqfB7LotSmJBL1LHrcLw8=.5cb24f1f-09aa-44e3-81e0-90badc70ee10@github.com> Message-ID: On Fri, 8 Nov 2024 15:20:21 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that significantly reduces the amount of "Code Roots" and "Optional Roots" JFR events to reduce default recording sizes significantly. > > E.g.
a 10min BigRamTester run creates a 23MB recording without this change, with like hundreds of thousands of these events (#gcs * #gc threads * #regions in collection set). With this change, the recording is reduced to 4MB (#gcs * #gc threads) > > Testing: gha, tier1-3 > > Thanks, > Thomas This pull request has now been integrated. Changeset: a73226b1 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/a73226b18e274c44171021760e9eb05bc4a8b711 Stats: 162 lines in 3 files changed: 84 ins; 52 del; 26 mod 8297692: Avoid sending per-region GCPhaseParallel JFR events in G1ScanCollectionSetRegionClosure Reviewed-by: iwalulya, ayang, sangheki ------------- PR: https://git.openjdk.org/jdk/pull/21984 From lmesnik at openjdk.org Thu Nov 14 16:05:57 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 14 Nov 2024 16:05:57 GMT Subject: Integrated: 8343953: Test jdk/jfr/threading/TestDeepVirtualStackTrace.java fails with Parallel/Serial GC In-Reply-To: References: Message-ID: <63-049r97ibAU1u4pY_hk0h_XX63exDDCkjVR9svwGw=.2cce7d55-a7a5-4323-bb00-30f94855093f@github.com> On Wed, 13 Nov 2024 00:33:49 GMT, Leonid Mesnik wrote: > Test fails because it doesn't always trigger jdk.ObjectAllocationOutsideTLAB event. > > Test tries to trigger jdk.ObjectAllocationOutsideTLAB by allocating > new Object[10_000_000]; > array. > > However, the TLAB is not limited for Parallel/Serial/Z GCs. So the VM might just grow the TLAB and allocate the array in a new TLAB. The fix limits the young generation size to ensure that a TLAB of the expected size can't be created and > jdk.ObjectAllocationOutsideTLAB event is always generated. > Verified by running 100 times with Parallel/Serial/ZGC on different platforms. > > Using jdk.ObjectAllocationOutsideTLAB is not significant for the test. A better fix would be to trigger some other event with a 100% guarantee, assuming that this event is not triggered outside of a virtual thread. But I am not sure which event is a good candidate.
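A back-of-the-envelope C++ sketch of why limiting the young generation makes the event deterministic. It assumes the default 64-bit object layout with compressed class pointers and UseCompactObjectHeaders off (16-byte array header, 8-byte object alignment). The helper names and the eden sizes used in the checks are hypothetical and are not the values chosen by the actual fix.

```cpp
#include <cassert>
#include <cstdint>

// Approximate size in bytes of an Object[] with `length` elements, ASSUMING the default
// 64-bit layout with compressed class pointers and compact headers off:
// 8-byte mark word + 4-byte narrow Klass* + 4-byte length field = 16-byte header.
// `ref_bytes` is 4 with compressed oops and 8 without.
uint64_t obj_array_bytes(uint64_t length, uint64_t ref_bytes) {
  assert(ref_bytes == 4 || ref_bytes == 8);
  const uint64_t header_bytes = 16;
  const uint64_t raw = header_bytes + length * ref_bytes;
  return (raw + 7) / 8 * 8;  // round up to the default 8-byte object alignment
}

// For generational collectors such as Serial and Parallel, TLABs are carved out of eden,
// so an object larger than the largest possible eden can never be placed in a TLAB.
// A smaller object only *may* fit into a (grown) TLAB.
bool must_allocate_outside_tlab(uint64_t object_bytes, uint64_t max_eden_bytes) {
  return object_bytes > max_eden_bytes;
}
```

With compressed oops, `new Object[10_000_000]` comes to 40,000,016 bytes (about 38 MiB). An eden smaller than that rules out a TLAB large enough to hold the array, while a larger eden only makes such a TLAB possible, which is consistent with the event being generated only some of the time before the fix.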
This pull request has now been integrated. Changeset: 68164a48 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/68164a4847bc309a09701162528b4469660a58f0 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8343953: Test jdk/jfr/threading/TestDeepVirtualStackTrace.java fails with Parallel/Serial GC Reviewed-by: mli ------------- PR: https://git.openjdk.org/jdk/pull/22052 From stefank at openjdk.org Thu Nov 14 17:45:20 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 14 Nov 2024 17:45:20 GMT Subject: RFR: 8344071: Mark some jdk/jfr/event/oldobject test flagless until they fixed to support all GC In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 22:31:18 GMT, Leonid Mesnik wrote: > Tests > jdk/jfr/event/oldobject/TestObjectDescription.java > jdk/jfr/event/oldobject/TestClassLoaderLeak.java > fail with different GCs to get expected events. > > They pass with G1 GC so I don't want to problemlist them. > However, they should test all GCs that support these events. > > Removed ZGC, because I expect that is Generational and should be also supported. > > The flagless should be removed when the main bugs are fixed. Marked as reviewed by stefank (Reviewer). test/jdk/jdk/jfr/event/oldobject/TestObjectDescription.java line 44: > 42: * @key jfr > 43: * @requires vm.hasJFR > 44: * @requires vm.gc != "Z" & vm.gc != "Shenandoah" Why did you remove the exclusion of ZGC here and left Shenandoah? Isn't it better to leave the ZGC exclusion here? 
------------- PR Review: https://git.openjdk.org/jdk/pull/22050#pullrequestreview-2436797285 PR Review Comment: https://git.openjdk.org/jdk/pull/22050#discussion_r1842648376 From lmesnik at openjdk.org Thu Nov 14 18:01:39 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 14 Nov 2024 18:01:39 GMT Subject: RFR: 8344071: Mark some jdk/jfr/event/oldobject test flagless until they fixed to support all GC In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 17:41:38 GMT, Stefan Karlsson wrote: >> Tests >> jdk/jfr/event/oldobject/TestObjectDescription.java >> jdk/jfr/event/oldobject/TestClassLoaderLeak.java >> fail with different GCs to get expected events. >> >> They pass with G1 GC so I don't want to problemlist them. >> However, they should test all GCs that support these events. >> >> Removed ZGC, because I expect that is Generational and should be also supported. >> >> The flagless should be removed when the main bugs are fixed. > > test/jdk/jdk/jfr/event/oldobject/TestObjectDescription.java line 44: > >> 42: * @key jfr >> 43: * @requires vm.hasJFR >> 44: * @requires vm.gc != "Z" & vm.gc != "Shenandoah" > > Why did you remove the exclusion of ZGC here and left Shenandoah? Isn't it better to leave the ZGC exclusion here? The ZGC is now generational, and the OldObjectSampler should work. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22050#discussion_r1842668288 From lmesnik at openjdk.org Thu Nov 14 18:01:40 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 14 Nov 2024 18:01:40 GMT Subject: Integrated: 8344071: Mark some jdk/jfr/event/oldobject test flagless until they fixed to support all GC In-Reply-To: References: Message-ID: <54iVQAo5_bgl4XSMUMPnsYKQteW4b0iZM3sevl6hvfY=.1e5008e6-883f-462a-a45a-0505b36ae45f@github.com> On Tue, 12 Nov 2024 22:31:18 GMT, Leonid Mesnik wrote: > Tests > jdk/jfr/event/oldobject/TestObjectDescription.java > jdk/jfr/event/oldobject/TestClassLoaderLeak.java > fail with different GCs to get expected events. > > They pass with G1 GC so I don't want to problemlist them. > However, they should test all GCs that support these events. > > Removed ZGC, because I expect that is Generational and should be also supported. > > The flagless should be removed when the main bugs are fixed. This pull request has now been integrated. Changeset: 2cbce1f0 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/2cbce1f0f19a308ce792b530bde0438bfe55531f Stats: 7 lines in 2 files changed: 4 ins; 0 del; 3 mod 8344071: Mark some jdk/jfr/event/oldobject test flagless until they fixed to support all GC Reviewed-by: stefank ------------- PR: https://git.openjdk.org/jdk/pull/22050 From dholmes at openjdk.org Fri Nov 15 04:52:04 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 15 Nov 2024 04:52:04 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. 
>> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-25+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b test/hotspot/jtreg/gtest/MetaspaceUtilsGtests.java line 1: This file was reduced to empty but not actually deleted. Can you fix it please. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1843185719 From stefank at openjdk.org Fri Nov 15 08:04:12 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 15 Nov 2024 08:04:12 GMT Subject: RFR: 8344071: Mark some jdk/jfr/event/oldobject test flagless until they fixed to support all GC In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 17:58:07 GMT, Leonid Mesnik wrote: >> test/jdk/jdk/jfr/event/oldobject/TestObjectDescription.java line 44: >> >>> 42: * @key jfr >>> 43: * @requires vm.hasJFR >>> 44: * @requires vm.gc != "Z" & vm.gc != "Shenandoah" >> >> Why did you remove the exclusion of ZGC here and left Shenandoah? Isn't it better to leave the ZGC exclusion here? > > The ZGC is now generational, and the OldObjectSampler should work. 
I don't know what the generational part has to do with it. The Old part in the name doesn't refer the young and old generation, just that an object is lingering in the system, which it also did for single gen ZGC. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22050#discussion_r1843329273 From ihse at openjdk.org Fri Nov 15 12:49:19 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 15 Nov 2024 12:49:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-25+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... 
and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b test/hotspot/jtreg/runtime/FieldLayout/ArrayBaseOffsets.java line 1: > 1: /* This file too suffered the same fate; all contents were removed but the file was not deleted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1843710074 From iwalulya at openjdk.org Fri Nov 15 13:41:13 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 15 Nov 2024 13:41:13 GMT Subject: RFR: 8344302: G1: Refactor G1CMTask::do_marking_step to use smaller wrapper methods Message-ID: Hi all, Please review this refactoring of G1CMTask::do_marking_step, breaking it down into multiple helper methods to improve readability. Testing: Tier-1 ------------- Commit messages: - rename handle_abort - removed debug - Merge branch 'master' into MarkingRestart - simple refactor - pitstop Changes: https://git.openjdk.org/jdk/pull/22147/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22147&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344302 Stats: 432 lines in 2 files changed: 223 ins; 195 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/22147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22147/head:pull/22147 PR: https://git.openjdk.org/jdk/pull/22147 From wkemper at openjdk.org Fri Nov 15 21:35:03 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 15 Nov 2024 21:35:03 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v3] In-Reply-To: References: Message-ID: <9SXRh1N-RjMfm_G4LvQFsuF5_DvRxMCrcCXMOtsAwpM=.2b37fe99-1107-440d-a4cb-468fe415b3be@github.com> > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. 
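A rough, hypothetical C++ model of what a separate uncommit pass might look like. None of these names are Shenandoah's: `Region`, `UncommitWorker`, and `uncommit_pass` are invented for illustration, the `shrink_before` cut-off plays the role of a delay such as the `ShenandoahUncommitDelay` flag, and the atomic `gc_active_` flag is only a stand-in for the suspendible-thread-set (STS) coordination named in the PR title. A bare flag check does not by itself exclude a GC that starts right after the check.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified heap region (not Shenandoah's region class).
struct Region {
  bool committed;
  bool empty;
  double empty_since;  // time (seconds) at which the region last became empty
};

// Hypothetical worker that would run uncommit on its own thread instead of the
// control thread; only the logic of a single pass is modeled here.
class UncommitWorker {
 public:
  UncommitWorker(size_t region_bytes, size_t min_committed_bytes)
      : region_bytes_(region_bytes), min_committed_(min_committed_bytes) {
    assert(region_bytes > 0);
  }

  // A GC cycle brackets itself with these calls so the worker backs off.
  void gc_started()  { gc_active_.store(true, std::memory_order_release); }
  void gc_finished() { gc_active_.store(false, std::memory_order_release); }

  // Uncommits regions that have been empty since before `shrink_before`, never letting
  // committed memory drop below the floor. Returns how many regions were uncommitted.
  size_t uncommit_pass(std::vector<Region>& regions, double shrink_before) {
    size_t committed = 0;
    for (const Region& r : regions) {
      if (r.committed) committed += region_bytes_;
    }
    size_t uncommitted = 0;
    for (Region& r : regions) {
      // Re-checked per region so a GC that starts mid-pass makes the worker stop early.
      if (gc_active_.load(std::memory_order_acquire)) break;
      // Uncommitting one more region would go below the floor: stop.
      if (committed < min_committed_ + region_bytes_) break;
      if (r.committed && r.empty && r.empty_since < shrink_before) {
        r.committed = false;  // stands in for the real OS-level uncommit
        committed -= region_bytes_;
        ++uncommitted;
      }
    }
    return uncommitted;
  }

 private:
  const size_t region_bytes_;
  const size_t min_committed_;
  std::atomic<bool> gc_active_{false};
};
```

Modeling a pass as a plain function call keeps the policy testable without threads; a real worker would run such passes periodically on its own thread.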
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Prevent uncommit thread from running during GC ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22019/files - new: https://git.openjdk.org/jdk/pull/22019/files/7301871e..997360ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=01-02 Stats: 105 lines in 5 files changed: 89 ins; 4 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/22019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019 PR: https://git.openjdk.org/jdk/pull/22019 From lmesnik at openjdk.org Sat Nov 16 02:53:54 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 16 Nov 2024 02:53:54 GMT Subject: RFR: 8344071: Mark some jdk/jfr/event/oldobject test flagless until they fixed to support all GC In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 07:59:40 GMT, Stefan Karlsson wrote: >> The ZGC is now generational, and the OldObjectSampler should work. > > I don't know what the generational part has to do with it. The Old part in the name doesn't refer to the young and old generation, just that an object is lingering in the system, which it also did for single gen ZGC. Thanks, I copied your comments to the bug. It might be needed to run specific tests for Parallel/Serial GC like for ZGC so this will be just G1 only.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22050#discussion_r1844887747 From xpeng at openjdk.org Sat Nov 16 08:22:17 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Sat, 16 Nov 2024 08:22:17 GMT Subject: RFR: 8342041: Test gc/shenandoah/oom/TestClassLoaderLeak.java slow on Windows after JDK-8340490 Message-ID: Fixing the regression on Windows caused by JDK-8340490: the bug is actually caused by different behavior of `os::elapsed_counter()` on Windows. Windows doesn't really have nanosecond hi-res support; instead of nanoseconds, it returns the current value of the performance counter ([link](https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter)). Some additional changes are also included in the PR for better performance and throughput. ### Tests - [x] Verify gc/shenandoah/oom/TestClassLoaderLeak.java on Windows, no regression at all (improved slightly by ~1s) - [x] Run test suites hotspot_gc_shenandoah with linux-aarch64-server-fastdebug ------------- Commit messages: - Use os::javaTimeNanos() instead of os::elapsed_counter() - format - os::elapsed_counter() is not always nanoseconds, e.g.
Windows - Use os::elapsedTime() - Use os::javaTimeMillis instead of os::elapsed_counter() since we don't really need high-res time - Merge branch 'openjdk:master' into JDK-8342041 - fix - Packer stop waiting whenever budget is replenished - Always take _wait_monitor lock at least once when claim_for_alloc fails - 8342041: Test gc/shenandoah/oom/TestClassLoaderLeak.java slow on Windows after JDK-8340490 Changes: https://git.openjdk.org/jdk/pull/22172/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22172&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342041 Stats: 27 lines in 2 files changed: 9 ins; 9 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/22172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22172/head:pull/22172 PR: https://git.openjdk.org/jdk/pull/22172 From xpeng at openjdk.org Sat Nov 16 08:57:20 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Sat, 16 Nov 2024 08:57:20 GMT Subject: RFR: 8342041: Test gc/shenandoah/oom/TestClassLoaderLeak.java slow on Windows after JDK-8340490 [v2] In-Reply-To: References: Message-ID: > Fixing the regression on Windows caused by JDK-8340490; the bug is actually caused by different behavior in `os::elapsed_counter()` which I wasn't aware of. Windows doesn't have nanosecond hi-res clock support, so instead of nanoseconds it returns the current value of the performance counter ([link](https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter)). > > Some additional changes are also included in the PR for better performance and throughput.
> > ### Tests > - [x] Verify gc/shenandoah/oom/TestClassLoaderLeak.java on Windows, no regression at all (improved slightly by ~1s compared to original impl) > - [x] Run test suites hotspot_gc_shenandoah with linux-aarch64-server-fastdebug Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Revert "8342044: Increase timeout of gc/shenandoah/oom/TestClassLoaderLeak.java" This reverts commit 2c0c65353b2f67bdcd954b4d2c2ae3e9b24d1c22. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22172/files - new: https://git.openjdk.org/jdk/pull/22172/files/64751d8f..d83dfbd4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22172&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22172&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22172/head:pull/22172 PR: https://git.openjdk.org/jdk/pull/22172 From kdnilsen at openjdk.org Sat Nov 16 17:53:53 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 16 Nov 2024 17:53:53 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v3] In-Reply-To: <9SXRh1N-RjMfm_G4LvQFsuF5_DvRxMCrcCXMOtsAwpM=.2b37fe99-1107-440d-a4cb-468fe415b3be@github.com> References: <9SXRh1N-RjMfm_G4LvQFsuF5_DvRxMCrcCXMOtsAwpM=.2b37fe99-1107-440d-a4cb-468fe415b3be@github.com> Message-ID: On Fri, 15 Nov 2024 21:35:03 GMT, William Kemper wrote: >> Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure.
> > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Prevent uncommit thread from running during GC I'm ok with this, but best to wait for @shipilev approval before integrating. ------------- Marked as reviewed by kdnilsen (Author). PR Review: https://git.openjdk.org/jdk/pull/22019#pullrequestreview-2440779513 From kdnilsen at openjdk.org Sat Nov 16 17:53:54 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 16 Nov 2024 17:53:54 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v3] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 19:20:09 GMT, William Kemper wrote: >> So, wait a sec. This code is in `ShenandoahResetBitmapTask`, so it can run in parallel. Putting a lock here inhibits parallelism. I understand the failure mode, but I think we should really be optimizing for the case when `ShenandoahUncommit` is not enabled (e.g. `-Xmx` == `-Xms`). >> >> Sounds like there is a hassle in allowing concurrent uncommit to overlap with the GC cycle. In addition to this particular problem, we might be stealing cycles from the GC threads and take additional TTSP lag to park the uncommitter for the in-cycle GC pauses. I have no clear solution for this yet, but I think we need to explore if we can suspend the uncommit before going into GC cycle... > > We could have the control and uncommit threads coordinate their efforts. In the worst case, it could mean delaying concurrent reset while the control thread waits for the uncommit thread to yield. > > We could also try a more targeted lock only for the region's bitmap slice, but it doesn't seem right that one thread would be trying to clear a bitmap, while the other is trying to uncommit it. 
A lock could preserve technical correctness, but contention here would just mean that one thread would have wasted its time (either clearing a bitmap that is then uncommitted, or attempting to clear a bitmap that was first uncommitted (in this case, we would need the control thread to detect this and skip the region)). Can we open a ticket to consider future improved concurrency by moving clear_bitmap(region) outside the global heap lock? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1845144702 From xpeng at openjdk.org Sat Nov 16 19:49:20 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Sat, 16 Nov 2024 19:49:20 GMT Subject: RFR: 8342041: Test gc/shenandoah/oom/TestClassLoaderLeak.java slow on Windows after JDK-8340490 [v3] In-Reply-To: References: Message-ID: > Fixing the regression on Windows caused by JDK-8340490; the bug is actually caused by different behavior in `os::elapsed_counter()` which I wasn't aware of. Windows doesn't have nanosecond hi-res clock support, so instead of nanoseconds it returns the current value of the performance counter ([link](https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter)).
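To make the unit mismatch concrete, here is a small illustrative Java sketch (not HotSpot code; the class and method names are hypothetical) that converts a raw counter value to nanoseconds once the counter's frequency in ticks per second is known. On the POSIX path described in this thread the counter already counts nanoseconds, so the conversion is the identity. A performance-counter value must be scaled by its frequency (in HotSpot, `os::elapsed_frequency()` reports it); treating raw ticks as nanoseconds understates elapsed time whenever that frequency is below 1 GHz.

```java
// Illustrative sketch only: hypothetical names, not part of HotSpot or the JDK.
final class TickConversion {
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Converts a tick count from a counter running at frequencyHz ticks/second
    // into nanoseconds. Splitting into whole seconds plus a remainder avoids
    // overflowing ticks * 1e9 for frequencies below roughly 9.2 GHz.
    static long ticksToNanos(long ticks, long frequencyHz) {
        if (frequencyHz <= 0) {
            throw new IllegalArgumentException("frequency must be positive");
        }
        long seconds = ticks / frequencyHz;
        long remainder = ticks % frequencyHz;
        return seconds * NANOS_PER_SECOND + remainder * NANOS_PER_SECOND / frequencyHz;
    }

    public static void main(String[] args) {
        // A 10 MHz counter: 15,000,000 ticks are 1.5 seconds.
        System.out.println(ticksToNanos(15_000_000L, 10_000_000L));       // 1500000000
        // A counter that already ticks in nanoseconds: identity.
        System.out.println(ticksToNanos(123_456_789L, NANOS_PER_SECOND)); // 123456789
    }
}
```

Reading the 10 MHz value as nanoseconds would report 15 ms instead of 1.5 s, a factor of 100 off, which is the kind of mismatch a timing-based pacing loop would be sensitive to.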
> > ### Tests > - [x] Verify gc/shenandoah/oom/TestClassLoaderLeak.java on Windows, no regression at all (improved slightly by ~1s compared to original impl) > - [x] Run test suites hotspot_gc_shenandoah with linux-aarch64-server-fastdebug Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Revert all the changes not related to the bug fix - Simplify pace_for_alloc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22172/files - new: https://git.openjdk.org/jdk/pull/22172/files/d83dfbd4..19e782c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22172&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22172&range=01-02 Stats: 27 lines in 2 files changed: 9 ins; 9 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/22172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22172/head:pull/22172 PR: https://git.openjdk.org/jdk/pull/22172 From shade at openjdk.org Mon Nov 18 12:34:48 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 18 Nov 2024 12:34:48 GMT Subject: RFR: 8342041: Test gc/shenandoah/oom/TestClassLoaderLeak.java slow on Windows after JDK-8340490 [v3] In-Reply-To: References: Message-ID: On Sat, 16 Nov 2024 19:49:20 GMT, Xiaolong Peng wrote: >> Fixing the regression on Windows caused by JDK-8340490; the bug is actually caused by different behavior in `os::elapsed_counter()` which I wasn't aware of. Windows doesn't have nanosecond hi-res clock support, so instead of nanoseconds it returns the current value of the performance counter ([link](https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter)).
>> >> ### Tests >> - [x] Verify gc/shenandoah/oom/TestClassLoaderLeak.java on Windows, no regression at all (improved slightly by ~1s compared to original impl) >> - [x] Run test suites hotspot_gc_shenandoah with linux-aarch64-server-fastdebug > > Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: > > - Revert all the changes not related to the bug fix > - Simplify pace_for_alloc Ouch. So `elapsed_counter` is nanoseconds only on POSIX! My bad for not catching this during the original review in [JDK-8340490](https://bugs.openjdk.org/browse/JDK-8340490). The fix looks good. Attn @kdnilsen, @earthling-amzn. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22172#pullrequestreview-2442489636 PR Comment: https://git.openjdk.org/jdk/pull/22172#issuecomment-2482912074 From jvernee at openjdk.org Mon Nov 18 12:46:57 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 18 Nov 2024 12:46:57 GMT Subject: RFR: 8331735: UpcallLinker::on_exit races with GC when copying frame anchor Message-ID: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> There is a subtle race in `UpcallLinker::on_exit` between copying of the old frame anchor back into place, and the GC. Since this copy is not atomic, it may briefly appear as if a thread has no last Java frame, while still in the `_thread_in_native` state, which leads to the GC skipping processing of any active Java frames. This code was originally adapted from `JavaCallWrapper::~JavaCallWrapper` - the JNI mechanism for upcalls - but in that case the frame anchor copy happens in the `_thread_in_vm` state, which means the GC will wait for the thread to get to a safepoint. The solution proposed here is to do the frame anchor copy in the java thread state, before transitioning back to the native state. The java thread state, like the vm thread state, is also 'safe' i.e.
the GC will wait for the thread to get to a safepoint, so we can safely do our non-atomic copy of the frame anchor. Additionally, this PR resolves a similar issue in `on_entry`, by moving the clearing of the pending exception (in case native code uses a JNI API and didn't handle the exception afterwards). We now also skip checking for async exceptions when transitioning from native to java, so we don't immediately clear them. Any async exceptions will be picked up at the next safepoint instead. Special thanks to @stefank and @fisk for finding the root cause, and @jaikiran for testing and debugging. Testing: tier 1-4, 20k runs of the failing test on linux-aarch64. ------------- Commit messages: - Merge branch 'master' into SafeFrameAnchor - Merge branch 'master' into SafeFrameAnchor - Don't touch frame anchor or current exception oop in native state Changes: https://git.openjdk.org/jdk/pull/21742/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21742&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331735 Stats: 15 lines in 1 file changed: 5 ins; 9 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21742.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21742/head:pull/21742 PR: https://git.openjdk.org/jdk/pull/21742 From epeter at openjdk.org Mon Nov 18 14:08:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:08:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: <_8-9ZYtVnjHTl3zce1wjZUCJZ6j1I5LgVfmUT4VKkm8=.74799b71-4c26-4c6c-8299-2efd02292548@github.com> On Fri, 8 Nov 2024 17:42:24 GMT, Roman Kennke wrote: >> Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support? > > @mur47x111 it's now integrated in jdk24.
do your magic in Graal ;-)

@rkennke I have now looked more into the SuperWord collateral damage: [JDK-8340010](https://bugs.openjdk.org/browse/JDK-8340010): Fix vectorization tests with compact headers

Do we care about `AlignVector` and `UseCompactObjectHeaders` enabled together? If so, we have a serious issue with mixed type examples. There are actually now some failing cases:

Failed IR Rules (3) of Methods (3) ----------------------------------
1) Method "public char[] compiler.vectorization.runner.ArrayTypeConvertTest.convertFloatToChar()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "avx2", "true"}, counts={"_#V#VECTOR_CAST_F2S#_", "_@min(max_float, max_char)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(VectorCastF2X.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched!
2) Method "public short[] compiler.vectorization.runner.ArrayTypeConvertTest.convertFloatToShort()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "avx2", "true"}, counts={"_#V#VECTOR_CAST_F2S#_", "_@min(max_float, max_short)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(VectorCastF2X.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched!
3) Method "public float[] compiler.vectorization.runner.ArrayTypeConvertTest.convertShortToFloat()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "avx2", "true"}, counts={"_#V#VECTOR_CAST_S2F#_", "_@min(max_short, max_float)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(VectorCastS2X.*)+(\\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched!

Let me explain: If we enable AlignVector, we need 8-byte alignment. As long as `UseCompactObjectHeaders` is disabled, all of these are `=16`:

    UNSAFE.ARRAY_BYTE_BASE_OFFSET
    UNSAFE.ARRAY_SHORT_BASE_OFFSET
    UNSAFE.ARRAY_CHAR_BASE_OFFSET
    UNSAFE.ARRAY_INT_BASE_OFFSET
    UNSAFE.ARRAY_LONG_BASE_OFFSET
    UNSAFE.ARRAY_FLOAT_BASE_OFFSET
    UNSAFE.ARRAY_DOUBLE_BASE_OFFSET

However, with `UseCompactObjectHeaders` enabled, these are now 12:

    UNSAFE.ARRAY_BYTE_BASE_OFFSET
    UNSAFE.ARRAY_SHORT_BASE_OFFSET
    UNSAFE.ARRAY_CHAR_BASE_OFFSET
    UNSAFE.ARRAY_INT_BASE_OFFSET
    UNSAFE.ARRAY_FLOAT_BASE_OFFSET

And these are still 16:

    UNSAFE.ARRAY_LONG_BASE_OFFSET
    UNSAFE.ARRAY_DOUBLE_BASE_OFFSET

Now let's try to get that 8-byte alignment in some example, one from the above:

    public short[] convertFloatToShort() {
        short[] res = new short[SIZE];
        for (int i = 0; i < SIZE; i++) {
            res[i] = (short) floats[i];
        }
        return res;
    }

Let's look at the two addresses with `UseCompactObjectHeaders=false`, where we **can** vectorize:

    F_adr = base + 16 + 4 * i  -> aligned for: i % 2 = 0
    S_adr = base + 16 + 2 * i  -> aligned for: i % 4 = 0
    -> solution for both: i % 4 = 0

i.e. we have alignment for both vector accesses every 4th iteration.
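The residue arithmetic in this example can be checked mechanically. The following Java sketch is illustrative only: the class and method names are hypothetical and not part of the JDK or the IR test framework. It assumes, as stated in this thread, that the array base itself is 8-byte aligned, so an element access is 8-byte aligned exactly when `payloadOffset + elementSize * i` is a multiple of 8. It enumerates the iterations in which both the `float[]` load and the `short[]` store are aligned.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical names, not JDK or IR-framework APIs.
// Assumption: the array base is 8-byte aligned, so only
// (payloadOffset + elemSize * i) % 8 decides whether an access is aligned.
class MixedAlignment {
    static final int ALIGNMENT = 8; // the 8-byte requirement under AlignVector

    static boolean aligned(int payloadOffset, int elemSize, int i) {
        return (payloadOffset + elemSize * i) % ALIGNMENT == 0;
    }

    // Iterations i in [0, limit) where both the float[] load and the short[]
    // store (same payload offset, different element sizes) are 8-byte aligned.
    static List<Integer> commonAlignedIterations(int payloadOffset, int limit) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < limit; i++) {
            if (aligned(payloadOffset, Float.BYTES, i) && aligned(payloadOffset, Short.BYTES, i)) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Payload offset 16 (compact headers off): prints [0, 4, 8, 12]
        System.out.println("offset 16: " + commonAlignedIterations(16, 16));
        // Payload offset 12 (compact headers on): prints []
        System.out.println("offset 12: " + commonAlignedIterations(12, 16));
    }
}
```

With offset 12, the float access is aligned only for odd `i` and the short access only for `i % 4 == 2`, so the list is empty no matter how large `limit` is. That is why a strict-alignment vectorizer has no iteration at which it can align both accesses at once.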
Let's look at the two addresses with `UseCompactObjectHeaders=true`, where we **cannot** vectorize:

    F_adr = base + 12 + 4 * i  -> aligned for: i % 2 = 1
    S_adr = base + 12 + 2 * i  -> aligned for: i % 4 = 2
    -> There is no solution to satisfy both alignment constraints!

It's a little sad that I only just realized this now... but oh well. The issue is that we apparently did not run testing for these examples, so I did not see the impact immediately. So my question: do we care about `UseCompactObjectHeaders` and `AlignVector` enabled at the same time? If so, we have to accept that some examples with mixed types simply will not vectorize. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483138198 From epeter at openjdk.org Mon Nov 18 14:12:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:12:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 17:42:24 GMT, Roman Kennke wrote: >> Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support? > > @mur47x111 it's now integrated in jdk24. do your magic in Graal ;-) @rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements? So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`?
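The payload offsets that come up in this discussion (12 and 16 with compact headers, 16 otherwise with compressed class pointers, 20 and 24 without them) can be reproduced from where the 4-byte array length field ends. The Java sketch below is a hypothetical model, not HotSpot's actual layout code. It assumes that the payload starts at the end of the length field rounded up to the element size. It also assumes length-field offsets of 8 (compact headers, as stated in the JEP 450 PR description above), 12 (8-byte mark word plus 4-byte compressed class pointer) and 16 (8-byte mark word plus 8-byte class pointer).

```java
// Hypothetical model of array payload offsets; not HotSpot code.
// Assumption: payload = end of the 4-byte length field, rounded up to the element size.
final class ArrayBaseOffsets {
    // Round value up to a multiple of alignment (alignment must be a power of two).
    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) & -alignment;
    }

    // lengthOffset: assumed byte offset of the 4-byte array length field.
    static int baseOffset(int lengthOffset, int elementSize) {
        return alignUp(lengthOffset + Integer.BYTES, elementSize);
    }

    public static void main(String[] args) {
        String[] layouts = {"compact headers", "compressed class pointers", "-UseCompressedClassPointers"};
        int[] lengthOffsets = {8, 12, 16}; // assumed header sizes before the length field
        int[] elementSizes = {Byte.BYTES, Short.BYTES, Integer.BYTES, Long.BYTES};
        for (int l = 0; l < layouts.length; l++) {
            StringBuilder sb = new StringBuilder(layouts[l]).append(':');
            for (int size : elementSizes) {
                sb.append(' ').append(size).append("B->").append(baseOffset(lengthOffsets[l], size));
            }
            System.out.println(sb);
        }
    }
}
```

Under this model, the program prints `1B->12 2B->12 4B->12 8B->16` for compact headers, 16 for every element size with compressed class pointers, and `1B->20 2B->20 4B->20 8B->24` without them. The 12-byte payloads are 4 mod 8, which is where the mixed-type alignment conflict described above comes from.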
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483153279 From rkennke at openjdk.org Mon Nov 18 14:16:26 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 18 Nov 2024 14:16:26 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 17:42:24 GMT, Roman Kennke wrote: >> Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support? > > @mur47x111 it's now integrated in jdk24. do your magic in Graal ;-) > @rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements? So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`? For byte[] and to some extent for char[] it is quite important, because those are the backing types for String and related classes, and Java apps often have *many* of them, and also quite small. I would not want to sacrifice them for vectorization, especially not for the relatively uncommon (I think) case of mixed type access. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483162512 From epeter at openjdk.org Mon Nov 18 14:20:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:20:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: <55hlCTAhtpoZT9LDQUkHwPQ5UUTylLzfNDYiFaBTXes=.9d9d6874-2f59-4833-9226-9e7f6410ca8d@github.com> On Mon, 18 Nov 2024 14:13:17 GMT, Roman Kennke wrote: >> @mur47x111 it's now integrated in jdk24. do your magic in Graal ;-) > >> @rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements?
So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`? > > For byte[] and to some extent for char[] it is quite important, because those are the backing types for String and related classes, and Java apps often have *many* of them, and also quite small. I would not want to sacrifice them for vectorization, especially not for the relatively uncommon (I think) case of mixed type access. @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483170957 From epeter at openjdk.org Mon Nov 18 14:28:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:28:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 14:23:24 GMT, Roman Kennke wrote: >>> @rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements? So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`? >> >> For byte[] and to some extent for char[] it is quite important, because those are the backing types for String and related classes, and Java apps often have *many* of them, and also quite small. I would not want to sacrifice them for vectorization, especially not for the relatively uncommon (I think) case of mixed type access. > >> @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers.
> > BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. > > What is the failure mode, though? When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former. @rkennke It just will (silently) not vectorize, thus running slower but still correct. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483202341 From rkennke at openjdk.org Mon Nov 18 14:28:20 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 18 Nov 2024 14:28:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 14:13:17 GMT, Roman Kennke wrote: >> @mur47x111 it's now integrated in jdk24. do your magic in Graal ;-) > >> @rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements? So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`? > > For byte[] and to some extent for char[] it is quite important, because those are the backing types for String and related classes, and Java apps often have *many* of them, and also quite small. I would not want to sacrifice them for vectorization, especially not for the relatively uncommon (I think) case of mixed type access. > @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers.
BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. What is the failure mode, though? When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483195304 From epeter at openjdk.org Mon Nov 18 14:41:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:41:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> On Mon, 18 Nov 2024 14:23:24 GMT, Roman Kennke wrote: >>> @rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements? So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`? >> >> For byte[] and to some extent for char[] it is quite important, because those are the backing types for String and related classes, and Java apps often have *many* of them, and also quite small. I would not want to sacrifice them for vectorization, especially not for the relatively uncommon (I think) case of mixed type access. > >> @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers. > > BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers.
With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. > > What is the failure mode, though? When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former. @rkennke > BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. Sure. But I guess some people will want to run both `AlignVector` and `UseCompactObjectHeaders` in the future. Some machines simply do require strict alignment. So they will have to live with that tradeoff. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483225393 From qamai at openjdk.org Mon Nov 18 14:41:21 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 18 Nov 2024 14:41:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> Message-ID: On Mon, 18 Nov 2024 14:31:52 GMT, Emanuel Peter wrote: >>> @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers. >> >> BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. >> >> What is the failure mode, though? 
When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former. > > @rkennke >> BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. > > Sure. But I guess some people will want to run both `AlignVector` and `UseCompactObjectHeaders` in the future. Some machines simply do require strict alignment. So they will have to live with that tradeoff. @eme64 Tbh I don't see how `AlignVector` can mitigate the issue if strict alignment is required unless the object base is guaranteed to be aligned at least as much as the vector length. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483230986 From rkennke at openjdk.org Mon Nov 18 14:41:21 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 18 Nov 2024 14:41:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 14:23:24 GMT, Roman Kennke wrote: >>> @rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements? So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`? >> >> For byte[] and to some extent for char[] it is quite important, because those are the backing types for String and related classes, and Java apps often have *many* of them, and also quite small. I would not want to sacrifice them for vectorization, especially not for the relatively uncommon (I think) case of mixed type access. > >> @rkennke Ok, fair enough.
As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers. > > BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. > > What is the failure mode, though? When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former. > @rkennke It just will (silently) not vectorize, thus running slower but still correct. Ok, I think we can live with that for now. As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. The tests need fixing, though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483234723 From epeter at openjdk.org Mon Nov 18 14:41:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:41:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> Message-ID: On Mon, 18 Nov 2024 14:34:13 GMT, Quan Anh Mai wrote: >> @rkennke >>> BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. >> >> Sure. But I guess some people will want to run both `AlignVector` and `UseCompactObjectHeaders` in the future. Some machines simply do require strict alignment. So they will have to live with that tradeoff. 
> > @eme64 Tbh I don't see how `AlignVector` can mitigate the issue if strict alignment is required unless the object base is guaranteed to be aligned at least as much as the vector length. @merykitty the object base is always at least `8-byte` aligned, see `ObjectAlignmentInBytes` - this also holds for all arrays. But the issue is the offset from the object base to the array payload. @rkennke yes, working on fixing the tests :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483236250 From qamai at openjdk.org Mon Nov 18 14:41:21 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 18 Nov 2024 14:41:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> Message-ID: On Mon, 18 Nov 2024 14:36:17 GMT, Emanuel Peter wrote: >> @eme64 Tbh I don't see how `AlignVector` can mitigate the issue if strict alignment is required unless the object base is guaranteed to be aligned at least as much as the vector length. > > @merykitty the object base is always at least `8-byte` aligned, see `ObjectAlignmentInBytes` - this also holds for all arrays. But the issue is the offset from the object base to the array payload. > > @rkennke yes, working on fixing the tests :) @eme64 Please correct me if I'm wrong but the issue is you need the base to be aligned at 32 bytes on AVX2 machines for any alignment for vector instruction to be meaningful, so I don't see the value of vector alignment at all. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483241445 From epeter at openjdk.org Mon Nov 18 14:41:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:41:22 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 14:35:41 GMT, Roman Kennke wrote: >>> @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers. >> >> BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. >> >> What is the failure mode, though? When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former. > >> @rkennke It just will (silently) not vectorize, thus running slower but still correct. > > Ok, I think we can live with that for now. > > As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. > > The tests need fixing, though. @rkennke > As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. Ah. So we would eventually have not a `12-byte` but `8-byte` offset from base to payload? Would that happen in all cases? And could that happen before `UseCompactObjectHeaders` leaves experimental status? Because it is going to be a little annoying to adjust all vectorization tests for the special case of `UseCompactObjectHeaders + AlignVector`. Though I can surely do it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483242899 From epeter at openjdk.org Mon Nov 18 14:44:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:44:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> Message-ID: On Mon, 18 Nov 2024 14:38:20 GMT, Quan Anh Mai wrote: >> @merykitty the object base is always at least `8-byte` aligned, see `ObjectAlignmentInBytes` - this also holds for all arrays. But the issue is the offset from the object base to the array payload. >> >> @rkennke yes, working on fixing the tests :) > > @eme64 Please correct me if I'm wrong but the issue is you need the base to be aligned at 32 bytes on AVX2 machines for any alignment for vector instruction to be meaningful, so I don't see the value of vector alignment at all. @merykitty > Please correct me if I'm wrong but the issue is you need the base to be aligned at 32 bytes on AVX2 machines for any alignment for vector instruction to be meaningful, so I don't see the value of vector alignment at all. First: without `AlignVector`, the vector instructions can have completely free alignment. On x64 and aarch64 generally I think most machines do not need alignment at all. And as far as I know there is also no performance penalty on modern CPUs for misalignment. I could be wrong here. On older CPUs alignment was important for performance though. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483249163 From qamai at openjdk.org Mon Nov 18 14:56:28 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 18 Nov 2024 14:56:28 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> Message-ID: On Mon, 18 Nov 2024 14:41:25 GMT, Emanuel Peter wrote: >> @eme64 Please correct me if I'm wrong but the issue is you need the base to be aligned at 32 bytes on AVX2 machines for any alignment for vector instruction to be meaningful, so I don't see the value of vector alignment at all. > > @merykitty >> Please correct me if I'm wrong but the issue is you need the base to be aligned at 32 bytes on AVX2 machines for any alignment for vector instruction to be meaningful, so I don't see the value of vector alignment at all. > > First: without `AlignVector`, the vector instructions can have completely free alignment. On x64 and aarch64 generally I think most machines do not need alignment at all. And as far as I know there is also no performance penalty on modern CPUs for misalignment. I could be wrong here. On older CPUs alignment was important for performance though. @eme64 You will need the alignment for the whole vector (which means 32 bytes for a `ymm` load), not alignment only on its elements. Vector element is the artefact of ALU units, not the load/store units that actually care about alignment. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483255086 From epeter at openjdk.org Mon Nov 18 14:56:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:56:28 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> Message-ID: <7h8Il7V3a1tbo_U2y2GyUY2tH8UPXtKc3we3ZZi47d4=.4a4cbe88-92a5-43cf-a6a9-48d0bed41cf7@github.com> On Mon, 18 Nov 2024 14:43:48 GMT, Quan Anh Mai wrote: >> @merykitty >>> Please correct me if I'm wrong but the issue is you need the base to be aligned at 32 bytes on AVX2 machines for any alignment for vector instruction to be meaningful, so I don't see the value of vector alignment at all. >> >> First: without `AlignVector`, the vector instructions can have completely free alignment. On x64 and aarch64 generally I think most machines do not need alignment at all. And as far as I know there is also no performance penalty on modern CPUs for misalignment. I could be wrong here. On older CPUs alignment was important for performance though. > > @eme64 You will need the alignment for the whole vector (which means 32 bytes for a `ymm` load), not alignment only on its elements. Vector element is the artefact of ALU units, not the load/store units that actually care about alignment. @merykitty I don't think I understand. When and for what do I need the full 32-byte alignment? @merykitty In `AlignmentSolver::solve` / `src/hotspot/share/opto/vectorization.cpp` you can see how I compute if vectors can be aligned. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483261148 PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483266962 From qamai at openjdk.org Mon Nov 18 14:56:28 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 18 Nov 2024 14:56:28 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: <7h8Il7V3a1tbo_U2y2GyUY2tH8UPXtKc3we3ZZi47d4=.4a4cbe88-92a5-43cf-a6a9-48d0bed41cf7@github.com> References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> <7h8Il7V3a1tbo_U2y2GyUY2tH8UPXtKc3we3ZZi47d4=.4a4cbe88-92a5-43cf-a6a9-48d0bed41cf7@github.com> Message-ID: <6rwCNBLV4-VemVsKR8KWYEgSIKfHQxS_RuxsPwX7TZo=.5fe167a3-1f97-408d-9d41-23d4d0fb42df@github.com> On Mon, 18 Nov 2024 14:48:22 GMT, Emanuel Peter wrote: >> @eme64 You will need the alignment for the whole vector (which means 32 bytes for a `ymm` load), not alignment only on its elements. Vector element is the artefact of ALU units, not the load/store units that actually care about alignment. > > @merykitty In `AlignmentSolver::solve` / `src/hotspot/share/opto/vectorization.cpp` you can see how I compute if vectors can be aligned. @eme64 If you load a 32-byte (256-bit) vector, then the load is aligned if the address is divisible by 32, otherwise the load is misaligned. That's why [`vmovdqua`](https://www.felixcloutier.com/x86/movdqa:vmovdqa32:vmovdqa64) requires 16-byte alignment for 16-byte loads/stores, 32-byte alignment for 32-byte loads/stores, 64-byte alignment for 64-byte loads/stores. As a result, I don't see how you can align a vector load/store if the object base is only guaranteed to align at 8-byte boundaries. I mean there is no use trying to align an access if you cannot align it at the access size, the access is going to be misaligned anyway. 
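The alignment argument above is plain modular arithmetic: if the object base is only known to be a multiple of 8, then base + offset is only guaranteed to be a multiple of min(8, largest power of two dividing offset), whatever the offset is. A small Java sketch of that reasoning follows; the class and method names are hypothetical, and `ObjectAlignmentInBytes=8` plus the 16-byte payload offset are assumptions.

```java
// Hypothetical helper names; the arithmetic is the point.
public class GuaranteedAlignment {
    // Largest power of two guaranteed to divide (base + offset) when base is
    // only known to be a multiple of baseAlign (itself a power of two).
    static long guaranteedAlignment(long baseAlign, long offset) {
        if (offset == 0) {
            return baseAlign;
        }
        // Lowest set bit = largest power of two dividing offset.
        return Math.min(baseAlign, Long.lowestOneBit(offset));
    }

    // An access of accessSize bytes is provably aligned only if the
    // guaranteed alignment of its address is at least accessSize.
    static boolean provablyAligned(long baseAlign, long offset, long accessSize) {
        return guaranteedAlignment(baseAlign, offset) >= accessSize;
    }

    public static void main(String[] args) {
        long baseAlign = 8; // assumed ObjectAlignmentInBytes=8
        long payload = 16;  // assumed byte[] payload offset with compressed class pointers
        System.out.println(provablyAligned(baseAlign, payload, 8));  // true
        System.out.println(provablyAligned(baseAlign, payload, 16)); // false
        System.out.println(provablyAligned(baseAlign, payload, 32)); // false: ymm-sized access
    }
}
```

Whether a misaligned 32-byte access actually faults or costs anything is a separate, CPU-specific question; the sketch only shows what can be proven statically from the base alignment.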
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483275575 From rkennke at openjdk.org Mon Nov 18 15:04:25 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 18 Nov 2024 15:04:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 14:35:41 GMT, Roman Kennke wrote: >>> @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers. >> >> BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. >> >> What is the failure mode, though? When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former. > >> @rkennke It just will (silently) not vectorize, thus running slower but still correct. > > Ok, I think we can live with that for now. > > As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. > > The tests need fixing, though. > @rkennke > > > As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. > > Ah. So we would eventually have not a `12-byte` but `8-byte` offset from base to payload? Would that happen in all cases? And could that happen before `UseCompactObjectHeaders` leaves experimental status? Because it is going to be a little annoying to adjust all vectorization tests for the special case of `UseCompactObjectHeaders + AlignVector`. Though I can surely do it. I am not sure if and when this is going to happen. 
When I presented the idea at JVMLS, I got some resistance. I am also not sure if we first leave experimental status for UCOH, and then introduce 4-byte headers under a new flag (or no flag?), or if we first do 4-byte headers and only leave experimental status once that is done. The latter sounds more reasonable to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483304257 From epeter at openjdk.org Mon Nov 18 15:04:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 15:04:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: <6rwCNBLV4-VemVsKR8KWYEgSIKfHQxS_RuxsPwX7TZo=.5fe167a3-1f97-408d-9d41-23d4d0fb42df@github.com> References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> <7h8Il7V3a1tbo_U2y2GyUY2tH8UPXtKc3we3ZZi47d4=.4a4cbe88-92a5-43cf-a6a9-48d0bed41cf7@github.com> <6rwCNBLV4-VemVsKR8KWYEgSIKfHQxS_RuxsPwX7TZo=.5fe167a3-1f97-408d-9d41-23d4d0fb42df@github.com> Message-ID: On Mon, 18 Nov 2024 14:50:51 GMT, Quan Anh Mai wrote: >> @merykitty In `AlignmentSolver::solve` / `src/hotspot/share/opto/vectorization.cpp` you can see how I compute if vectors can be aligned. > > @eme64 If you load a 32-byte (256-bit) vector, then the load is aligned if the address is divisible by 32, otherwise the load is misaligned. That's why [`vmovdqua`](https://www.felixcloutier.com/x86/movdqa:vmovdqa32:vmovdqa64) requires 16-byte alignment for 16-byte loads/stores, 32-byte alignment for 32-byte loads/stores, 64-byte alignment for 64-byte loads/stores. > > As a result, I don't see how you can align a vector load/store if the object base is only guaranteed to align at 8-byte boundaries. I mean there is no use trying to align an access if you cannot align it at the access size, the access is going to be misaligned anyway. 
@merykitty I guess we can always use [vmovdqu](https://www.felixcloutier.com/x86/movdqu:vmovdqu8:vmovdqu16:vmovdqu32:vmovdqu64).

And in fact that is exactly what we do:

    public class Test {
        static int RANGE = 1024*1024;

        public static void main(String[] args) {
            byte[] aB = new byte[RANGE];
            byte[] bB = new byte[RANGE];
            for (int i = 0; i < 100_000; i++) {
                test1(aB, bB);
            }
        }

        static void test1(byte[] a, byte[] b) {
            for (int i = 0; i < RANGE; i++) {
                a[i] = b[i];
            }
        }
    }

`../java -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printcompilation,Test::test* -XX:+TraceLoopOpts -XX:-TraceSuperWord -XX:+TraceNewVectors -Xbatch -XX:+AlignVector -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printassembly,Test::test* Test.java`

    ;; B20: # out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner main of N178 strip mined) Freq: 8.13586e+09
    0x00007fc3a4bb0780: movslq %ebx,%rdi
    0x00007fc3a4bb0783: movslq %ebx,%r14
    0x00007fc3a4bb0786: vmovdqu32 0x10(%r13,%r14,1),%zmm1
    0x00007fc3a4bb0791: vmovdqu32 %zmm1,0x10(%r9,%r14,1)
    0x00007fc3a4bb079c: vmovdqu32 0x50(%r13,%rdi,1),%zmm1
    0x00007fc3a4bb07a7: vmovdqu32 %zmm1,0x50(%r9,%rdi,1)
    0x00007fc3a4bb07b2: vmovdqu32 0x90(%r13,%rdi,1),%zmm1
    0x00007fc3a4bb07bd: vmovdqu32 %zmm1,0x90(%r9,%rdi,1)
    0x00007fc3a4bb07c8: vmovdqu32 0xd0(%r13,%rdi,1),%zmm1
    0x00007fc3a4bb07d3: vmovdqu32 %zmm1,0xd0(%r9,%rdi,1)
    0x00007fc3a4bb07de: vmovdqu32 0x110(%r13,%rdi,1),%zmm1
    0x00007fc3a4bb07e9: vmovdqu32 %zmm1,0x110(%r9,%rdi,1)
    0x00007fc3a4bb07f4: vmovdqu32 0x150(%r13,%rdi,1),%zmm1
    0x00007fc3a4bb07ff: vmovdqu32 %zmm1,0x150(%r9,%rdi,1)
    0x00007fc3a4bb080a: vmovdqu32 0x190(%r13,%rdi,1),%zmm1
    0x00007fc3a4bb0815: vmovdqu32 %zmm1,0x190(%r9,%rdi,1)
    0x00007fc3a4bb0820: vmovdqu32 0x1d0(%r13,%rdi,1),%zmm1
    0x00007fc3a4bb082b: vmovdqu32 %zmm1,0x1d0(%r9,%rdi,1)  ;*bastore {reexecute=0 rethrow=0 return_oop=0}
                                                           ; - Test::test1@14 (line 14)
    0x00007fc3a4bb0836: add $0x200,%ebx                    ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                           ; - Test::test1@15 (line 13)
    0x00007fc3a4bb083c: cmp %r11d,%ebx
    0x00007fc3a4bb083f: jl 0x00007fc3a4bb0780

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483305049

From qamai at openjdk.org Mon Nov 18 15:23:34 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Mon, 18 Nov 2024 15:23:34 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19]
In-Reply-To:
References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> <7h8Il7V3a1tbo_U2y2GyUY2tH8UPXtKc3we3ZZi47d4=.4a4cbe88-92a5-43cf-a6a9-48d0bed41cf7@github.com> <6rwCNBLV4-VemVsKR8KWYEgSIKfHQxS_RuxsPwX7TZo=.5fe167a3-1f97-408d-9d41-23d4d0fb42df@github.com>
Message-ID:

On Mon, 18 Nov 2024 15:01:09 GMT, Emanuel Peter wrote:

>> @eme64 If you load a 32-byte (256-bit) vector, then the load is aligned if the address is divisible by 32, otherwise the load is misaligned. That's why [`vmovdqua`](https://www.felixcloutier.com/x86/movdqa:vmovdqa32:vmovdqa64) requires 16-byte alignment for 16-byte loads/stores, 32-byte alignment for 32-byte loads/stores, 64-byte alignment for 64-byte loads/stores.
>>
>> As a result, I don't see how you can align a vector load/store if the object base is only guaranteed to align at 8-byte boundaries. I mean there is no use trying to align an access if you cannot align it at the access size, the access is going to be misaligned anyway.
>
> @merykitty I guess we can always use [vmovdqu](https://www.felixcloutier.com/x86/movdqu:vmovdqu8:vmovdqu16:vmovdqu32:vmovdqu64).
> > And in fact that is exactly what we do: > > public class Test { > static int RANGE = 1024*1024; > > public static void main(String[] args) { > byte[] aB = new byte[RANGE]; > byte[] bB = new byte[RANGE]; > for (int i = 0; i < 100_000; i++) { > test1(aB, bB); > } > } > > static void test1(byte[] a, byte[] b) { > for (int i = 0; i < RANGE; i++) { > a[i] = b[i]; > } > } > } > > `../java -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printcompilation,Test::test* -XX:+TraceLoopOpts -XX:-TraceSuperWord -XX:+TraceNewVectors -Xbatch -XX:+AlignVector -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printassembly,Test::test* Test.java` > > > ;; B20: # out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner main of N178 strip mined) Freq: 8.13586e+09 > 0x00007fc3a4bb0780: movslq %ebx,%rdi > 0x00007fc3a4bb0783: movslq %ebx,%r14 > 0x00007fc3a4bb0786: vmovdqu32 0x10(%r13,%r14,1),%zmm1 > 0x00007fc3a4bb0791: vmovdqu32 %zmm1,0x10(%r9,%r14,1) > 0x00007fc3a4bb079c: vmovdqu32 0x50(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb07a7: vmovdqu32 %zmm1,0x50(%r9,%rdi,1) > 0x00007fc3a4bb07b2: vmovdqu32 0x90(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb07bd: vmovdqu32 %zmm1,0x90(%r9,%rdi,1) > 0x00007fc3a4bb07c8: vmovdqu32 0xd0(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb07d3: vmovdqu32 %zmm1,0xd0(%r9,%rdi,1) > 0x00007fc3a4bb07de: vmovdqu32 0x110(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb07e9: vmovdqu32 %zmm1,0x110(%r9,%rdi,1) > 0x00007fc3a4bb07f4: vmovdqu32 0x150(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb07ff: vmovdqu32 %zmm1,0x150(%r9,%rdi,1) > 0x00007fc3a4bb080a: vmovdqu32 0x190(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb0815: vmovdqu32 %zmm1,0x190(%r9,%rdi,1) > 0x00007fc3a4bb0820: vmovdqu32 0x1d0(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb082b: vmovdqu32 %zmm1,0x1d0(%r9,%rdi,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test1 at 14 (line 14) > 0x00007fc3a4bb0836: add $0x200,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test1 at 15 (line 13) > 0x00007fc3a4bb083c: c... 
@eme64 What I mean here is that `AlignVector` seems useless because the accesses are going to be misaligned either way. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483356306 From ihse at openjdk.org Mon Nov 18 15:32:21 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 18 Nov 2024 15:32:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 04:49:51 GMT, David Holmes wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: >> >> - Merge branch 'master' into JDK-8305895-v4 >> - Merge tag 'jdk-25+23' into JDK-8305895-v4 >> >> Added tag jdk-24+23 for changeset c0e6c3b9 >> - Fix gen-ZGC removal >> - Merge tag 'jdk-24+22' into JDK-8305895-v4 >> >> Added tag jdk-24+22 for changeset 388d44fb >> - Enable riscv in CompressedClassPointersEncodingScheme test >> - s390 port >> - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test >> - Update copyright >> - Avoid assert/endless-loop in JFR code >> - Update copyright headers >> - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b > > test/hotspot/jtreg/gtest/MetaspaceUtilsGtests.java line 1: > > > This file was reduced to empty but not actually deleted. Can you fix it please. @rkennke Just making sure this is not being missed. Can you please open a JBS issue to correct this and the file below? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1846790097 From epeter at openjdk.org Mon Nov 18 16:20:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 16:20:34 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> <7h8Il7V3a1tbo_U2y2GyUY2tH8UPXtKc3we3ZZi47d4=.4a4cbe88-92a5-43cf-a6a9-48d0bed41cf7@github.com> <6rwCNBLV4-VemVsKR8KWYEgSIKfHQxS_RuxsPwX7TZo=.5fe167a3-1f97-408d-9d41-23d4d0fb42df@github.com> Message-ID: <-uhyD7i_oXhrCIMqAvFf7nt6DsjM6OY-_erP6UDAitg=.bb94ed2c-75f1-4d7a-b45a-113a5886a268@github.com> On Mon, 18 Nov 2024 15:20:17 GMT, Quan Anh Mai wrote: >> @merykitty I guess we can always use [vmovdqu](https://www.felixcloutier.com/x86/movdqu:vmovdqu8:vmovdqu16:vmovdqu32:vmovdqu64). >> >> And in fact that is exactly what we do: >> >> public class Test { >> static int RANGE = 1024*1024; >> >> public static void main(String[] args) { >> byte[] aB = new byte[RANGE]; >> byte[] bB = new byte[RANGE]; >> for (int i = 0; i < 100_000; i++) { >> test1(aB, bB); >> } >> } >> >> static void test1(byte[] a, byte[] b) { >> for (int i = 0; i < RANGE; i++) { >> a[i] = b[i]; >> } >> } >> } >> >> `../java -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printcompilation,Test::test* -XX:+TraceLoopOpts -XX:-TraceSuperWord -XX:+TraceNewVectors -Xbatch -XX:+AlignVector -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printassembly,Test::test* Test.java` >> >> >> ;; B20: # out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner main of N178 strip mined) Freq: 8.13586e+09 >> 0x00007fc3a4bb0780: movslq %ebx,%rdi >> 0x00007fc3a4bb0783: movslq %ebx,%r14 >> 0x00007fc3a4bb0786: vmovdqu32 0x10(%r13,%r14,1),%zmm1 >> 0x00007fc3a4bb0791: vmovdqu32 %zmm1,0x10(%r9,%r14,1) >> 0x00007fc3a4bb079c: vmovdqu32 0x50(%r13,%rdi,1),%zmm1 >> 0x00007fc3a4bb07a7: 
vmovdqu32 %zmm1,0x50(%r9,%rdi,1)
>> 0x00007fc3a4bb07b2: vmovdqu32 0x90(%r13,%rdi,1),%zmm1
>> 0x00007fc3a4bb07bd: vmovdqu32 %zmm1,0x90(%r9,%rdi,1)
>> 0x00007fc3a4bb07c8: vmovdqu32 0xd0(%r13,%rdi,1),%zmm1
>> 0x00007fc3a4bb07d3: vmovdqu32 %zmm1,0xd0(%r9,%rdi,1)
>> 0x00007fc3a4bb07de: vmovdqu32 0x110(%r13,%rdi,1),%zmm1
>> 0x00007fc3a4bb07e9: vmovdqu32 %zmm1,0x110(%r9,%rdi,1)
>> 0x00007fc3a4bb07f4: vmovdqu32 0x150(%r13,%rdi,1),%zmm1
>> 0x00007fc3a4bb07ff: vmovdqu32 %zmm1,0x150(%r9,%rdi,1)
>> 0x00007fc3a4bb080a: vmovdqu32 0x190(%r13,%rdi,1),%zmm1
>> 0x00007fc3a4bb0815: vmovdqu32 %zmm1,0x190(%r9,%rdi,1)
>> 0x00007fc3a4bb0820: vmovdqu32 0x1d0(%r13,%rdi,1),%zmm1
>> 0x00007fc3a4bb082b: vmovdqu32 %zmm1,0x1d0(%r9,%rdi,1)  ;*bastore {reexecute=0 rethrow=0 return_oop=0}
>>                                                        ; - Test::test1@14 (line 14)
>> 0x00007fc3a4bb0836: add $0x200,%ebx                    ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>> ...
>
> @eme64 What I mean here is that `AlignVector` seems useless because the accesses are going to be misaligned either way.

@merykitty FYI:
`src/hotspot/share/opto/vectorization.hpp: static bool vectors_should_be_aligned() { return !Matcher::misaligned_vectors_ok() || AlignVector; }`

The relevant code:

    src/hotspot/cpu/x86/matcher_x86.hpp: static constexpr bool misaligned_vectors_ok() {
      // x86 supports misaligned vectors store/load.
      static constexpr bool misaligned_vectors_ok() {
        return true;
      }

    src/hotspot/cpu/ppc/matcher_ppc.hpp: static constexpr bool misaligned_vectors_ok() {
      // PPC implementation uses VSX load/store instructions (if
      // SuperwordUseVSX) which support 4 byte but not arbitrary alignment
      static constexpr bool misaligned_vectors_ok() {
        return false;
      }

    src/hotspot/cpu/aarch64/matcher_aarch64.hpp: static constexpr bool misaligned_vectors_ok() {
      // aarch64 supports misaligned vectors store/load.
      static constexpr bool misaligned_vectors_ok() {
        return true;
      }

    src/hotspot/cpu/s390/matcher_s390.hpp: static constexpr bool misaligned_vectors_ok() {
      // z/Architecture does support misaligned store/load at minimal extra cost.
      static constexpr bool misaligned_vectors_ok() {
        return true;
      }

    src/hotspot/cpu/arm/matcher_arm.hpp: static constexpr bool misaligned_vectors_ok() {
      // ARM doesn't support misaligned vectors store/load.
      static constexpr bool misaligned_vectors_ok() {
        return false;
      }

    src/hotspot/cpu/riscv/matcher_riscv.hpp: static constexpr bool misaligned_vectors_ok() {
      // riscv supports misaligned vectors store/load.
      static constexpr bool misaligned_vectors_ok() {
        return true;
      }

We can see that only PPC and ARM32 have such strict alignment requirements. And it turns out that PPC only needs 4-byte alignment, and ARM32 is fine with 8-byte alignment. So none of our platforms necessarily needs full vector-width alignment.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483505834

From epeter at openjdk.org Mon Nov 18 16:32:28 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 18 Nov 2024 16:32:28 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57]
In-Reply-To:
References:
Message-ID:

On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>>
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>>
>> Main changes:
>> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays will now store their length at offset 8.
>> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits:
>
>  - Merge branch 'master' into JDK-8305895-v4
>  - Merge tag 'jdk-25+23' into JDK-8305895-v4
>
>    Added tag jdk-24+23 for changeset c0e6c3b9
>  - Fix gen-ZGC removal
>  - Merge tag 'jdk-24+22' into JDK-8305895-v4
>
>    Added tag jdk-24+22 for changeset 388d44fb
>  - Enable riscv in CompressedClassPointersEncodingScheme test
>  - s390 port
>  - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test
>  - Update copyright
>  - Avoid assert/endless-loop in JFR code
>  - Update copyright headers
>  - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b

Ah there are some exceptions:

x86:
`src/hotspot/cpu/x86/vm_version_x86.cpp: AlignVector = !UseUnalignedLoadStores;`

    if (supports_sse4_2()) { // new ZX cpus
      if (FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
        UseUnalignedLoadStores = true; // use movdqu on newest ZX cpus
      }
    }

So I suppose some older platforms may be affected, though I have not seen one yet. They would have to be missing the unaligned `movdqu` instructions.

aarch64:
`src/hotspot/cpu/aarch64/vm_version_aarch64.cpp: AlignVector = AvoidUnalignedAccesses;`

    // Ampere eMAG
    if (_cpu == CPU_AMCC && (_model == CPU_MODEL_EMAG) && (_variant == 0x3)) {
      if (FLAG_IS_DEFAULT(AvoidUnalignedAccesses)) {
        FLAG_SET_DEFAULT(AvoidUnalignedAccesses, true);
      }

and

    // ThunderX
    if (_cpu == CPU_CAVIUM && (_model == 0xA1)) {
      guarantee(_variant != 0, "Pre-release hardware no longer supported.");
      if (FLAG_IS_DEFAULT(AvoidUnalignedAccesses)) {
        FLAG_SET_DEFAULT(AvoidUnalignedAccesses, true);
      }

and

    // ThunderX2
    if ((_cpu == CPU_CAVIUM && (_model == 0xAF)) ||
        (_cpu == CPU_BROADCOM && (_model == 0x516))) {
      if (FLAG_IS_DEFAULT(AvoidUnalignedAccesses)) {
        FLAG_SET_DEFAULT(AvoidUnalignedAccesses, true);
      }

and

    // HiSilicon TSV110
    if (_cpu == CPU_HISILICON && _model == 0xd01) {
      if (FLAG_IS_DEFAULT(AvoidUnalignedAccesses)) {
        FLAG_SET_DEFAULT(AvoidUnalignedAccesses, true);
      }

So yes, some platforms are affected. But they seem to be the exception.

And again: we have only had `ObjectAlignmentInBytes=8` alignment for vectors since forever - and no platform vendor has ever complained about that. Arrays never had a stronger alignment guarantee than that.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483528037
PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483531916

From epeter at openjdk.org Mon Nov 18 16:52:22 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 18 Nov 2024 16:52:22 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19]
In-Reply-To:
References:
Message-ID:

On Mon, 18 Nov 2024 15:00:51 GMT, Roman Kennke wrote:

>>> @rkennke It just will (silently) not vectorize, thus running slower but still correct.
>>
>> Ok, I think we can live with that for now.
>>
>> As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away.
>>
>> The tests need fixing, though.
> >> @rkennke >> >> > As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. >> >> Ah. So we would eventually have not a `12-byte` but `8-byte` offset from base to payload? Would that happen in all cases? And could that happen before `UseCompactObjectHeaders` leaves experimental status? Because it is going to be a little annoying to adjust all vectorization tests for the special case of `UseCompactObjectHeaders + AlignVector`. Though I can surely do it. > > I am not sure if and when this is going to happen. When I presented the idea at JVMLS, I got some resistance. I am also not sure if we first leave experimental status for UCOH, and then introduce 4-byte headers under a new flag (or no flag?), or if we first do 4-byte headers and only leave experimental status once that is done. The latter sounds more reasonable to me. @rkennke Filed a bug to track this (we may close it as NotAnIssue, but this way people are aware / can find the analysis): [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424): C2 SuperWord: mixed type loops do not vectorize with UseCompactObjectHeaders and AlignVector ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483579571 From rkennke at openjdk.org Mon Nov 18 17:00:24 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 18 Nov 2024 17:00:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 15:30:14 GMT, Magnus Ihse Bursie wrote: >> test/hotspot/jtreg/gtest/MetaspaceUtilsGtests.java line 1: >> >> >> This file was reduced to empty but not actually deleted. Can you fix it please. > > @rkennke Just making sure this is not being missed. Can you please open a JBS issue to correct this and the file below? 
I filed: https://bugs.openjdk.org/browse/JDK-8344425 @tstuefe is working on it (mostly checking that nothing important has been removed) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1846945329 From rkennke at openjdk.org Mon Nov 18 17:09:24 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 18 Nov 2024 17:09:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 15:00:51 GMT, Roman Kennke wrote: >>> @rkennke It just will (silently) not vectorize, thus running slower but still correct. >> >> Ok, I think we can live with that for now. >> >> As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. >> >> The tests need fixing, though. > >> @rkennke >> >> > As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. >> >> Ah. So we would eventually have not a `12-byte` but `8-byte` offset from base to payload? Would that happen in all cases? And could that happen before `UseCompactObjectHeaders` leaves experimental status? Because it is going to be a little annoying to adjust all vectorization tests for the special case of `UseCompactObjectHeaders + AlignVector`. Though I can surely do it. > > I am not sure if and when this is going to happen. When I presented the idea at JVMLS, I got some resistance. I am also not sure if we first leave experimental status for UCOH, and then introduce 4-byte headers under a new flag (or no flag?), or if we first do 4-byte headers and only leave experimental status once that is done. The latter sounds more reasonable to me. > @rkennke Filed a bug to track this (we may close it as NotAnIssue, but this way people are aware / can find the analysis): [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424): C2 SuperWord: mixed type loops do not vectorize with UseCompactObjectHeaders and AlignVector Thanks! 
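The following is a simplified model, not C2's SuperWord logic, of why a 12-byte base-to-payload offset interacts badly with `AlignVector` in mixed-type loops. It rests on several assumptions: every array starts on an 8-byte boundary (`ObjectAlignmentInBytes=8`), the byte[] and int[] arrays in the loop share the same payload offset, vector accesses must be 8-byte aligned, and a single scalar pre-loop count has to align every array at once. The helper `common_pre_loop_iterations` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not C2 code): returns the smallest number k of scalar
// pre-loop iterations such that (payload_offset + k * size) % alignment == 0
// for every element size used in the loop, assuming each array starts on an
// `alignment`-byte boundary. Returns -1 if no single k aligns all arrays.
int common_pre_loop_iterations(int payload_offset,
                               const std::vector<int>& elem_sizes,
                               int alignment) {
  assert(alignment > 0);
  // (k + alignment) * size == k * size (mod alignment), so the pattern repeats
  // every `alignment` iterations and searching [0, alignment) is enough.
  for (int k = 0; k < alignment; ++k) {
    bool all_aligned = true;
    for (int size : elem_sizes) {
      if ((payload_offset + k * size) % alignment != 0) {
        all_aligned = false;
        break;
      }
    }
    if (all_aligned) {
      return k;
    }
  }
  return -1;
}
```

With an offset of 16 or 8, `common_pre_loop_iterations(offset, {1, 4}, 8)` returns 0, so both arrays are aligned from the first iteration. With 12, each array type can be aligned on its own (int[] after one iteration, byte[] after four) but never both at once. This is consistent with such mixed-type loops silently staying scalar under `UseCompactObjectHeaders` + `AlignVector`.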
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483619681 From kdnilsen at openjdk.org Mon Nov 18 23:21:54 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 18 Nov 2024 23:21:54 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v3] In-Reply-To: <9SXRh1N-RjMfm_G4LvQFsuF5_DvRxMCrcCXMOtsAwpM=.2b37fe99-1107-440d-a4cb-468fe415b3be@github.com> References: <9SXRh1N-RjMfm_G4LvQFsuF5_DvRxMCrcCXMOtsAwpM=.2b37fe99-1107-440d-a4cb-468fe415b3be@github.com> Message-ID: On Fri, 15 Nov 2024 21:35:03 GMT, William Kemper wrote: >> Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Prevent uncommit thread from running during GC Have read through the latest version of the code. Thanks. ------------- Marked as reviewed by kdnilsen (Author). PR Review: https://git.openjdk.org/jdk/pull/22019#pullrequestreview-2443969990 From ysr at openjdk.org Tue Nov 19 02:05:55 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 19 Nov 2024 02:05:55 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v3] In-Reply-To: <9SXRh1N-RjMfm_G4LvQFsuF5_DvRxMCrcCXMOtsAwpM=.2b37fe99-1107-440d-a4cb-468fe415b3be@github.com> References: <9SXRh1N-RjMfm_G4LvQFsuF5_DvRxMCrcCXMOtsAwpM=.2b37fe99-1107-440d-a4cb-468fe415b3be@github.com> Message-ID: <2aAK-4vQxs3-P-cscvtJqOwn40B2aduezOZkQIKK-BY=.f1b9a066-63ef-4fe3-bb3d-2efcfac2b6af@github.com> On Fri, 15 Nov 2024 21:35:03 GMT, William Kemper wrote: >> Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. 
Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Prevent uncommit thread from running during GC Looks good to me. A few documentation comment requests. Also please share performance data in this PR or in the ticket, especially from the perf/benchmark that may have precipitated this change. src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 214: > 212: // > 213: public: > 214: void notify_heap_changed(); Let's place a single line of documentation comment for all the public and private APIs at lines that we touch in a PR where documentation is missing. (I realize you merely changed the method from private to public.) src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 406: > 404: void notify_soft_max_changed(); > 405: void notify_explicit_gc_requested(); > 406: 1-line documentation of API. src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 411: > 409: private: > 410: ShenandoahControlThread* _control_thread; > 411: ShenandoahUncommitThread* _uncommit_thread; Role of thread, e.g. .... // a thread to uncommit selected free regions of the heap src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 628: > 626: bool is_uncommit_in_progress(); > 627: #endif > 628: 1-line API documentation each. src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.cpp line 87: > 85: // Determine if there is work to do. This avoids taking heap lock if there is > 86: // no work available, avoids spamming logs with superfluous logging messages, > 87: // and minimises the amount of work while locks are taken. last word: taken -> held. src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.hpp line 40: > 38: ShenandoahSharedFlag _uncommit_in_progress; > 39: Monitor _stop_lock; > 40: Monitor _uncommit_lock; A 1-line comment on role of each field, e.g. 
ShenandoahSharedFlag _soft_max_changed; // the heap's soft max target has changed recently etc. src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.hpp line 46: > 44: void uncommit(double shrink_before, size_t shrink_until); > 45: > 46: bool is_uncommit_allowed(); Would be nice to document these private methods as well. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22019#pullrequestreview-2444049683 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1847475867 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1847477020 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1847479562 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1847480153 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1847516208 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1847519410 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1847508283 From jpai at openjdk.org Tue Nov 19 05:31:47 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Tue, 19 Nov 2024 05:31:47 GMT Subject: RFR: 8331735: UpcallLinker::on_exit races with GC when copying frame anchor In-Reply-To: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> References: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> Message-ID: On Mon, 28 Oct 2024 13:53:58 GMT, Jorn Vernee wrote: > There is a subtle race in `UpcallLinker::on_exit` between copying of the old frame anchor back into place, and the GC. Since this copy is not atomic, it may briefly appear as if a thread has no last Java frame, while still in the `_thread_in_native` state, which leads to the GC skipping processing of any active Java frames. 
> > This code was originally adapted from `JavaCallWrapper::~JavaCallWrapper` - the JNI mechanism for upcalls - but in that case the frame anchor copy happens in the `_thread_in_vm` state, which means the GC will wait for the thread to get to a safepoint. > > The solution proposed here is to do the frame anchor copy in the java thread state, before transitioning back to the native state. The java thread state, like the vm thread state, is also 'safe' i.e. the GC will wait for the thread to get to a safepoint, so we can safely do our non-atomic copy of the frame anchor. > > Additionally, this PR resolves a similar issue in `on_entry`, by moving the clearing of the pending exception (in case native code uses a JNI API and didn't handle the exception afterwards). We now also skip checking for async exceptions when transitioning from native to java, so we don't immediately clear them. Any async exceptions will be picked up at the next safepoint instead. > > Special thanks to @stefank and @fisk for finding the root cause, and @jaikiran for testing and debugging. > > Testing: tier 1-4, 20k runs of the failing test on linux-aarch64. Happy to see this addressed and as Jorn noted, thanks to Stefan and Erik for finding the root cause of this issue which was hard to reproduce and debug.
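The race in `UpcallLinker::on_exit` comes from copying a multi-word frame anchor one field at a time. The single-threaded model below is deliberately simplified. `FrameAnchorModel`, its fields, and the `observe` callback are hypothetical stand-ins, not HotSpot's `JavaFrameAnchor`. For illustration, it assumes the copy clears the stack-pointer field first and publishes it last, and it records what a concurrent observer (such as a GC thread scanning stacks) could see between the individual stores.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical, simplified stand-in for a frame anchor (not HotSpot's
// JavaFrameAnchor): the thread "has a last Java frame" only while last_sp != 0.
struct FrameAnchorModel {
  std::uintptr_t last_sp = 0;
  std::uintptr_t last_fp = 0;
  std::uintptr_t last_pc = 0;
  bool has_last_frame() const { return last_sp != 0; }
};

// Copies src into dst one field at a time and calls `observe` after each
// store, mimicking what another thread could see between the stores.
// Clearing last_sp first and publishing it last means fp/pc are consistent
// whenever has_last_frame() is true, but it also opens a window in which the
// thread appears to have no Java frame at all.
void copy_anchor(FrameAnchorModel& dst, const FrameAnchorModel& src,
                 const std::function<void(const FrameAnchorModel&)>& observe) {
  dst.last_sp = 0;
  observe(dst);
  dst.last_fp = src.last_fp;
  observe(dst);
  dst.last_pc = src.last_pc;
  observe(dst);
  dst.last_sp = src.last_sp;
  observe(dst);
}

// Counts intermediate states that report "no last Java frame" even though
// both the old and the new anchor describe a real frame.
int count_frameless_windows(const FrameAnchorModel& before,
                            const FrameAnchorModel& after) {
  assert(before.has_last_frame() && after.has_last_frame());
  FrameAnchorModel dst = before;
  int frameless = 0;
  copy_anchor(dst, after, [&frameless](const FrameAnchorModel& seen) {
    if (!seen.has_last_frame()) {
      ++frameless;
    }
  });
  return frameless;
}
```

In this model, three of the four intermediate states look frameless. Suppose the observer is a GC that does not wait for threads in `_thread_in_native`. Each of those states would then let it skip the thread's Java frames. Performing the copy in a state where the GC waits for a safepoint, as the PR proposes, closes the window without making the copy itself atomic.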
------------- PR Comment: https://git.openjdk.org/jdk/pull/21742#issuecomment-2484739822 From dholmes at openjdk.org Tue Nov 19 06:49:46 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 19 Nov 2024 06:49:46 GMT Subject: RFR: 8331735: UpcallLinker::on_exit races with GC when copying frame anchor In-Reply-To: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> References: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> Message-ID: On Mon, 28 Oct 2024 13:53:58 GMT, Jorn Vernee wrote: > There is a subtle race in `UpcallLinker::on_exit` between copying of the old frame anchor back into place, and the GC. Since this copy is not atomic, it may briefly appear as if a thread has no last Java frame, while still in the `_thread_in_native` state, which leads to the GC skipping processing of any active Java frames. > > This code was originally adapted from `JavaCallWrapper::~JavaCallWrapper` - the JNI mechanism for upcalls - but in that case the frame anchor copy happens in the `_thread_in_vm` state, which means the GC will wait for the thread to get to a safepoint. > > The solution proposed here is to do the frame anchor copy in the java thread state, before transitioning back to the native state. The java thread state, like the vm thread state, is also 'safe' i.e. the GC will wait for the thread to get to a safepoint, so we can safely do our non-atomic copy of the frame anchor. > > Additionally, this PR resolves a similar issue in `on_entry`, by moving the clearing of the pending exception (in case native code uses a JNI API and didn't handle the exception afterwards). We now also skip checking for async exceptions when transitioning from native to java, so we don't immediately clear them. Any async exceptions will be picked up at the next safepoint instead. > > Special thanks to @stefank and @fisk for finding the root cause, and @jaikiran for testing and debugging.
> > Testing: tier 1-4, 20k runs of the failing test on linux-aarch64. This seems quite reasonable. Ensuring the correct state for things like updating the frame_anchor is critical, so I wonder if we can assert we are in a safepoint-safe state when doing so? I had to think long about the async exception deferral ... probably okay. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21742#pullrequestreview-2444413977 From aboldtch at openjdk.org Tue Nov 19 07:24:55 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 19 Nov 2024 07:24:55 GMT Subject: RFR: 8344414: ZGC: Another division by zero in rule_major_allocation_rate (ubsan) Message-ID: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> This specific issue was known since #20888, as well as a more serious issue in `calculate_extra_young_gc_time` which may introduce a `NaN`. This specific issue is sane as long as we have IEEE 754 sans the C++ standard making division by zero UB. As discussed in #21304 it is probably better to try and tackle the division by zero issue by making sure the input is never zero. This patch introduces a small offset to the average which will effectively leave the value unchanged unless it is zero, and behave as almost zero in calculations without causing actual division by zero. There is still the issue with `NaN`; this patch adds a short circuit when this can occur and returns the analytical result of the calculation.
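The two mitigations described in the PR can be illustrated as follows. This is not the ZGC code. The function names, the `1e-6` offset, and the formulas (a quotient by an allocation-rate average, and a product in which `0 * infinity` would be `NaN`) are assumptions chosen only to show the technique.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical offset: small enough to change realistic averages only
// negligibly, but it guarantees the denominator below is never exactly zero.
constexpr double kAvgOffset = 1e-6;

// Technique 1: bias the denominator so a zero average behaves as "almost
// zero" (a very large quotient) instead of being a division by zero, which
// the C++ standard leaves undefined even for floating-point operands.
double time_until_exhausted(double free_bytes, double avg_alloc_rate) {
  assert(free_bytes >= 0.0 && avg_alloc_rate >= 0.0);
  return free_bytes / (avg_alloc_rate + kAvgOffset);
}

// Technique 2: short-circuit the input that would produce NaN (here
// 0 * infinity) and return the analytical limit of the expression instead.
double scaled_cost(double count, double cost_per_item) {
  if (count == 0.0) {
    return 0.0;  // zero items cost nothing, even if the per-item cost is unbounded
  }
  return count * cost_per_item;
}
```

The offset keeps the formula's shape unchanged and needs no special case at each call site. The short circuit is only needed where the offset cannot help, i.e. where a genuinely infinite factor meets a genuine zero.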
------------- Commit messages: - 8344414: ZGC: Another division by zero in rule_major_allocation_rate (ubsan) Changes: https://git.openjdk.org/jdk/pull/22228/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22228&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344414 Stats: 27 lines in 3 files changed: 10 ins; 1 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/22228.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22228/head:pull/22228 PR: https://git.openjdk.org/jdk/pull/22228 From tschatzl at openjdk.org Tue Nov 19 07:25:50 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 19 Nov 2024 07:25:50 GMT Subject: RFR: 8344302: G1: Refactor G1CMTask::do_marking_step to use smaller wrapper methods In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 12:34:34 GMT, Ivan Walulya wrote: > Hi all, > > Please review this refactoring of G1CMTask::do_marking_step, breaking it down into multiple helper methods to improve readability. > > Testing: Tier-1 Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22147#pullrequestreview-2444472884 From aph at openjdk.org Tue Nov 19 09:45:24 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 19 Nov 2024 09:45:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
> > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-25+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b > So yes, some platforms [have alignment requirements for vectors]. But they seem to be the exception. All AArch64 implementations work with unaligned vectors (that's in the architecture spec), but some designs that were made years ago performed badly. It's not a problem with new designs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2485185002 From tschatzl at openjdk.org Tue Nov 19 09:51:17 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 19 Nov 2024 09:51:17 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 13:58:36 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates.
Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. > > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Initial set of comments. 
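The batching in this PR chunks candidate regions that are expected to be evacuated together into groups that share one card set. It could be modelled roughly as below. This is a hypothetical illustration, not G1's code: `Region`, `make_groups`, and the descending sort direction are assumptions. The only parts taken from the review are that candidates are ordered by reclaimable bytes, that later groups hold `GROUP_SIZE` (5) regions in the reviewed code, and that the first group may be as large as the minimum old collection-set length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an old-gen collection-set candidate region.
struct Region {
  unsigned index;
  std::size_t reclaimable_bytes;
};

// Splits candidates into collection groups. Assumed policy: candidates are
// ordered by descending reclaimable bytes, the first group takes
// `first_group_size` regions (e.g. the minimum old cset length), and every
// later group takes up to `group_size` regions. All regions of a group would
// share one card set and therefore must be evacuated together.
std::vector<std::vector<Region>> make_groups(std::vector<Region> candidates,
                                             std::size_t first_group_size,
                                             std::size_t group_size) {
  assert(group_size > 0);
  std::stable_sort(candidates.begin(), candidates.end(),
                   [](const Region& a, const Region& b) {
                     return a.reclaimable_bytes > b.reclaimable_bytes;
                   });
  std::vector<std::vector<Region>> groups;
  std::size_t pos = 0;
  while (pos < candidates.size()) {
    std::size_t limit = groups.empty() ? first_group_size : group_size;
    if (limit == 0) {
      limit = group_size;  // an empty first group makes no sense; fall back
    }
    const std::size_t end = std::min(candidates.size(), pos + limit);
    groups.emplace_back(candidates.begin() + static_cast<std::ptrdiff_t>(pos),
                        candidates.begin() + static_cast<std::ptrdiff_t>(end));
    pos = end;
  }
  return groups;
}
```

Passing `group_size = 1` restores one card set per region after the first group. That is the knob the review asks to expose as a diagnostic flag; larger groups trade evacuation flexibility for less remembered-set memory and merge time.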
src/hotspot/share/gc/g1/g1CollectionSet.cpp line 314: > 312: size_t bytes_to_copy = 0; > 313: double predicted_eden_time = _policy->predict_young_region_other_time_ms(eden_region_length) + > 314: _policy->predict_eden_copy_time_ms(eden_region_length, &bytes_to_copy); `bytes_to_copy` is never used afterwards, so it can be removed (and `predict...` does not need to get it passed. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 347: > 345: // without remembered sets after a few attempts to save computation costs of keeping > 346: // them candidates for very long living pinned regions. > 347: void G1CollectionSet::finalize_old_part(double time_remaining_ms) { The comment above is out of date. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 399: > 397: uint min_old_cset_length = _policy->calc_min_old_cset_length(candidates()->last_marking_candidates_length()); > 398: uint max_old_cset_length = MAX2(min_old_cset_length, _policy->calc_max_old_cset_length()); > 399: uint max_optional_regions = max_old_cset_length - min_old_cset_length; Unused. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 407: > 405: "Min %u regions, max %u regions, available %u regions" > 406: "time remaining %1.2fms, optional threshold %1.2fms", > 407: min_old_cset_length, max_old_cset_length, from_marking_groups->num_regions(), time_remaining_ms, optional_threshold_ms); Most of these debug messages are regions, it might be useful to add information about groups too. Otherwise it is impossible to understand any grouping issues. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 423: > 421: // Add regions to old set until we reach the minimum amount > 422: if (num_inital_regions < min_old_cset_length) { > 423: Suggestion: src/hotspot/share/gc/g1/g1CollectionSet.cpp line 452: > 450: > 451: predicted_initial_time_ms += predicted_time_ms; > 452: Suggestion: src/hotspot/share/gc/g1/g1CollectionSet.cpp line 467: > 465: > 466: // Remove selected groups from list of candidate groups. 
> 467: if (num_initial_groups > 0) { Imo it would be clearer to read if this check would be done inside `remove`; it knows if the number of groups is zero. Maybe then the `num_initial_groups` local could go away then too. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 483: > 481: "predicted initial time: %1.2fms, predicted optional time: %1.2fms, time remaining: %1.2fms", > 482: num_inital_regions, num_optional_regions, > 483: predicted_initial_time_ms, predicted_optional_time_ms, time_remaining_ms); Again, no information about group selection. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 620: > 618: log_debug(gc, ergo, cset) ("Completed with groups, selected %u", num_regions_selected); > 619: // Remove selected groups from candidate list. > 620: if (num_groups_selected > 0) { Maybe the check could be done in `remove`, removing this local. src/hotspot/share/gc/g1/g1CollectionSet.hpp line 147: > 145: uint* _collection_set_regions; > 146: volatile uint _collection_set_cur_length; > 147: uint _collection_set_max_length; Maybe there is an existing wrapper somewhere that wraps array/counter/max value somehow. To me it, whenever I read these three, I believe this is the x'th time. Maybe something for another time to factor out. src/hotspot/share/gc/g1/g1CollectionSet.hpp line 149: > 147: uint _collection_set_max_length; > 148: > 149: // Old gen groups selected for evacuation. (This is related to the big comment at the top of the file): the comment needs update. 
src/hotspot/share/gc/g1/g1CollectionSet.hpp line 187: > 185: > 186: void add_group_to_collection_set(G1CSetCandidateGroup* gr); > 187: Suggestion: src/hotspot/share/gc/g1/g1CollectionSet.hpp line 191: > 189: > 190: double select_candidates_from_marking(double time_remaining_ms); > 191: Suggestion: src/hotspot/share/gc/g1/g1CollectionSet.hpp line 193: > 191: > 192: void select_candidates_from_retained(double time_remaining_ms); > 193: Suggestion: src/hotspot/share/gc/g1/g1CollectionSet.hpp line 195: > 193: > 194: // Select regions for evacuation from the optional candidates given the remaining time > 195: // and return the number of actually selected regions. Suggestion: // Select regions for evacuation from the optional candidates given the remaining time // and return the number of actually selected regions. src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 56: > 54: using G1CSetCandidateGroupIterator = GrowableArrayIterator; > 55: > 56: class G1CSetCandidateGroup : public CHeapObj{ Document requirements of regions in group. src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 116: > 114: // for the first collection group to be as large as G1Policy::calc_min_old_cset_length > 115: // because we are certain that these regions have to be collected together. > 116: static const int GROUP_SIZE = 5; Please make this a diagnostic flag to allow changing this to (essentially) 1 if needed, i.e. if with a low pause time goal and low amount of worker threads one can select between accuracy of keeping pause time goal and efficiency. With the current logging we can diagnose such a problem (well, at least consider it, see my comments about logging lacking group information), but we can't give a quick solution. 
If the user sets a value of "0" may even mean group sizes of 1 including the first group, but that is not necessary imo (maybe as a way to keep the current test though, but that is not very important because the size of that group can be changed with `G1MixedGCCount` or similar) src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 143: > 141: void remove_selected(uint count, uint num_regions); > 142: > 143: void remove(G1CSetCandidateGroupList* other); This should add some comments about preconditions, particularly that the regions in the `other` list must be sorted the same way as this list (the original method indicates that, and I do not see a change that allows this now). I.e. this is not a general purpose `remove` method. src/hotspot/share/gc/g1/g1Policy.cpp line 499: > 497: } > 498: predicted_region_evac_time_ms += gr->predict_group_total_time_ms(); > 499: min_marking_candidates = min_marking_candidates > gr->length() ? (min_marking_candidates - gr->length()) : 0; Instead of this somewhat complex way of saturating subtraction, maybe it is more clear to have an extra counter summing up already added regions and compare that against the (constant) min_marking_candidates. src/hotspot/share/gc/g1/g1Policy.cpp line 1135: > 1133: > 1134: double G1Policy::predict_region_merge_scan_time(G1HeapRegion* hr, bool for_young_only_phase) const { > 1135: assert(!hr->is_young(), "Sanity Check!"); Debug code? At least change "Sanity Check" to "must be"; would be nice to at least add region index to the message too. src/hotspot/share/gc/g1/g1RemSet.cpp line 1398: > 1396: g1h->collection_set()->merge_cardsets_for_collection_groups(g1h, merge, worker_id, _num_workers); > 1397: > 1398: g1h->collection_set_iterate_increment_from(&combined, nullptr, worker_id); I think it is not necessary any more to use the `combined` closure here. It can be removed afaict, and closure instantiation scoped. 
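The review of `G1Policy` suggests replacing the saturating decrement of `min_marking_candidates` with a running count compared against a constant minimum. The two functions below illustrate that suggestion. Both are hypothetical reductions that only count how many groups are taken before the minimum is covered. Apart from the name `min_marking_candidates`, every name here is invented, and the real loop also accumulates predicted times, which is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original shape: decrement the remaining minimum with a saturating
// subtraction and stop once it reaches zero.
std::size_t groups_needed_saturating(const std::vector<unsigned>& group_lengths,
                                     unsigned min_marking_candidates) {
  std::size_t taken = 0;
  for (unsigned len : group_lengths) {
    if (min_marking_candidates == 0) break;
    ++taken;
    min_marking_candidates =
        min_marking_candidates > len ? (min_marking_candidates - len) : 0;
  }
  return taken;
}

// Suggested shape: keep the minimum constant and compare it against a
// separate running count of regions already added.
std::size_t groups_needed_counter(const std::vector<unsigned>& group_lengths,
                                  const unsigned min_marking_candidates) {
  std::size_t taken = 0;
  unsigned regions_added = 0;
  for (unsigned len : group_lengths) {
    if (regions_added >= min_marking_candidates) break;
    ++taken;
    regions_added += len;
  }
  return taken;
}
```

The two are equivalent because the saturated remainder is zero exactly when the regions added reach the minimum. The counter form avoids the conditional subtraction and leaves the minimum readable for later logging.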
src/hotspot/share/gc/g1/g1YoungCollector.cpp line 300: > 298: // back memory to the OS keep the most recent amount of memory for these regions. > 299: if (hr->is_starts_humongous()) { > 300: guarantee(!hr->rem_set()->has_group_cardset(), "double adding"); "must be" or "humongous regions should not be grouped/do not have group card sets" would be better. Generally the group card set description does not mention that humongous regions do not use the grouping; obvious in hindsight, but it would be nice to document. ------------- Changes requested by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22015#pullrequestreview-2436104998 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847968444 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847970300 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847971290 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847972121 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847974605 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847975428 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847977384 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847978186 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847980203 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847983636 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847981291 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847985842 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847986016 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847986437 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847986722 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1848000765 PR Review Comment: 
https://git.openjdk.org/jdk/pull/22015#discussion_r1842233718 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1847998050 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1848003022 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1848004407 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1848006630 PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1848008310 From ayang at openjdk.org Tue Nov 19 11:33:56 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 19 Nov 2024 11:33:56 GMT Subject: RFR: 8344302: G1: Refactor G1CMTask::do_marking_step to use smaller wrapper methods In-Reply-To: References: Message-ID: <_XmAZhrA7yL8EfgM4-3aAoDocv45aG7C2pAzuE75so0=.eeb0a910-192d-4f1c-a55d-b541247f98d2@github.com> On Fri, 15 Nov 2024 12:34:34 GMT, Ivan Walulya wrote: > Hi all, > > Please review this refactoring of G1CMTask::do_marking_step, breaking it down into multiple helper methods to improve readability. > > Testing: Tier-1 Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22147#pullrequestreview-2445172973 From shade at openjdk.org Tue Nov 19 11:37:58 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 19 Nov 2024 11:37:58 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v3] In-Reply-To: <9SXRh1N-RjMfm_G4LvQFsuF5_DvRxMCrcCXMOtsAwpM=.2b37fe99-1107-440d-a4cb-468fe415b3be@github.com> References: <9SXRh1N-RjMfm_G4LvQFsuF5_DvRxMCrcCXMOtsAwpM=.2b37fe99-1107-440d-a4cb-468fe415b3be@github.com> Message-ID: On Fri, 15 Nov 2024 21:35:03 GMT, William Kemper wrote: >> Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. 
> > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Prevent uncommit thread from running during GC I like it, thanks! src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 253: > 251: } > 252: > 253: heap->allow_uncommit(); This looks to be happening on every iteration, even if no GC happened. Should this `allow_uncommit()` go into the same block where `allow_commit()` is? Maybe it would be cleaner to make a `StackObj` mark object to manage this state -- up to you. src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.cpp line 189: > 187: void ShenandoahUncommitThread::allow_uncommit() { > 188: _uncommit_allowed.set(); > 189: } New line at the end of file here. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22019#pullrequestreview-2445178952 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1848181197 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1848183019 From iwalulya at openjdk.org Tue Nov 19 14:34:02 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 19 Nov 2024 14:34:02 GMT Subject: RFR: 8344302: G1: Refactor G1CMTask::do_marking_step to use smaller wrapper methods In-Reply-To: <_XmAZhrA7yL8EfgM4-3aAoDocv45aG7C2pAzuE75so0=.eeb0a910-192d-4f1c-a55d-b541247f98d2@github.com> References: <_XmAZhrA7yL8EfgM4-3aAoDocv45aG7C2pAzuE75so0=.eeb0a910-192d-4f1c-a55d-b541247f98d2@github.com> Message-ID: On Tue, 19 Nov 2024 11:30:57 GMT, Albert Mingkun Yang wrote: >> Hi all, >> >> Please review this refactoring of G1CMTask::do_marking_step, breaking it down into multiple helper methods to improve readability. >> >> Testing: Tier-1 > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk and @tschatzl for the reviews! 
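In the Shenandoah uncommit review above, Aleksey suggests a `StackObj` mark object for the uncommit state. It could look roughly like the RAII guard below. This is a standalone sketch in plain C++, not the committed code: a `std::atomic<bool>` stands in for `ShenandoahSharedFlag`, and both class names are hypothetical. The pattern guarantees that every "forbid" is paired with exactly one "allow" on every exit path, and that nothing is re-allowed on loop iterations where no GC ran.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the uncommit thread's "uncommit allowed" flag.
class UncommitGate {
 public:
  void forbid() { _allowed.store(false, std::memory_order_release); }
  void allow() { _allowed.store(true, std::memory_order_release); }
  bool is_allowed() const { return _allowed.load(std::memory_order_acquire); }

 private:
  std::atomic<bool> _allowed{true};
};

// RAII mark in the spirit of a HotSpot StackObj: uncommit is forbidden for
// exactly the lifetime of the mark (e.g. the scope of one GC cycle) and is
// re-allowed on every exit path, including early returns.
class ForbidUncommitMark {
 public:
  explicit ForbidUncommitMark(UncommitGate& gate) : _gate(gate) {
    // Nested marks would re-allow uncommit too early, so reject them.
    assert(_gate.is_allowed() && "ForbidUncommitMark must not nest");
    _gate.forbid();
  }
  ~ForbidUncommitMark() { _gate.allow(); }
  ForbidUncommitMark(const ForbidUncommitMark&) = delete;
  ForbidUncommitMark& operator=(const ForbidUncommitMark&) = delete;

 private:
  UncommitGate& _gate;
};
```

In a control-thread loop, the mark would be declared only inside the branch that actually runs a GC cycle, so idle iterations never touch the flag.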
------------- PR Comment: https://git.openjdk.org/jdk/pull/22147#issuecomment-2485878102 From iwalulya at openjdk.org Tue Nov 19 14:34:03 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 19 Nov 2024 14:34:03 GMT Subject: Integrated: 8344302: G1: Refactor G1CMTask::do_marking_step to use smaller wrapper methods In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 12:34:34 GMT, Ivan Walulya wrote: > Hi all, > > Please review this refactoring of G1CMTask::do_marking_step, breaking it down into multiple helper methods to improve readability. > > Testing: Tier-1 This pull request has now been integrated. Changeset: 1717946c Author: Ivan Walulya URL: https://git.openjdk.org/jdk/commit/1717946c1b6494a4a44622027ac1dd175fcb9563 Stats: 432 lines in 2 files changed: 223 ins; 195 del; 14 mod 8344302: G1: Refactor G1CMTask::do_marking_step to use smaller wrapper methods Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.org/jdk/pull/22147 From jvernee at openjdk.org Tue Nov 19 17:39:48 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 19 Nov 2024 17:39:48 GMT Subject: RFR: 8331735: UpcallLinker::on_exit races with GC when copying frame anchor In-Reply-To: References: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> Message-ID: <6tN8BgGqulZRILwuuI-y6oX_1l1I6DfYmePj_EzoC4A=.fed13363-9eb0-44a8-a1ce-dce4bd56fe17@github.com> On Tue, 19 Nov 2024 06:47:17 GMT, David Holmes wrote: > I wonder if we can assert we are in a safepoint-safe state when doing so? I think we can do this. I've prototyped this here: https://github.com/openjdk/jdk/compare/pr/21742...JornVernee:jdk:SafeFrameAnchor+assert This catches the issue fixed by this patch, and it passes at least tier 1. We'd need something similar in assembly where we touch the frame anchor, i.e. `MacroAssembler::set_last_Java_frame` and `MacroAssembler::reset_last_Java_frame`.
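The assertion discussed here amounts to checking the thread state before any non-atomic write to the thread's own frame anchor. The following is a standalone sketch under simplified assumptions. The three states and the three-word anchor are hypothetical models, not HotSpot's `JavaThreadState` or `JavaFrameAnchor`, and a real check would also have to cover the assembly paths named in the comment. As the PR description explains, a thread in the native state is treated as safepoint-safe, so the GC may process its frames concurrently. In the VM or Java state, the GC waits for the thread to reach a safepoint, so a multi-word copy cannot be observed half-done.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified thread states. HotSpot's real enum has more
// values, including transition states.
enum class ThreadState { in_native, in_vm, in_java };

// Hypothetical three-word frame anchor; copying it is not atomic.
struct FrameAnchor {
  std::uintptr_t last_java_sp = 0;
  std::uintptr_t last_java_fp = 0;
  std::uintptr_t last_java_pc = 0;
};

struct SimThread {
  ThreadState state = ThreadState::in_native;
  FrameAnchor anchor;
};

// In the VM or Java state the GC must wait for this thread to reach a
// safepoint, so it cannot observe a half-written anchor. In the native
// state the GC may walk the thread's frames concurrently.
inline bool is_safe_to_touch_anchor(const SimThread& t) {
  return t.state == ThreadState::in_vm || t.state == ThreadState::in_java;
}

// Restores a saved anchor, asserting that the thread is in a state where
// the non-atomic copy cannot race with the GC.
inline void restore_anchor(SimThread& t, const FrameAnchor& saved) {
  assert(is_safe_to_touch_anchor(t) && "frame anchor written in unsafe state");
  t.anchor = saved;
}
```

As noted in the follow-up comment, such a check belongs on the anchor embedded in the thread, not on every `JavaFrameAnchor` instance.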
------------- PR Comment: https://git.openjdk.org/jdk/pull/21742#issuecomment-2486347485 From wkemper at openjdk.org Tue Nov 19 19:26:06 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 19 Nov 2024 19:26:06 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v3] In-Reply-To: References: <9SXRh1N-RjMfm_G4LvQFsuF5_DvRxMCrcCXMOtsAwpM=.2b37fe99-1107-440d-a4cb-468fe415b3be@github.com> Message-ID: On Tue, 19 Nov 2024 11:33:51 GMT, Aleksey Shipilev wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Prevent uncommit thread from running during GC > > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 253: > >> 251: } >> 252: >> 253: heap->allow_uncommit(); > > This looks to be happening on every iteration, even if no GC happened. Should this `allow_uncommit()` go into the same block where `allow_commit()` is? Maybe it would be cleaner to make a `StackObj` mark object to manage this state -- up to you. Hmm, good catch. I think it's a little worse than this even. The code that is meant to trigger an uncommit: if (ShenandoahUncommit) { if (heap->check_soft_max_changed()) { heap->notify_soft_max_changed(); } else if (is_gc_requested) { heap->notify_explicit_gc_requested(); } } Will only happen when uncommit is forbidden. The notification will fall on deaf ears, as it were. I'll fix this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1848942547 From wkemper at openjdk.org Tue Nov 19 19:44:36 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 19 Nov 2024 19:44:36 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v4] In-Reply-To: References: Message-ID: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. 
Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Improve comments - Do not notify uncommit thread when uncommit is forbidden ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22019/files - new: https://git.openjdk.org/jdk/pull/22019/files/997360ac..05db9558 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=02-03 Stats: 69 lines in 4 files changed: 58 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/22019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019 PR: https://git.openjdk.org/jdk/pull/22019 From wkemper at openjdk.org Tue Nov 19 20:03:52 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 19 Nov 2024 20:03:52 GMT Subject: RFR: 8342041: Test gc/shenandoah/oom/TestClassLoaderLeak.java slow on Windows after JDK-8340490 [v3] In-Reply-To: References: Message-ID: <5nf28UIt0ZW_9Df6LutJLLEaLUoBsJEZJhbpn2usRJA=.61835f96-5b3f-43ff-9a4b-d7446a9f3146@github.com> On Sat, 16 Nov 2024 19:49:20 GMT, Xiaolong Peng wrote: >> Fixing the regression on Windows caused by JDK-8340490, the bug is actually caused by different behavior in `os::elapsed_counter()` which I wasn't aware of. Windows doesn't have nanosecond hi-res clock support, so instead of nanoseconds it returns the current value of the performance counter ([link](https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter)).
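The root cause described here is a unit mismatch: `os::elapsed_counter()` returns raw ticks, which are nanoseconds on some platforms but performance-counter ticks on Windows. HotSpot pairs this counter with `os::elapsed_frequency()`, which reports ticks per second. The following is a minimal sketch of a unit-safe conversion, assuming only a tick count and a frequency. `ticks_to_nanos` is a hypothetical helper, not the code from the fix.

```cpp
#include <cassert>
#include <cstdint>

// Converts a raw counter value to nanoseconds, given the counter frequency
// in ticks per second. When the counter already counts nanoseconds
// (freq == 1000000000) this is the identity.
//
// Splitting into whole seconds plus a remainder avoids overflowing
// ticks * 1e9 for large tick values. Assumes freq is below ~9.2e9 so that
// (ticks % freq) * 1e9 still fits in int64_t.
inline std::int64_t ticks_to_nanos(std::int64_t ticks, std::int64_t freq) {
  assert(ticks >= 0 && freq > 0);
  const std::int64_t nanos_per_sec = 1000000000;
  const std::int64_t secs = ticks / freq;
  const std::int64_t rem = ticks % freq;
  return secs * nanos_per_sec + rem * nanos_per_sec / freq;
}
```

For example, with a 10 MHz counter (a frequency chosen for illustration), 15 ticks are 1500 ns. Treating the raw value as 15 ns would understate elapsed time by a factor of 100.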
>> >> ### Tests >> - [x] Verify gc/shenandoah/oom/TestClassLoaderLeak.java on Windows, no regression at all >> - [x] Run test suites hotspot_gc_shenandoah with linux-aarch64-server-fastdebug > > Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: > > - Revert all the changes not related to the bug fix > - Simplify pace_for_alloc LGTM ------------- Marked as reviewed by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/22172#pullrequestreview-2446541678 From xpeng at openjdk.org Tue Nov 19 20:12:49 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 19 Nov 2024 20:12:49 GMT Subject: RFR: 8342041: Test gc/shenandoah/oom/TestClassLoaderLeak.java slow on Windows after JDK-8340490 [v3] In-Reply-To: References: Message-ID: <5FLMEa2WpCyG2gJfES1wcimSOjcbpSUwZUObT_GWyh0=.7ce51686-b4f7-414e-85e4-45a51f0ead62@github.com> On Sat, 16 Nov 2024 19:49:20 GMT, Xiaolong Peng wrote: >> Fixing the regression on Windows caused by JDK-8340490, the bug is actually caused by different behavior in `os::elapsed_counter()` which I wasn't aware of. Windows doesn't have nanosecond hi-res clock support, so instead of nanoseconds it returns the current value of the performance counter ([link](https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter)).
------------- PR Comment: https://git.openjdk.org/jdk/pull/22172#issuecomment-2486652926 From duke at openjdk.org Tue Nov 19 20:12:49 2024 From: duke at openjdk.org (duke) Date: Tue, 19 Nov 2024 20:12:49 GMT Subject: RFR: 8342041: Test gc/shenandoah/oom/TestClassLoaderLeak.java slow on Windows after JDK-8340490 [v3] In-Reply-To: References: Message-ID: <-w_0ZvyS9ZlGmjuwDmrkgsO3lefaTDbq2FQoT81BLb4=.11c51082-f96e-490e-9ec6-f9f730523132@github.com> On Sat, 16 Nov 2024 19:49:20 GMT, Xiaolong Peng wrote: >> Fixing the regression on Windows caused by JDK-8340490, the bug is actually caused by different behavior in `os:: os::elapsed_counter()` which I wasn't aware of. Windows doesn't have nanosecond hi-res clock support, so instead of nanoseconds it returns the the current value of the performance counter([link](https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter)). >> >> ### Tests >> - [x] Verify gc/shenandoah/oom/TestClassLoaderLeak.java on Windows, no regression at all >> - [x] Run test suites hotspot_gc_shenandoah with linux-aarch64-server-fastdebug > > Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: > > - Revert all the changes not related to the bug fix > - Simplify pace_for_alloc @pengxiaolong Your change (at version 19e782c6189e3df68c9a3ebfc6bd8ee5934ee67e) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22172#issuecomment-2486653685 From xpeng at openjdk.org Tue Nov 19 20:21:52 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 19 Nov 2024 20:21:52 GMT Subject: Integrated: 8342041: Test gc/shenandoah/oom/TestClassLoaderLeak.java slow on Windows after JDK-8340490 In-Reply-To: References: Message-ID: On Sat, 16 Nov 2024 00:11:39 GMT, Xiaolong Peng wrote: > Fixing the regression on Windows caused by JDK-8340490, the bug is actually caused by different behavior in `os:: os::elapsed_counter()` which I wasn't aware of. Windows doesn't have nanosecond hi-res clock support, so instead of nanoseconds it returns the the current value of the performance counter([link](https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter)). > > ### Tests > - [x] Verify gc/shenandoah/oom/TestClassLoaderLeak.java on Windows, no regression at all > - [x] Run test suites hotspot_gc_shenandoah with linux-aarch64-server-fastdebug This pull request has now been integrated. Changeset: cd45ba32 Author: Xiaolong Peng Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/cd45ba32f026ba3827d18836cab37a73f59346ed Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod 8342041: Test gc/shenandoah/oom/TestClassLoaderLeak.java slow on Windows after JDK-8340490 Reviewed-by: shade, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/22172 From shade at openjdk.org Tue Nov 19 20:26:56 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 19 Nov 2024 20:26:56 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v4] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 19:44:36 GMT, William Kemper wrote: >> Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. 
Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. > > William Kemper has updated the pull request incrementally with two additional commits since the last revision: > > - Improve comments > - Do not notify uncommit thread when uncommit is forbidden I'll approve again, with the following nits: src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 45: > 43: // Regions cannot be uncommitted when concurrent reset is zeroing out the bitmaps. > 44: // This CADR class enforces this by forbidding region uncommits while it is in scope. > 45: struct ShenandoahForbidRegionUncommit : public StackObj { This is `class`, not `struct`, right? A common name for these in HotSpot is `*No*Mark`, so `ShenandoahNoUncommitMark`? ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22019#pullrequestreview-2446555609 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1848995584 From shade at openjdk.org Tue Nov 19 20:26:58 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 19 Nov 2024 20:26:58 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v4] In-Reply-To: References: Message-ID: <2ctRsJ6JhReYC9yjcPV4eRljht3bMAZ9B3urifpvQXQ=.470aa336-f334-4345-863b-557250aa3416@github.com> On Tue, 12 Nov 2024 19:22:58 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1932: >> >>> 1930: if (_uncommit_thread != nullptr) { >>> 1931: _uncommit_thread->stop(); >>> 1932: } >> >> Are there limits on proper sequencing here? Can we shut down the uncommit thread before cancelling the GC and waiting for the control thread to exit? This would save end-to-end time for short commands, as we would hide the uncommit thread shutdown in the shadow of control thread exiting. > > I'm not sure the order matters here. `ConcurrentGCThread::stop` waits until the target thread sets `_has_terminated`.
OK, nevermind. We can fix it later if it becomes a problem. >> src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.cpp line 139: >> >>> 137: _heap->notify_heap_changed(); >>> 138: double elapsed = os::elapsedTime() - start; >>> 139: log_info(gc)("Uncommitted " SIZE_FORMAT " regions, in %.3fs", count, elapsed); >> >> If we can, can we match the current log format? E.g. print `Concurrent uncommit`, with appropriately formatted timestamp? I think we also want `log_info(gc,start)` at the beginning of the method. I think `ShenandoahConcurrentPhase` helper did all that, can we still use it? > > We can restore the log messages, but I don't think `ShenandoahConcurrentPhase` and friends will like being used outside of a cycle. I'll look into it. Yeah, at least restore the log format and add `gc+start` log as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1849006729 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1849011585 From wkemper at openjdk.org Tue Nov 19 22:14:10 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 19 Nov 2024 22:14:10 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v5] In-Reply-To: References: Message-ID: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Use idiomatic name for CADR class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22019/files - new: https://git.openjdk.org/jdk/pull/22019/files/05db9558..7007415a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=03-04 Stats: 5 lines in 1 file changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/22019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019 PR: https://git.openjdk.org/jdk/pull/22019 From wkemper at openjdk.org Wed Nov 20 01:44:28 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 20 Nov 2024 01:44:28 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v6] In-Reply-To: References: Message-ID: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Allow commits initially ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22019/files - new: https://git.openjdk.org/jdk/pull/22019/files/7007415a..a00945c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=04-05 Stats: 9 lines in 1 file changed: 5 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019 PR: https://git.openjdk.org/jdk/pull/22019 From shade at openjdk.org Wed Nov 20 09:33:17 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Nov 2024 09:33:17 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v6] In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 01:44:28 GMT, William Kemper wrote: >> Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Allow commits initially Almost there, modulo restoring the logging. src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 267: > 265: last_sleep_adjust_time = current; > 266: } > 267: Nit: No need for this newline, the sleep logically relates to this block. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22019#pullrequestreview-2448019848 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1849943600 From shade at openjdk.org Wed Nov 20 09:33:18 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 20 Nov 2024 09:33:18 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v6] In-Reply-To: <2ctRsJ6JhReYC9yjcPV4eRljht3bMAZ9B3urifpvQXQ=.470aa336-f334-4345-863b-557250aa3416@github.com> References: <2ctRsJ6JhReYC9yjcPV4eRljht3bMAZ9B3urifpvQXQ=.470aa336-f334-4345-863b-557250aa3416@github.com> Message-ID: On Tue, 19 Nov 2024 20:23:57 GMT, Aleksey Shipilev wrote: >> We can restore the log messages, but I don't think `ShenandoahConcurrentPhase` and friends will like being used outside of a cycle. I'll look into it. > > Yeah, at least restore the log format and add `gc+start` log as well. This one is still not addressed, unfortunately ^ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1849936299 From iwalulya at openjdk.org Wed Nov 20 19:23:34 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 20 Nov 2024 19:23:34 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. 
This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. 
> > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Thomas Review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22015/files - new: https://git.openjdk.org/jdk/pull/22015/files/9447a455..4aa4d6b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=00-01 Stats: 77 lines in 7 files changed: 24 ins; 28 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From wkemper at openjdk.org Wed Nov 20 20:31:29 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 20 Nov 2024 20:31:29 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v7] In-Reply-To: References: Message-ID: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Restore logging format, show change in committed heap, rather than usage ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22019/files - new: https://git.openjdk.org/jdk/pull/22019/files/a00945c2..2cb71140 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=05-06 Stats: 14 lines in 2 files changed: 10 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019 PR: https://git.openjdk.org/jdk/pull/22019 From wkemper at openjdk.org Wed Nov 20 20:31:29 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 20 Nov 2024 20:31:29 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v7] In-Reply-To: References: <2ctRsJ6JhReYC9yjcPV4eRljht3bMAZ9B3urifpvQXQ=.470aa336-f334-4345-863b-557250aa3416@github.com> Message-ID: On Wed, 20 Nov 2024 09:26:19 GMT, Aleksey Shipilev wrote: >> Yeah, at least restore the log format and add `gc+start` log as well. > > This one is still not addressed, unfortunately ^ Yes, I spent some time trying to resurrect `ShenandoahConcurrentPhase` for uncommit here, but it really doesn't want to be used outside of a gc cycle. Also, previously it was logging heap _usage_, which isn't quite what we want here (this may actually increase during this phase, which makes it seem as though nothing is being uncommitted). I've restored the original logging format, but instead of logging heap usage it is now logging heap committed before and after. 
Here is an excerpt from specjbb2015 with `-Xms5g -Xmx10g`: [2024-11-20T20:02:25.056+0000][97.396s][22293][info][gc,start ] Concurrent uncommit [2024-11-20T20:02:25.072+0000][97.412s][22293][info][gc ] Concurrent uncommit 5424M->5120M(5120M) 15.988ms [2024-11-20T20:05:17.916+0000][270.255s][22293][info][gc,start ] Concurrent uncommit [2024-11-20T20:05:18.169+0000][270.508s][22293][info][gc ] Concurrent uncommit 10240M->5120M(5120M) 253.048ms [2024-11-20T20:06:45.329+0000][357.668s][22293][info][gc,start ] Concurrent uncommit [2024-11-20T20:06:45.596+0000][357.935s][22293][info][gc ] Concurrent uncommit 10240M->5120M(5120M) 267.144ms [2024-11-20T20:06:57.147+0000][369.486s][22293][info][gc,start ] Concurrent uncommit [2024-11-20T20:06:57.148+0000][369.487s][22293][info][gc ] Concurrent uncommit 5456M->5440M(5440M) 1.189ms ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1850939713 From jvernee at openjdk.org Wed Nov 20 22:53:15 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 20 Nov 2024 22:53:15 GMT Subject: RFR: 8331735: UpcallLinker::on_exit races with GC when copying frame anchor In-Reply-To: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> References: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> Message-ID: On Mon, 28 Oct 2024 13:53:58 GMT, Jorn Vernee wrote: > There is a subtle race in `UpcallLinker::on_exit` between copying of the old frame anchor back into place, and the GC. Since this copy is not atomic, it may briefly appear as if a thread has no last Java frame, while still in the `_thread_in_native` state, which leads to the GC skipping processing of any active Java frames. 
> > This code was originally adapted from `JavaCallWrapper::~JavaCallWrapper` - the JNI mechanism for upcalls - but in that case the frame anchor copy happens in the `_thread_in_vm` state, which means the GC will wait for the thread to get to a safepoint. > > The solution proposed here is to do the frame anchor copy in the java thread state, before transitioning back to the native state. The java thread state, like the vm thread state, is also 'safe', i.e. the GC will wait for the thread to get to a safepoint, so we can safely do our non-atomic copy of the frame anchor. > > Additionally, this PR resolves a similar issue in `on_entry`, by moving the clearing of the pending exception (in case native code used a JNI API and didn't handle the exception afterwards). We now also skip checking for async exceptions when transitioning from native to java, so we don't immediately clear them. Any async exceptions will be picked up at the next safepoint instead. > > Special thanks to @stefank and @fisk for finding the root cause, and @jaikiran for testing and debugging. > > Testing: tier 1-4, 20k runs of the failing test on linux-aarch64. > > I wonder if we can assert we are in a safepoint-safe state when doing so? > > I think we can do this. I've prototyped this here: [pr/21742...JornVernee:jdk:SafeFrameAnchor+assert](https://github.com/openjdk/jdk/compare/pr/21742...JornVernee:jdk:SafeFrameAnchor+assert) > > This catches the issue fixed by this patch, and it passes at least tier 1. We'd need something similar in assembly where we touch the frame anchor, i.e. `MacroAssembler::set_last_Java_frame` and `MacroAssembler::reset_last_Java_frame`. Thinking some more about this: there might be other instances of `JavaFrameAnchor` that are fine to touch when the thread is in the native state. It's just the frame anchor inside a `JavaThread` that cannot be touched if that thread is in a certain state.
It might be possible to encapsulate the `JavaFrameAnchor` instance inside the thread, and then guard any accesses to it. But, that seems like a much more invasive change, so I'll hold off on that and focus this PR on fixing the issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21742#issuecomment-2489692928 From kbarrett at openjdk.org Thu Nov 21 00:02:16 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 21 Nov 2024 00:02:16 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 23:40:41 GMT, Kim Barrett wrote: > This change splits the existing PartialArrayStateAllocator class into an > allocator class and a manager class. The allocator class is per worker > thread. The manager class provides the memory management context for a > group of allocators. > > This change is in preparation for some other refactorings around partial array > state handling. That work is intended to make it easier for various > collections to adopt the use of that mechanism for chunking the processing of > large objArrays. > > The new implementation for the memory management context is based on the > existing one, with an Arena per worker, now controlled by the manager object. > Potential improvements to that can be explored in the future. Some ideas > include improvements to the Arena API or a single thread-safe Arena variant > (trading slower arena allocation (which is the slow path) for less memory > usage). > > G1 has a single manager, reused by each young/mixed GC. Associated state > allocators are nested in the per-worker structures, so deleted at the end of > the collection. The manager is reset at the end of the collection to allow the > memory to be recycled. It is planned that the STW full collector will also use > this manager when converted to use PartialArrayState. So it will be reused by > all STW collections. 
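The allocator/manager split described in this PR can be illustrated with a small standalone model. In this model, a manager owns one arena per worker and can release all of them at once. Each per-worker allocator hands out objects from its worker's arena and recycles released ones through a free list. This is a simplified sketch, not the HotSpot implementation: `State`, `Arena`, `StateManager`, and `StateAllocator` are hypothetical names, and the arena here is a plain container rather than HotSpot's `Arena`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical object being allocated (stands in for a partial-array state).
struct State {
  State* next_free = nullptr;  // intrusive free-list link
  int payload = 0;
};

// Hypothetical arena: objects are only freed all at once, when the arena
// itself is replaced or destroyed.
class Arena {
 public:
  State* alloc() {
    _objs.push_back(std::make_unique<State>());
    return _objs.back().get();
  }
  std::size_t allocated() const { return _objs.size(); }

 private:
  std::vector<std::unique_ptr<State>> _objs;
};

// Manager: the memory-management context shared by a group of per-worker
// allocators. Owns one arena per worker.
class StateManager {
 public:
  explicit StateManager(std::size_t num_workers) : _arenas(num_workers) {}

  // Releases all memory at once; only valid when no allocator is alive.
  void reset() {
    assert(_live_allocators == 0);
    for (Arena& a : _arenas) a = Arena();
  }

  std::size_t total_allocated() const {
    std::size_t n = 0;
    for (const Arena& a : _arenas) n += a.allocated();
    return n;
  }

 private:
  friend class StateAllocator;
  std::vector<Arena> _arenas;
  std::size_t _live_allocators = 0;
};

// Per-worker allocator: allocates from its worker's arena and reuses
// released objects via a free list.
class StateAllocator {
 public:
  StateAllocator(StateManager& mgr, std::size_t worker_id)
      : _mgr(mgr), _arena(mgr._arenas.at(worker_id)) {
    ++_mgr._live_allocators;
  }
  ~StateAllocator() { --_mgr._live_allocators; }
  StateAllocator(const StateAllocator&) = delete;
  StateAllocator& operator=(const StateAllocator&) = delete;

  State* allocate() {
    if (_free_list != nullptr) {
      State* s = _free_list;
      _free_list = s->next_free;
      s->next_free = nullptr;
      return s;
    }
    return _arena.alloc();
  }

  void release(State* s) {
    s->next_free = _free_list;
    _free_list = s;
  }

 private:
  StateManager& _mgr;
  Arena& _arena;
  State* _free_list = nullptr;
};
```

In this model, G1-style usage destroys the allocators at the end of a collection and then calls `reset()`. ParallelGC-style usage keeps the allocators alive and never resets, so states on their free lists carry over to the next collection.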
> > ParallelGC has a single manager, reused by each young collection. Because the > per-worker promotion managers are never destroyed, their nested state > allocators are never destroyed. So the manager is not reset, instead leaving > previously allocated states in the allocator free lists for use by the next > collection. This means the full collector won't be able to use the same > manager object as the young collectors. > > Testing: mach5 tier1-5 I'm not sure why skara initially labelled this as `hotspot`. Re-labelled as `hotspot-gc`. [later] Oh, it was because of the jfr test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22287#issuecomment-2489782134 From shade at openjdk.org Thu Nov 21 08:20:17 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 21 Nov 2024 08:20:17 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v7] In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 20:31:29 GMT, William Kemper wrote: >> Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Restore logging format, show change in committed heap, rather than usage I think the log message is still a bit confusing... ------------- Changes requested by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22019#pullrequestreview-2450572312 From shade at openjdk.org Thu Nov 21 08:20:18 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 21 Nov 2024 08:20:18 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v7] In-Reply-To: References: <2ctRsJ6JhReYC9yjcPV4eRljht3bMAZ9B3urifpvQXQ=.470aa336-f334-4345-863b-557250aa3416@github.com> Message-ID: <9MFkD3YNuSJIBPOsHYEKFX98XkMGKgMYa_7j-7usxC0=.0a99aabd-6272-40db-a5aa-6022023c1e4f@github.com> On Wed, 20 Nov 2024 20:26:15 GMT, William Kemper wrote: >> This one is still not addressed, unfortunately ^ > > Yes, I spent some time trying to resurrect `ShenandoahConcurrentPhase` for uncommit here, but it really doesn't want to be used outside of a gc cycle. Also, previously it was logging heap _usage_, which isn't quite what we want here (this may actually increase during this phase, which makes it seem as though nothing is being uncommitted). > > I've restored the original logging format, but instead of logging heap usage it is now logging heap committed before and after. 
Here is an excerpt from specjbb2015 with `-Xms5g -Xmx10g`: > > > [2024-11-20T20:02:25.056+0000][97.396s][22293][info][gc,start ] Concurrent uncommit > [2024-11-20T20:02:25.072+0000][97.412s][22293][info][gc ] Concurrent uncommit 5424M->5120M(5120M) 15.988ms > [2024-11-20T20:05:17.916+0000][270.255s][22293][info][gc,start ] Concurrent uncommit > [2024-11-20T20:05:18.169+0000][270.508s][22293][info][gc ] Concurrent uncommit 10240M->5120M(5120M) 253.048ms > [2024-11-20T20:06:45.329+0000][357.668s][22293][info][gc,start ] Concurrent uncommit > [2024-11-20T20:06:45.596+0000][357.935s][22293][info][gc ] Concurrent uncommit 10240M->5120M(5120M) 267.144ms > [2024-11-20T20:06:57.147+0000][369.486s][22293][info][gc,start ] Concurrent uncommit > [2024-11-20T20:06:57.148+0000][369.487s][22293][info][gc ] Concurrent uncommit 5456M->5440M(5440M) 1.189ms If we are emitting a log line that looks like a properly formatted GC log line, but the numbers there mean something else for `Concurrent uncommit`, we are bound to confuse users and automatic tools. Uncommit should affect `capacity`, this is how we know how deep we have uncommitted. So, I suggest we emit: Concurrent uncommit XXXXM->XXXXM (YYYYM) z.zzzms ...where `XXXX` is the heap used at the end of uncommit (note before and after are the same) and YYYY is capacity. This will not expose users to thinking uncommit grows the heap usage, and would give us instantaneous view on heap usage and capacity. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1851556763 From wkemper at openjdk.org Thu Nov 21 22:17:23 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 21 Nov 2024 22:17:23 GMT Subject: RFR: 8344798: Shenandoah: Use more descriptive variable names in shPhaseTimings.cpp Message-ID: The single letter variable names make some of this code harder to read. 
------------- Commit messages: - Use more descriptive variable names Changes: https://git.openjdk.org/jdk/pull/22310/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22310&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344798 Stats: 14 lines in 1 file changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/22310.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22310/head:pull/22310 PR: https://git.openjdk.org/jdk/pull/22310 From ysr at openjdk.org Thu Nov 21 23:24:19 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 21 Nov 2024 23:24:19 GMT Subject: RFR: 8344798: Shenandoah: Use more descriptive variable names in shPhaseTimings.cpp In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 22:12:40 GMT, William Kemper wrote: > The single letter variable names make some of this code harder to read. Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22310#pullrequestreview-2452995376 From wkemper at openjdk.org Thu Nov 21 23:28:19 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 21 Nov 2024 23:28:19 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v7] In-Reply-To: <9MFkD3YNuSJIBPOsHYEKFX98XkMGKgMYa_7j-7usxC0=.0a99aabd-6272-40db-a5aa-6022023c1e4f@github.com> References: <2ctRsJ6JhReYC9yjcPV4eRljht3bMAZ9B3urifpvQXQ=.470aa336-f334-4345-863b-557250aa3416@github.com> <9MFkD3YNuSJIBPOsHYEKFX98XkMGKgMYa_7j-7usxC0=.0a99aabd-6272-40db-a5aa-6022023c1e4f@github.com> Message-ID: On Thu, 21 Nov 2024 08:17:02 GMT, Aleksey Shipilev wrote: >> Yes, I spent some time trying to resurrect `ShenandoahConcurrentPhase` for uncommit here, but it really doesn't want to be used outside of a gc cycle. Also, previously it was logging heap _usage_, which isn't quite what we want here (this may actually increase during this phase, which makes it seem as though nothing is being uncommitted). 
>> >> I've restored the original logging format, but instead of logging heap usage it is now logging heap committed before and after. Here is an excerpt from specjbb2015 with `-Xms5g -Xmx10g`: >> >> >> [2024-11-20T20:02:25.056+0000][97.396s][22293][info][gc,start ] Concurrent uncommit >> [2024-11-20T20:02:25.072+0000][97.412s][22293][info][gc ] Concurrent uncommit 5424M->5120M(5120M) 15.988ms >> [2024-11-20T20:05:17.916+0000][270.255s][22293][info][gc,start ] Concurrent uncommit >> [2024-11-20T20:05:18.169+0000][270.508s][22293][info][gc ] Concurrent uncommit 10240M->5120M(5120M) 253.048ms >> [2024-11-20T20:06:45.329+0000][357.668s][22293][info][gc,start ] Concurrent uncommit >> [2024-11-20T20:06:45.596+0000][357.935s][22293][info][gc ] Concurrent uncommit 10240M->5120M(5120M) 267.144ms >> [2024-11-20T20:06:57.147+0000][369.486s][22293][info][gc,start ] Concurrent uncommit >> [2024-11-20T20:06:57.148+0000][369.487s][22293][info][gc ] Concurrent uncommit 5456M->5440M(5440M) 1.189ms > > If we are emitting a log line that looks like a properly formatted GC log line, but the numbers there mean something else for `Concurrent uncommit`, we are bound to confuse users and automatic tools. Uncommit should affect `capacity`, this is how we know how deep we have uncommitted. So, I suggest we emit: > > > Concurrent uncommit XXXXM->XXXXM (YYYYM) z.zzzms > > > ...where `XXXX` is the heap used at the end of uncommit (note before and after are the same) and YYYY is capacity. This will not expose users to thinking uncommit grows the heap usage, and would give us instantaneous view on heap usage and capacity. Hmm, the numbers are preceded by `Concurrent uncommit`, with that context it's not much of a stretch to think these numbers represent the change in _committed_ memory. The original log message (in which heap usage may increase during uncommit) was _not_ helpful. 
A message with the same format in which heap usage also appears to _not change at all_ during an uncommit is also perplexing. Are we trying too hard to preserve the original, not useful message? Maybe we just want a new message that plainly says: Concurrently uncommitted XXXXM in z.zzzms or Concurrent uncommit: time z.zzzms, committed before XXXXM, committed after YYYYM, capacity ZZZZM ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1853077616 From wkemper at openjdk.org Fri Nov 22 00:08:23 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 22 Nov 2024 00:08:23 GMT Subject: Integrated: 8344798: Shenandoah: Use more descriptive variable names in shPhaseTimings.cpp In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 22:12:40 GMT, William Kemper wrote: > The single letter variable names make some of this code harder to read. This pull request has now been integrated. Changeset: db44e97c Author: William Kemper URL: https://git.openjdk.org/jdk/commit/db44e97c5dfd286a58985be9b091fd43f5ad03be Stats: 14 lines in 1 file changed: 0 ins; 0 del; 14 mod 8344798: Shenandoah: Use more descriptive variable names in shPhaseTimings.cpp Reviewed-by: ysr ------------- PR: https://git.openjdk.org/jdk/pull/22310 From ayang at openjdk.org Fri Nov 22 10:44:31 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 22 Nov 2024 10:44:31 GMT Subject: RFR: 8344853: Parallel: Improve comments in psParallelCompact Message-ID: Trivially revising some existing comments and adding some comments. No real code change other than a renaming.
------------- Commit messages: - trivial Changes: https://git.openjdk.org/jdk/pull/22318/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22318&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344853 Stats: 17 lines in 1 file changed: 1 ins; 1 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/22318.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22318/head:pull/22318 PR: https://git.openjdk.org/jdk/pull/22318 From kbarrett at openjdk.org Fri Nov 22 11:00:52 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 22 Nov 2024 11:00:52 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v2] In-Reply-To: References: Message-ID: <5BFfQWo76ziDd0mT3TczoRhPzPm3bGmnBbS8f5dS63M=.3a130ddd-05de-4fd2-97e0-07c6f9fd3b24@github.com> > This change splits the existing PartialArrayStateAllocator class into an > allocator class and a manager class. The allocator class is per worker > thread. The manager class provides the memory management context for a > group of allocators. > > This change is in preparation for some other refactorings around partial array > state handling. That work is intended to make it easier for various > collections to adopt the use of that mechanism for chunking the processing of > large objArrays. > > The new implementation for the memory management context is based on the > existing one, with an Arena per worker, now controlled by the manager object. > Potential improvements to that can be explored in the future. Some ideas > include improvements to the Arena API or a single thread-safe Arena variant > (trading slower arena allocation (which is the slow path) for less memory > usage). > > G1 has a single manager, reused by each young/mixed GC. Associated state > allocators are nested in the per-worker structures, so deleted at the end of > the collection. The manager is reset at the end of the collection to allow the > memory to be recycled. 
It is planned that the STW full collector will also use > this manager when converted to use PartialArrayState. So it will be reused by > all STW collections. > > ParallelGC has a single manager, reused by each young collection. Because the > per-worker promotion managers are never destroyed, their nested state > allocators are never destroyed. So the manager is not reset, instead leaving > previously allocated states in the allocator free lists for use by the next > collection. This means the full collector won't be able to use the same > manager object as the young collectors. > > Testing: mach5 tier1-5 Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: simplify pas allocator destruction and manager phase tracking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22287/files - new: https://git.openjdk.org/jdk/pull/22287/files/7aa54200..256b0021 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=00-01 Stats: 43 lines in 2 files changed: 7 ins; 8 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/22287.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22287/head:pull/22287 PR: https://git.openjdk.org/jdk/pull/22287 From kbarrett at openjdk.org Fri Nov 22 11:04:22 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 22 Nov 2024 11:04:22 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v2] In-Reply-To: <5BFfQWo76ziDd0mT3TczoRhPzPm3bGmnBbS8f5dS63M=.3a130ddd-05de-4fd2-97e0-07c6f9fd3b24@github.com> References: <5BFfQWo76ziDd0mT3TczoRhPzPm3bGmnBbS8f5dS63M=.3a130ddd-05de-4fd2-97e0-07c6f9fd3b24@github.com> Message-ID: On Fri, 22 Nov 2024 11:00:52 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. 
The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. 
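A minimal C++ sketch of the allocator/manager split described above, assuming a simplified chunked arena. `State`, `Arena`, `Manager`, and `Allocator` are hypothetical stand-ins, not the HotSpot classes: the manager owns one arena per worker and counts live allocators (mirroring the debug-only tracking mentioned later in the thread), while each per-worker allocator prefers its own free list over the arena.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for one partial-array work item.
struct State {
  State* next_free = nullptr;    // link used only while on a free list
  const void* source = nullptr;  // array being processed (illustrative)
  std::size_t length = 0;
};

// Hypothetical chunked arena: hands out States, frees everything on reset().
class Arena {
  static constexpr std::size_t kChunkSize = 64;
  std::vector<std::unique_ptr<State[]>> chunks_;
  std::size_t used_in_last_ = 0;
 public:
  State* allocate() {
    if (chunks_.empty() || used_in_last_ == kChunkSize) {
      chunks_.push_back(std::make_unique<State[]>(kChunkSize));
      used_in_last_ = 0;
    }
    return &chunks_.back()[used_in_last_++];
  }
  void reset() { chunks_.clear(); used_in_last_ = 0; }
  std::size_t chunk_count() const { return chunks_.size(); }
};

// Manager: memory-management context for a group of allocators,
// holding one Arena per worker.
class Manager {
  std::vector<Arena> arenas_;
  unsigned live_allocators_ = 0;  // debug-style bookkeeping
 public:
  explicit Manager(unsigned workers) : arenas_(workers) {}
  Arena& arena_for(unsigned worker) { return arenas_[worker]; }
  void allocator_created() { ++live_allocators_; }
  void allocator_destroyed() { --live_allocators_; }
  // Recycle all memory; only legal once every allocator is gone.
  void reset() {
    assert(live_allocators_ == 0);
    for (Arena& a : arenas_) a.reset();
  }
};

// Allocator: one per worker thread; reuses released States first.
class Allocator {
  Manager& manager_;
  Arena& arena_;
  State* free_list_ = nullptr;
 public:
  Allocator(Manager& manager, unsigned worker)
      : manager_(manager), arena_(manager.arena_for(worker)) {
    manager_.allocator_created();
  }
  ~Allocator() { manager_.allocator_destroyed(); }
  Allocator(const Allocator&) = delete;
  Allocator& operator=(const Allocator&) = delete;

  State* allocate(const void* source, std::size_t length) {
    State* s = free_list_;
    if (s != nullptr) {
      free_list_ = s->next_free;
    } else {
      s = arena_.allocate();
    }
    s->next_free = nullptr;
    s->source = source;
    s->length = length;
    return s;
  }
  void release(State* s) {
    s->next_free = free_list_;
    free_list_ = s;
  }
};
```

In a G1-like usage the per-collection allocators go out of scope and `reset()` recycles every arena; in a ParallelGC-like usage the allocators live on, keep their free lists, and `reset()` is simply never called.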
>> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > simplify pas allocator destruction and manager phase tracking Some offline discussion with @albertnetymk led to some changes in the manager's tracking of constructed and destructed associated allocators that I think is simpler, and makes some parts debug-only. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22287#issuecomment-2493487074 From stefank at openjdk.org Fri Nov 22 11:22:19 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 22 Nov 2024 11:22:19 GMT Subject: RFR: 8344363: FullGCForwarding::initialize_flags is called after ObjLayout::initialize [v4] In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 12:44:58 GMT, Roman Kennke wrote: >> From the bug description: >> ObjLayout::initialize() is called in Arguments::parse(const JavaVMInitArgs*) which sets ObjLayout::_klass_mode. FullGCForwarding::initialize_flags(size_t) is called in init_globals() which seems to be later in the Threads::create_vm(JavaVMInitArgs*, bool*) routine. The latter, however, can unset the UseCompactObjectHeaders flag, which leads to a potential mismatch with ObjLayout::_klass_mode, firing the asserts in ObjLayout::klass_mode(). >> >> There is also a related (and somewhat minor) problem: In Arguments::parse() we enable a bunch of stuff when UseCompactObjectHeaders is on (LW locking, but that's the default anyway, and object-monitor-tables, which are otherwise off), but then may disable UseCompactObjectHeaders, e.g. in the GC or when -UseCompressedClassPointers is requested. This then leaves the odd situation that we run without compact headers, but have object monitor tables turned on. >> >> The fix to both issues is: >> - First disable UCOH >> - and do that in apply_ergo() (e.g. 
GCs should do it in GCArguments::initialize()) >> - then enable any flags required by compact headers >> - Initialize ObjLayout after all flags are done (it's in the correct place, already) >> >> Testing: >> - [x] tier1 -UCOH (default) >> - [ ] tier1 +UCOH >> - [x] Manual testing several flag combinations > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Move ObjLayout::initialize() up a little Ping hotspot-gc. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22255#issuecomment-2493521567 From ayang at openjdk.org Fri Nov 22 17:22:18 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 22 Nov 2024 17:22:18 GMT Subject: RFR: 8344363: FullGCForwarding::initialize_flags is called after ObjLayout::initialize [v4] In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 12:44:58 GMT, Roman Kennke wrote: >> From the bug description: >> ObjLayout::initialize() is called in Arguments::parse(const JavaVMInitArgs*) which sets ObjLayout::_klass_mode. FullGCForwarding::initialize_flags(size_t) is called in init_globals() which seems to be later in the Threads::create_vm(JavaVMInitArgs*, bool*) routine. The latter, however, can unset the UseCompactObjectHeaders flag, which leads to a potential mismatch with ObjLayout::_klass_mode, firing the asserts in ObjLayout::klass_mode(). >> >> There is also a related (and somewhat minor) problem: In Arguments::parse() we enable a bunch of stuff when UseCompactObjectHeaders is on (LW locking, but that's the default anyway, and object-monitor-tables, which are otherwise off), but then may disable UseCompactObjectHeaders, e.g. in the GC or when -UseCompressedClassPointers is requested. This then leaves the odd situation that we run without compact headers, but have object monitor tables turned on. >> >> The fix to both issues is: >> - First disable UCOH >> - and do that in apply_ergo() (e.g. 
GCs should do it in GCArguments::initialize()) >> - then enable any flags required by compact headers >> - Initialize ObjLayout after all flags are done (it's in the correct place, already) >> >> Testing: >> - [x] tier1 -UCOH (default) >> - [ ] tier1 +UCOH >> - [x] Manual testing several flag combinations > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Move ObjLayout::initialize() up a little Marked as reviewed by ayang (Reviewer). src/hotspot/share/runtime/arguments.cpp line 3660: > 3658: } > 3659: > 3660: void Arguments::set_compact_headers() { Since this method does more than setting the compact-headers flag, could it be renamed to `set_compact_headers_flags`, following the convention from its neighbor `set_ergonomics_flags`? ------------- PR Review: https://git.openjdk.org/jdk/pull/22255#pullrequestreview-2455137935 PR Review Comment: https://git.openjdk.org/jdk/pull/22255#discussion_r1854326306 From rkennke at openjdk.org Fri Nov 22 18:08:32 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 22 Nov 2024 18:08:32 GMT Subject: RFR: 8344363: FullGCForwarding::initialize_flags is called after ObjLayout::initialize [v5] In-Reply-To: References: Message-ID: <4S2_JFKMsJ5XkPN5Lg_LLl0cRpAYTDiX0rqRIL3S2rM=.6fdba7be-ef33-40f8-b682-3bf83132cdb1@github.com> > From the bug description: > ObjLayout::initialize() is called in Arguments::parse(const JavaVMInitArgs*) which sets ObjLayout::_klass_mode. FullGCForwarding::initialize_flags(size_t) is called in init_globals() which seems to be later in the Threads::create_vm(JavaVMInitArgs*, bool*) routine. The latter, however, can unset the UseCompactObjectHeaders flag, which leads to a potential mismatch with ObjLayout::_klass_mode, firing the asserts in ObjLayout::klass_mode(). 
> > There is also a related (and somewhat minor) problem: In Arguments::parse() we enable a bunch of stuff when UseCompactObjectHeaders is on (LW locking, but that's the default anyway, and object-monitor-tables, which are otherwise off), but then may disable UseCompactObjectHeaders, e.g. in the GC or when -UseCompressedClassPointers is requested. This then leaves the odd situation that we run without compact headers, but have object monitor tables turned on. > > The fix to both issues is: > - First disable UCOH > - and do that in apply_ergo() (e.g. GCs should do it in GCArguments::initialize()) > - then enable any flags required by compact headers > - Initialize ObjLayout after all flags are done (it's in the correct place, already) > > Testing: > - [x] tier1 -UCOH (default) > - [ ] tier1 +UCOH > - [x] Manual testing several flag combinations Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Rename set_compact_headers() -> set_compact_headers_flags() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22255/files - new: https://git.openjdk.org/jdk/pull/22255/files/43c7668e..28a1fa3d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22255&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22255&range=03-04 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22255/head:pull/22255 PR: https://git.openjdk.org/jdk/pull/22255 From stefank at openjdk.org Fri Nov 22 21:08:41 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 22 Nov 2024 21:08:41 GMT Subject: RFR: 8344363: FullGCForwarding::initialize_flags is called after ObjLayout::initialize [v5] In-Reply-To: <4S2_JFKMsJ5XkPN5Lg_LLl0cRpAYTDiX0rqRIL3S2rM=.6fdba7be-ef33-40f8-b682-3bf83132cdb1@github.com> References: <4S2_JFKMsJ5XkPN5Lg_LLl0cRpAYTDiX0rqRIL3S2rM=.6fdba7be-ef33-40f8-b682-3bf83132cdb1@github.com> Message-ID: 
On Fri, 22 Nov 2024 18:08:32 GMT, Roman Kennke wrote: >> From the bug description: >> ObjLayout::initialize() is called in Arguments::parse(const JavaVMInitArgs*) which sets ObjLayout::_klass_mode. FullGCForwarding::initialize_flags(size_t) is called in init_globals() which seems to be later in the Threads::create_vm(JavaVMInitArgs*, bool*) routine. The latter, however, can unset the UseCompactObjectHeaders flag, which leads to a potential mismatch with ObjLayout::_klass_mode, firing the asserts in ObjLayout::klass_mode(). >> >> There is also a related (and somewhat minor) problem: In Arguments::parse() we enable a bunch of stuff when UseCompactObjectHeaders is on (LW locking, but that's the default anyway, and object-monitor-tables, which are otherwise off), but then may disable UseCompactObjectHeaders, e.g. in the GC or when -UseCompressedClassPointers is requested. This then leaves the odd situation that we run without compact headers, but have object monitor tables turned on. >> >> The fix to both issues is: >> - First disable UCOH >> - and do that in apply_ergo() (e.g. GCs should do it in GCArguments::initialize()) >> - then enable any flags required by compact headers >> - Initialize ObjLayout after all flags are done (it's in the correct place, already) >> >> Testing: >> - [x] tier1 -UCOH (default) >> - [ ] tier1 +UCOH >> - [x] Manual testing several flag combinations > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Rename set_compact_headers() -> set_compact_headers_flags() Marked as reviewed by stefank (Reviewer). 
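The ordering in the fix (first disable compact headers, then enable the flags they require, then initialize the layout) can be sketched as standalone C++. `Flags`, `KlassMode`, and the three step functions are hypothetical simplifications, not the real `Arguments`/`ObjLayout` code; `gc_supports_compact_headers` is an illustrative stand-in for a GC vetoing the flag, and LW locking is omitted since it is already the default.

```cpp
#include <cassert>

// Hypothetical, simplified flag state; not HotSpot's globals.
struct Flags {
  bool use_compact_object_headers = false;
  bool use_compressed_class_pointers = true;
  bool gc_supports_compact_headers = true;  // illustrative GC veto
  bool use_object_monitor_table = false;
};

enum class KlassMode { Compact, Compressed, Uncompressed };

// Step 1: clear compact headers first if anything rules them out.
void disable_if_unsupported(Flags& f) {
  if (!f.use_compressed_class_pointers || !f.gc_supports_compact_headers) {
    f.use_compact_object_headers = false;
  }
}

// Step 2: only then turn on what compact headers depend on.
void enable_dependent_flags(Flags& f) {
  if (f.use_compact_object_headers) {
    f.use_object_monitor_table = true;
  }
}

// Step 3: derive the layout mode once the flags are final.
KlassMode init_layout(const Flags& f) {
  // After step 1, compact headers imply compressed class pointers.
  assert(!f.use_compact_object_headers || f.use_compressed_class_pointers);
  if (f.use_compact_object_headers) return KlassMode::Compact;
  return f.use_compressed_class_pointers ? KlassMode::Compressed
                                         : KlassMode::Uncompressed;
}

KlassMode finalize_flags(Flags& f) {
  disable_if_unsupported(f);
  enable_dependent_flags(f);
  return init_layout(f);
}
```

If steps 1 and 2 were swapped, a run requesting compact headers together with `-UseCompressedClassPointers` would end up with object monitor tables on but compact headers off, which is the mismatch the bug description points out.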
------------- PR Review: https://git.openjdk.org/jdk/pull/22255#pullrequestreview-2455645151 From sjohanss at openjdk.org Mon Nov 25 08:28:16 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Mon, 25 Nov 2024 08:28:16 GMT Subject: RFR: 8344853: Parallel: Improve comments in psParallelCompact In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 10:39:31 GMT, Albert Mingkun Yang wrote: > Trivially revising some existing comments and adding some comments. No real code change other than a renaming. Marked as reviewed by sjohanss (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22318#pullrequestreview-2457613155 From shade at openjdk.org Mon Nov 25 08:42:15 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 25 Nov 2024 08:42:15 GMT Subject: RFR: 8344363: FullGCForwarding::initialize_flags is called after ObjLayout::initialize [v5] In-Reply-To: <4S2_JFKMsJ5XkPN5Lg_LLl0cRpAYTDiX0rqRIL3S2rM=.6fdba7be-ef33-40f8-b682-3bf83132cdb1@github.com> References: <4S2_JFKMsJ5XkPN5Lg_LLl0cRpAYTDiX0rqRIL3S2rM=.6fdba7be-ef33-40f8-b682-3bf83132cdb1@github.com> Message-ID: <54Zkl4v43F0p-3jP5zqgTFPS30GeydTUkeTViZ6EBNg=.78612187-33a3-46d7-9657-cce715d01935@github.com> On Fri, 22 Nov 2024 18:08:32 GMT, Roman Kennke wrote: >> From the bug description: >> ObjLayout::initialize() is called in Arguments::parse(const JavaVMInitArgs*) which sets ObjLayout::_klass_mode. FullGCForwarding::initialize_flags(size_t) is called in init_globals() which seems to be later in the Threads::create_vm(JavaVMInitArgs*, bool*) routine. The latter, however, can unset the UseCompactObjectHeaders flag, which leads to a potential mismatch with ObjLayout::_klass_mode, firing the asserts in ObjLayout::klass_mode().
>> >> There is also a related (and somewhat minor) problem: In Arguments::parse() we enable a bunch of stuff when UseCompactObjectHeaders is on (LW locking, but that's the default anyway, and object-monitor-tables, which are otherwise off), but then may disable UseCompactObjectHeaders, e.g. in the GC or when -UseCompressedClassPointers is requested. This then leaves the odd situation that we run without compact headers, but have object monitor tables turned on. >> >> The fix to both issues is: >> - First disable UCOH >> - and do that in apply_ergo() (e.g. GCs should do it in GCArguments::initialize()) >> - then enable any flags required by compact headers >> - Initialize ObjLayout after all flags are done (it's in the correct place, already) >> >> Testing: >> - [x] tier1 -UCOH (default) >> - [ ] tier1 +UCOH >> - [x] Manual testing several flag combinations > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Rename set_compact_headers() -> set_compact_headers_flags() Still good. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22255#pullrequestreview-2457684037 From ayang at openjdk.org Mon Nov 25 12:54:16 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 25 Nov 2024 12:54:16 GMT Subject: RFR: 8344363: FullGCForwarding::initialize_flags is called after ObjLayout::initialize [v5] In-Reply-To: <4S2_JFKMsJ5XkPN5Lg_LLl0cRpAYTDiX0rqRIL3S2rM=.6fdba7be-ef33-40f8-b682-3bf83132cdb1@github.com> References: <4S2_JFKMsJ5XkPN5Lg_LLl0cRpAYTDiX0rqRIL3S2rM=.6fdba7be-ef33-40f8-b682-3bf83132cdb1@github.com> Message-ID: On Fri, 22 Nov 2024 18:08:32 GMT, Roman Kennke wrote: >> From the bug description: >> ObjLayout::initialize() is called in Arguments::parse(const JavaVMInitArgs*) which sets ObjLayout::_klass_mode. 
FullGCForwarding::initialize_flags(size_t) is called in init_globals() which seems to be later in the Threads::create_vm(JavaVMInitArgs*, bool*) routine. The latter, however, can unset the UseCompactObjectHeaders flag, which leads to a potential mismatch with ObjLayout::_klass_mode, firing the asserts in ObjLayout::klass_mode(). >> >> There is also a related (and somewhat minor) problem: In Arguments::parse() we enable a bunch of stuff when UseCompactObjectHeaders is on (LW locking, but that's the default anyway, and object-monitor-tables, which are otherwise off), but then may disable UseCompactObjectHeaders, e.g. in the GC or when -UseCompressedClassPointers is requested. This then leaves the odd situation that we run without compact headers, but have object monitor tables turned on. >> >> The fix to both issues is: >> - First disable UCOH >> - and do that in apply_ergo() (e.g. GCs should do it in GCArguments::initialize()) >> - then enable any flags required by compact headers >> - Initialize ObjLayout after all flags are done (it's in the correct place, already) >> >> Testing: >> - [x] tier1 -UCOH (default) >> - [ ] tier1 +UCOH >> - [x] Manual testing several flag combinations > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Rename set_compact_headers() -> set_compact_headers_flags() Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22255#pullrequestreview-2458352531 From rkennke at openjdk.org Mon Nov 25 13:50:27 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 25 Nov 2024 13:50:27 GMT Subject: RFR: 8344363: FullGCForwarding::initialize_flags is called after ObjLayout::initialize [v5] In-Reply-To: <4S2_JFKMsJ5XkPN5Lg_LLl0cRpAYTDiX0rqRIL3S2rM=.6fdba7be-ef33-40f8-b682-3bf83132cdb1@github.com> References: <4S2_JFKMsJ5XkPN5Lg_LLl0cRpAYTDiX0rqRIL3S2rM=.6fdba7be-ef33-40f8-b682-3bf83132cdb1@github.com> Message-ID: <5xykYi1ZaYCjlUi1BzV8HlFD7byikWpvDVx_Xnwwkl4=.74728d39-c273-41f9-9e79-589b5de06c85@github.com> On Fri, 22 Nov 2024 18:08:32 GMT, Roman Kennke wrote: >> From the bug description: >> ObjLayout::initialize() is called in Arguments::parse(const JavaVMInitArgs*) which sets ObjLayout::_klass_mode. FullGCForwarding::initialize_flags(size_t) is called in init_globals() which seems to be later in the Threads::create_vm(JavaVMInitArgs*, bool*) routine. The latter, however, can unset the UseCompactObjectHeaders flag, which leads to a potential mismatch with ObjLayout::_klass_mode, firing the asserts in ObjLayout::klass_mode(). >> >> There is also a related (and somewhat minor) problem: In Arguments::parse() we enable a bunch of stuff when UseCompactObjectHeaders is on (LW locking, but that's the default anyway, and object-monitor-tables, which are otherwise off), but then may disable UseCompactObjectHeaders, e.g. in the GC or when -UseCompressedClassPointers is requested. This then leaves the odd situation that we run without compact headers, but have object monitor tables turned on. >> >> The fix to both issues is: >> - First disable UCOH >> - and do that in apply_ergo() (e.g. 
GCs should do it in GCArguments::initialize()) >> - then enable any flags required by compact headers >> - Initialize ObjLayout after all flags are done (it's in the correct place, already) >> >> Testing: >> - [x] tier1 -UCOH (default) >> - [ ] tier1 +UCOH >> - [x] Manual testing several flag combinations > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Rename set_compact_headers() -> set_compact_headers_flags() Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22255#issuecomment-2498068162 From rkennke at openjdk.org Mon Nov 25 13:50:28 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 25 Nov 2024 13:50:28 GMT Subject: Integrated: 8344363: FullGCForwarding::initialize_flags is called after ObjLayout::initialize In-Reply-To: References: Message-ID: <-z5Ym7oqEjeKU3m-UNr6Rk1SS8fzWGGAap0Ycw0fEnA=.2d74a477-535a-4175-ab49-eb881a39f89c@github.com> On Tue, 19 Nov 2024 20:07:38 GMT, Roman Kennke wrote: > From the bug description: > ObjLayout::initialize() is called in Arguments::parse(const JavaVMInitArgs*) which sets ObjLayout::_klass_mode. FullGCForwarding::initialize_flags(size_t) is called in init_globals() which seems to be later in the Threads::create_vm(JavaVMInitArgs*, bool*) routine. The latter, however, can unset the UseCompactObjectHeaders flag, which leads to a potential mismatch with ObjLayout::_klass_mode, firing the asserts in ObjLayout::klass_mode(). > > There is also a related (and somewhat minor) problem: In Arguments::parse() we enable a bunch of stuff when UseCompactObjectHeaders is on (LW locking, but that's the default anyway, and object-monitor-tables, which are otherwise off), but then may disable UseCompactObjectHeaders, e.g. in the GC or when -UseCompressedClassPointers is requested. This then leaves the odd situation that we run without compact headers, but have object monitor tables turned on. 
> > The fix to both issues is: > - First disable UCOH > - and do that in apply_ergo() (e.g. GCs should do it in GCArguments::initialize()) > - then enable any flags required by compact headers > - Initialize ObjLayout after all flags are done (it's in the correct place, already) > > Testing: > - [x] tier1 -UCOH (default) > - [ ] tier1 +UCOH > - [x] Manual testing several flag combinations This pull request has now been integrated. Changeset: cb1c7366 Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/cb1c73663e91e632d643c23e6c5acc1c5118ac8b Stats: 43 lines in 11 files changed: 18 ins; 20 del; 5 mod 8344363: FullGCForwarding::initialize_flags is called after ObjLayout::initialize Reviewed-by: stefank, shade, ayang ------------- PR: https://git.openjdk.org/jdk/pull/22255 From zgu at openjdk.org Mon Nov 25 14:38:14 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 25 Nov 2024 14:38:14 GMT Subject: RFR: 8344853: Parallel: Improve comments in psParallelCompact In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 10:39:31 GMT, Albert Mingkun Yang wrote: > Trivially revising some existing comments and adding some comments. No real code change other than a renaming. LGTM ------------- Marked as reviewed by zgu (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22318#pullrequestreview-2458619392 From shade at openjdk.org Mon Nov 25 17:31:17 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 25 Nov 2024 17:31:17 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v7] In-Reply-To: References: <2ctRsJ6JhReYC9yjcPV4eRljht3bMAZ9B3urifpvQXQ=.470aa336-f334-4345-863b-557250aa3416@github.com> <9MFkD3YNuSJIBPOsHYEKFX98XkMGKgMYa_7j-7usxC0=.0a99aabd-6272-40db-a5aa-6022023c1e4f@github.com> Message-ID: <9HyWt8dTl_6N6S0YhLw22cBJz8PMP1SVGXgB_ELdYF8=.4f59fd3a-fa0a-4fcf-84c8-30616b463a0c@github.com> On Thu, 21 Nov 2024 23:26:05 GMT, William Kemper wrote: >> If we are emitting a log line that looks like a properly formatted GC log line, but the numbers there mean something else for `Concurrent uncommit`, we are bound to confuse users and automatic tools. Uncommit should affect `capacity`, this is how we know how deep we have uncommitted. So, I suggest we emit: >> >> >> Concurrent uncommit XXXXM->XXXXM (YYYYM) z.zzzms >> >> >> ...where `XXXX` is the heap used at the end of uncommit (note before and after are the same) and YYYY is capacity. This will not expose users to thinking uncommit grows the heap usage, and would give us instantaneous view on heap usage and capacity. > > Hmm, the numbers are preceded by `Concurrent uncommit`, with that context it's not much of a stretch to think these numbers represent the change in _committed_ memory. The original log message (in which heap usage may increase during uncommit) was _not_ helpful. A message with the same format in which heap usage also appears to _not change at all_ during an uncommit is also perplexing. Are we trying too hard to preserve the original, not useful message? 
Maybe we just want a new message that plainly says: > > > Concurrently uncommitted XXXXM in z.zzzms > > or > > Concurrent uncommit: time z.zzzms, committed before XXXXM, committed after YYYYM, capacity ZZZZM Yes, I don't want to emit something that looks like a heap usage GC log line, if it is not. Unfortunately, `X->Y (Z) T.TTTTms` is a common format for X and Y as heap use. I agree posting X == Y would be only marginally better. So, maybe this goes as middle ground: Concurrent uncommit XXXXM (YYYYM) z.zzzms ...where XXXX is the amount uncommitted, YYYY is the final heap capacity? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1857007496 From wkemper at openjdk.org Mon Nov 25 17:59:26 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 25 Nov 2024 17:59:26 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v7] In-Reply-To: <9HyWt8dTl_6N6S0YhLw22cBJz8PMP1SVGXgB_ELdYF8=.4f59fd3a-fa0a-4fcf-84c8-30616b463a0c@github.com> References: <2ctRsJ6JhReYC9yjcPV4eRljht3bMAZ9B3urifpvQXQ=.470aa336-f334-4345-863b-557250aa3416@github.com> <9MFkD3YNuSJIBPOsHYEKFX98XkMGKgMYa_7j-7usxC0=.0a99aabd-6272-40db-a5aa-6022023c1e4f@github.com> <9HyWt8dTl_6N6S0YhLw22cBJz8PMP1SVGXgB_ELdYF8=.4f59fd3a-fa0a-4fcf-84c8-30616b463a0c@github.com> Message-ID: On Mon, 25 Nov 2024 17:28:19 GMT, Aleksey Shipilev wrote: >> Hmm, the numbers are preceded by `Concurrent uncommit`, with that context it's not much of a stretch to think these numbers represent the change in _committed_ memory. The original log message (in which heap usage may increase during uncommit) was _not_ helpful. A message with the same format in which heap usage also appears to _not change at all_ during an uncommit is also perplexing. Are we trying too hard to preserve the original, not useful message? 
Maybe we just want a new message that plainly says: >> >> >> Concurrently uncommitted XXXXM in z.zzzms >> >> or >> >> Concurrent uncommit: time z.zzzms, committed before XXXXM, committed after YYYYM, capacity ZZZZM > > Yes, I don't want to emit something that looks like a heap usage GC log line, if it is not. Unfortunately, `X->Y (Z) T.TTTTms` is a common format for X and Y as heap use. I agree posting X == Y would be only marginally better. So, maybe this goes as middle ground: > > > Concurrent uncommit XXXXM (YYYYM) z.zzzms > > > ...where XXXX is the amount uncommitted, YYYY is the final heap capacity? Okay, this looks good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1857092060 From wkemper at openjdk.org Tue Nov 26 01:07:39 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 26 Nov 2024 01:07:39 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v8] In-Reply-To: References: Message-ID: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: - Log uncommitted delta and capacity - Merge remote-tracking branch 'jdk/master' into shen-uncommit-thread - Restore logging format, show change in committed heap, rather than usage - Allow commits initially - Use idiomatic name for CADR class - Improve comments - Do not notify uncommit thread when uncommit is forbidden - Prevent uncommit thread from running during GC - Style and formatting fixes - Alphabetize includes in shenandoahHeap.cpp - ... 
and 8 more: https://git.openjdk.org/jdk/compare/f2fab487...847a2593 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22019/files - new: https://git.openjdk.org/jdk/pull/22019/files/2cb71140..847a2593 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=06-07 Stats: 225409 lines in 4393 files changed: 87200 ins; 121188 del; 17021 mod Patch: https://git.openjdk.org/jdk/pull/22019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019 PR: https://git.openjdk.org/jdk/pull/22019 From ayang at openjdk.org Tue Nov 26 08:57:46 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 26 Nov 2024 08:57:46 GMT Subject: RFR: 8344853: Parallel: Improve comments in psParallelCompact In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 10:39:31 GMT, Albert Mingkun Yang wrote: > Trivial revising some existing comments and adding some comments. No real code change other than a renaming. Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22318#issuecomment-2500028087 From ayang at openjdk.org Tue Nov 26 08:57:46 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 26 Nov 2024 08:57:46 GMT Subject: Integrated: 8344853: Parallel: Improve comments in psParallelCompact In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 10:39:31 GMT, Albert Mingkun Yang wrote: > Trivial revising some existing comments and adding some comments. No real code change other than a renaming. This pull request has now been integrated. 
Changeset: 9793e73b Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/9793e73bc1b25ed92d6f0599fd2e721249389df7 Stats: 17 lines in 1 file changed: 1 ins; 1 del; 15 mod 8344853: Parallel: Improve comments in psParallelCompact Reviewed-by: sjohanss, zgu ------------- PR: https://git.openjdk.org/jdk/pull/22318 From shade at openjdk.org Tue Nov 26 10:21:47 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 26 Nov 2024 10:21:47 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v8] In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 01:07:39 GMT, William Kemper wrote: >> Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: > > - Log uncommitted delta and capacity > - Merge remote-tracking branch 'jdk/master' into shen-uncommit-thread > - Restore logging format, show change in committed heap, rather than usage > - Allow commits initially > - Use idiomatic name for CADR class > - Improve comments > - Do not notify uncommit thread when uncommit is forbidden > - Prevent uncommit thread from running during GC > - Style and formatting fixes > - Alphabetize includes in shenandoahHeap.cpp > - ... and 8 more: https://git.openjdk.org/jdk/compare/a518ae81...847a2593 Changes requested by shade (Reviewer). 
src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.cpp line 75: > 73: MonitorLocker locker(&_stop_lock, Mutex::_no_safepoint_check_flag); > 74: if (!_stop_requested.is_set()) { > 75: locker.wait((int64_t)shrink_period); I tried to test this on some of my toy examples, and realized this particular line may end up as `locker.wait(0)`, which means "wait indefinitely, until notified". This breaks periodic commits. The old code rode on control thread doing `MAX2(1, ...)`, so we never feed `0` into `wait`. I am also confused about units. The comment above says `shrink_period` is in seconds, but `locker.wait` accepts milliseconds? src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.cpp line 181: > 179: log_info(gc)("%s " PROPERFMT "(" PROPERFMT ") %.3fms", > 180: msg, PROPERFMTARGS(committed_start - committed_end), PROPERFMTARGS(_heap->capacity()), > 181: elapsed * MILLIUNITS); I think we want an additional space. I see the current output is: [11.366s][info][gc] Concurrent uncommit 32768K(192M) 2.506ms Should probably be: [11.366s][info][gc] Concurrent uncommit 32768K (192M) 2.506ms ------------- PR Review: https://git.openjdk.org/jdk/pull/22019#pullrequestreview-2461003962 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1858191192 PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1858203087 From shade at openjdk.org Tue Nov 26 10:21:48 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 26 Nov 2024 10:21:48 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v8] In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 10:10:42 GMT, Aleksey Shipilev wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 18 additional commits since the last revision: >> >> - Log uncommitted delta and capacity >> - Merge remote-tracking branch 'jdk/master' into shen-uncommit-thread >> - Restore logging format, show change in committed heap, rather than usage >> - Allow commits initially >> - Use idiomatic name for CADR class >> - Improve comments >> - Do not notify uncommit thread when uncommit is forbidden >> - Prevent uncommit thread from running during GC >> - Style and formatting fixes >> - Alphabetize includes in shenandoahHeap.cpp >> - ... and 8 more: https://git.openjdk.org/jdk/compare/a518ae81...847a2593 > > src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.cpp line 75: > >> 73: MonitorLocker locker(&_stop_lock, Mutex::_no_safepoint_check_flag); >> 74: if (!_stop_requested.is_set()) { >> 75: locker.wait((int64_t)shrink_period); > > I tried to test this on some of my toy examples, and realized this particular line may end up as `locker.wait(0)`, which means "wait indefinitely, until notified". This breaks periodic commits. The old code rode on control thread doing `MAX2(1, ...)`, so we never feed `0` into `wait`. I am also confused about units. The comment above says `shrink_period` is in seconds, but `locker.wait` accepts milliseconds? 
It sounds like this line should be: locker.wait(MAX2(1, shrink_period * 1000)); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1858200969 From eosterlund at openjdk.org Tue Nov 26 12:20:40 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 26 Nov 2024 12:20:40 GMT Subject: RFR: 8344414: ZGC: Another division by zero in rule_major_allocation_rate In-Reply-To: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> References: <8cCw8As_oQRtYlDWTh72IolBnWELfG27-Rm0jraW8o4=.445484ea-7ab1-449e-b1fe-fab2fa8b288b@github.com> Message-ID: <66C-WPgEhBo-dU1VG1WarcLOcV5ILInldKeWfC5ETZc=.fca34ddb-d146-4af3-a0db-f4c13e3c6089@github.com> On Tue, 19 Nov 2024 07:18:20 GMT, Axel Boldt-Christmas wrote: > This specific issue was known since #20888. As well as a more serious issue in `calculate_extra_young_gc_time` which may introduce a `NaN`. This specific issue is sane as long as we have IEEE 754 sans the C++ standard making division by zero UB. > > As discussed in #21304 it is probably better to try and tackle the division by zero issue by making sure the input is never zero. This patch introduces a small offset to the average which will effectively leave the value unchanged unless it is zero, and behave as almost zero in calculations without causing actual division by zero. > > There is still the issue with `NaN`, this patch adds a short circuit when this can occur and returns the analytical result of the calculation. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer).
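The offset idea in the 8344414 description can be illustrated with a generic sketch. The function name, the time-to-exhaustion framing, and the offset value are all assumptions; this is not the ZGC director code, and the separate `NaN` short circuit in `calculate_extra_young_gc_time` is not modeled here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical offset: negligible next to any realistic allocation rate
// (bytes per second), but strictly positive.
constexpr double kAvgOffset = 1e-9;

// Estimated seconds until `free_bytes` are used up at `avg_rate` bytes/s.
double seconds_until_exhausted(double free_bytes, double avg_rate) {
  assert(free_bytes >= 0.0 && avg_rate >= 0.0);
  // The divisor is never zero: a zero average behaves as "almost zero" and
  // yields a very large but finite estimate, and 0/0 cannot produce NaN.
  // For any non-zero average the offset leaves the result effectively unchanged.
  return free_bytes / (avg_rate + kAvgOffset);
}
```

For example, a zero rate with 100 bytes free gives roughly 1e11 seconds instead of a division by zero, while a rate of 50 bytes/s still gives about 2 seconds.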
PR Review: https://git.openjdk.org/jdk/pull/22228#pullrequestreview-2461361841 From bpb at openjdk.org Tue Nov 26 16:09:51 2024 From: bpb at openjdk.org (Brian Burkhalter) Date: Tue, 26 Nov 2024 16:09:51 GMT Subject: RFR: 8340728: Test vmTestbase/gc/memory/Nio/Nio.java is failing to allocate all direct buffer memory [v3] In-Reply-To: <-LvX-iXqsW7StJQ22bEcZjI6-_rrA7doVsvyo_GMmhI=.22850b66-edea-4196-9544-47c84069bbae@github.com> References: <-LvX-iXqsW7StJQ22bEcZjI6-_rrA7doVsvyo_GMmhI=.22850b66-edea-4196-9544-47c84069bbae@github.com> Message-ID: On Mon, 28 Oct 2024 22:09:07 GMT, Brian Burkhalter wrote: >> First attempt to allocate `VM.maxDirectMemory()` bytes of direct buffer memory, decreasing by 1024 bytes for each `OutOfMemoryError` until allocation succeeds. > > Brian Burkhalter has updated the pull request incrementally with one additional commit since the last revision: > > 8340728: Fail if too much direct memory was allocated before this test is run This request will likely be withdrawn when #22339 is integrated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21623#issuecomment-2501261791 From wkemper at openjdk.org Tue Nov 26 19:08:58 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 26 Nov 2024 19:08:58 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v9] In-Reply-To: References: Message-ID: > Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. 
William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Use count of regions uncommitted to compute uncommit delta - Decouple polling interval from uncommit time out ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22019/files - new: https://git.openjdk.org/jdk/pull/22019/files/847a2593..8d3c3926 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22019&range=07-08 Stats: 15 lines in 1 file changed: 2 ins; 4 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/22019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019 PR: https://git.openjdk.org/jdk/pull/22019 From shade at openjdk.org Tue Nov 26 19:28:42 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 26 Nov 2024 19:28:42 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v9] In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 19:08:58 GMT, William Kemper wrote: >> Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure. > > William Kemper has updated the pull request incrementally with two additional commits since the last revision: > > - Use count of regions uncommitted to compute uncommit delta > - Decouple polling interval from uncommit time out Looks good! ------------- Marked as reviewed by shade (Reviewer). 
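The `locker.wait(0)` problem reported in review combines two pitfalls: a period expressed in seconds is passed where milliseconds are expected, and a zero timeout means "wait indefinitely, until notified". A minimal sketch of a conversion that avoids both follows. It assumes the period is a non-negative `double` in seconds; `poll_timeout_ms` is a hypothetical helper, not the code that was eventually integrated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a polling period in seconds into the
// millisecond timeout that a Monitor-style wait(int64_t millis) expects.
int64_t poll_timeout_ms(double period_seconds) {
  assert(period_seconds >= 0.0);
  // Convert to milliseconds first; casting the seconds value directly
  // would silently pass a number 1000x too small.
  const int64_t millis = static_cast<int64_t>(period_seconds * 1000.0);
  // wait(0) blocks until notified, so never return 0.
  return std::max<int64_t>(1, millis);
}
```

With a 30-second period this returns 30000, whereas truncating the seconds value directly would yield 30. Sub-millisecond periods are clamped to 1.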
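The log line settled on in this thread, `Concurrent uncommit XXXXM (YYYYM) z.zzzms`, with a space before the parenthesized capacity, can be sketched as a standalone formatter. The unit rule is an assumption: a unit is chosen once a value reaches 100 of that unit, which is consistent with the sample `32768K(192M)` output quoted in review. HotSpot's own `PROPERFMT`/`PROPERFMTARGS` macros are not reproduced, and `format_uncommit` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

namespace {

const uint64_t K = 1024;
const uint64_t M = K * K;
const uint64_t G = M * K;

// Assumed unit rule: switch to a larger unit once the value reaches 100 of it,
// so 32 MB prints as "32768K" and 192 MB prints as "192M".
uint64_t unit_size(uint64_t bytes) {
  if (bytes >= 100 * G) return G;
  if (bytes >= 100 * M) return M;
  if (bytes >= 100 * K) return K;
  return 1;
}

const char* unit_suffix(uint64_t unit) {
  if (unit == G) return "G";
  if (unit == M) return "M";
  if (unit == K) return "K";
  return "B";
}

}  // namespace

// Formats "<msg> <uncommitted> (<capacity>) <elapsed>ms", matching the layout
// requested in review (a space before the parenthesized capacity).
std::string format_uncommit(const char* msg, uint64_t uncommitted_bytes,
                            uint64_t capacity_bytes, double elapsed_seconds) {
  assert(msg != nullptr);
  const uint64_t u_unit = unit_size(uncommitted_bytes);
  const uint64_t c_unit = unit_size(capacity_bytes);
  char buf[128];
  std::snprintf(buf, sizeof(buf), "%s %llu%s (%llu%s) %.3fms", msg,
                static_cast<unsigned long long>(uncommitted_bytes / u_unit), unit_suffix(u_unit),
                static_cast<unsigned long long>(capacity_bytes / c_unit), unit_suffix(c_unit),
                elapsed_seconds * 1000.0);
  return std::string(buf);
}
```

Called with 32768K uncommitted, 192M capacity, and 0.002506 s elapsed, it produces `Concurrent uncommit 32768K (192M) 2.506ms`.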
PR Review: https://git.openjdk.org/jdk/pull/22019#pullrequestreview-2462504290 From wkemper at openjdk.org Tue Nov 26 20:13:42 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 26 Nov 2024 20:13:42 GMT Subject: RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v3] In-Reply-To: <2aAK-4vQxs3-P-cscvtJqOwn40B2aduezOZkQIKK-BY=.f1b9a066-63ef-4fe3-bb3d-2efcfac2b6af@github.com> References: <9SXRh1N-RjMfm_G4LvQFsuF5_DvRxMCrcCXMOtsAwpM=.2b37fe99-1107-440d-a4cb-468fe415b3be@github.com> <2aAK-4vQxs3-P-cscvtJqOwn40B2aduezOZkQIKK-BY=.f1b9a066-63ef-4fe3-bb3d-2efcfac2b6af@github.com> Message-ID: On Tue, 19 Nov 2024 02:03:18 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Prevent uncommit thread from running during GC > > Looks good to me. A few documentation comment requests. > > Also please share performance data in this PR or in the ticket, especially from the perf/benchmark that may have precipitated this change. @ysramakrishna - I ran several iterations of specjbb2015 with different variations of polling interval. Results show that 1/10th of `ShenandoahUncommitDelay` is reasonable, and avoids unintentional commit delays when the polling interval is equal to or greater than `ShenandoahUncommitDelay`.
Category | Count | Total | GeoMean | Average | Trim 0.1 | StdDev | Minimum | Maximum
openjdk:master critical_jops | 5 | 50862.000 | 10162.095 | 10172.400 | 10172.400 | 513.605 | 9630.000 | 10882.000
30ms polling critical_jops | 5 | 48035.000 | 9582.113 | 9607.000 | 9607.000 | 778.036 | 8808.000 | 10692.000
30s polling critical_jops | 5 | 56398.000 | 11272.026 | 11279.600 | 11279.600 | 460.355 | 10627.000 | 11842.000
no polling critical_jops | 5 | 55917.000 | 11176.046 | 11183.400 | 11183.400 | 460.960 | 10899.000 | 11995.000
------------- PR Comment: https://git.openjdk.org/jdk/pull/22019#issuecomment-2501826875 From eosterlund at openjdk.org Wed Nov 27 09:59:41 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 27 Nov 2024 09:59:41 GMT Subject: RFR: 8331735: UpcallLinker::on_exit races with GC when copying frame anchor In-Reply-To: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> References: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> Message-ID: On Mon, 28 Oct 2024 13:53:58 GMT, Jorn Vernee wrote: > There is a subtle race in `UpcallLinker::on_exit` between copying of the old frame anchor back into place, and the GC. Since this copy is not atomic, it may briefly appear as if a thread has no last Java frame, while still in the `_thread_in_native` state, which leads to the GC skipping processing of any active Java frames. > > This code was originally adapted from `JavaCallWrapper::!JavaCallWrapper` - the JNI mechanism for upcalls - but in that case the frame anchor copy happens in the `_thread_in_vm` state, which means the GC will wait for the thread to get to a safepoint. > > The solution proposed here is to do the frame anchor copy in the java thread state, before transitioning back to the native state. The java thread state, like the vm thread state, is also 'safe' i.e.
the GC will wait for the thread to get to a safepoint, so we can safely do our non-atomic copy of the frame anchor. > > Additionally, this PR resolves a similar issue in `on_entry`, by moving the clearing of the pending exception (in case native code use a JNI API and didn't handle the exception afterwards). We now also skip checking for async exceptions when transitioning from native to java, so we don't immediately clear them. Any async exceptions will be picked up at the next safepoint instead. > > Special thanks to @stefank and @fisk for finding the root cause, and @jaikiran for testing and debugging. > > Testing: tier 1-4, 20k runs of the failing test on linux-aarch64. The fix looks good to me. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21742#pullrequestreview-2464531609 From aboldtch at openjdk.org Wed Nov 27 10:03:40 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 27 Nov 2024 10:03:40 GMT Subject: RFR: 8331735: UpcallLinker::on_exit races with GC when copying frame anchor In-Reply-To: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> References: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> Message-ID: On Mon, 28 Oct 2024 13:53:58 GMT, Jorn Vernee wrote: > There is a subtle race in `UpcallLinker::on_exit` between copying of the old frame anchor back into place, and the GC. Since this copy is not atomic, it may briefly appear as if a thread has no last Java frame, while still in the `_thread_in_native` state, which leads to the GC skipping processing of any active Java frames. > > This code was originally adapted from `JavaCallWrapper::!JavaCallWrapper` - the JNI mechanism for upcalls - but in that case the frame anchor copy happens in the `_thread_in_vm` state, which means the GC will wait for the thread to get to a safepoint. 
> > The solution proposed here is to do the frame anchor copy in the java thread state, before transitioning back to the native state. The java thread state, like the vm thread state, is also 'safe' i.e. the GC will wait for the thread to get to a safepoint, so we can safely do our non-atomic copy of the frame anchor. > > Additionally, this PR resolves a similar issue in `on_entry`, by moving the clearing of the pending exception (in case native code use a JNI API and didn't handle the exception afterwards). We now also skip checking for async exceptions when transitioning from native to java, so we don't immediately clear them. Any async exceptions will be picked up at the next safepoint instead. > > Special thanks to @stefank and @fisk for finding the root cause, and @jaikiran for testing and debugging. > > Testing: tier 1-4, 20k runs of the failing test on linux-aarch64. lgtm. Would be nice if we could assert that we are not in native or blocked when touching the oops as well. Similarly to modifications of the frame anchor. But I agree that it should be done separately. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21742#pullrequestreview-2464545413 From jvernee at openjdk.org Wed Nov 27 12:23:45 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 27 Nov 2024 12:23:45 GMT Subject: Integrated: 8331735: UpcallLinker::on_exit races with GC when copying frame anchor In-Reply-To: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> References: <4Q4duvOf_UPZkpps6YI0SvRcYYQkG4b7ckTUUTVB494=.9fa7eaec-73fc-43c8-9f69-133630235b2b@github.com> Message-ID: <9AzJlA6eNp_-TRsifzXIgj3CJVU1sw7olmtM6IyFGFI=.d2d037f6-bfc6-4bbf-84b9-a9cc51a065a3@github.com> On Mon, 28 Oct 2024 13:53:58 GMT, Jorn Vernee wrote: > There is a subtle race in `UpcallLinker::on_exit` between copying of the old frame anchor back into place, and the GC.
Since this copy is not atomic, it may briefly appear as if a thread has no last Java frame, while still in the `_thread_in_native` state, which leads to the GC skipping processing of any active Java frames. > > This code was originally adapted from `JavaCallWrapper::!JavaCallWrapper` - the JNI mechanism for upcalls - but in that case the frame anchor copy happens in the `_thread_in_vm` state, which means the GC will wait for the thread to get to a safepoint. > > The solution proposed here is to do the frame anchor copy in the java thread state, before transitioning back to the native state. The java thread state, like the vm thread state, is also 'safe' i.e. the GC will wait for the thread to get to a safepoint, so we can safely do our non-atomic copy of the frame anchor. > > Additionally, this PR resolves a similar issue in `on_entry`, by moving the clearing of the pending exception (in case native code use a JNI API and didn't handle the exception afterwards). We now also skip checking for async exceptions when transitioning from native to java, so we don't immediately clear them. Any async exceptions will be picked up at the next safepoint instead. > > Special thanks to @stefank and @fisk for finding the root cause, and @jaikiran for testing and debugging. > > Testing: tier 1-4, 20k runs of the failing test on linux-aarch64. This pull request has now been integrated. 
Changeset: 461ffafe Author: Jorn Vernee URL: https://git.openjdk.org/jdk/commit/461ffafeba459c077f1c2d9c5037305b71a8bc2a Stats: 15 lines in 1 file changed: 5 ins; 9 del; 1 mod 8331735: UpcallLinker::on_exit races with GC when copying frame anchor 8343144: UpcallLinker::on_entry racingly clears pending exception with GC safepoints 8286875: ProgrammableUpcallHandler::on_entry/on_exit access thread fields from native Reviewed-by: dholmes, eosterlund, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/21742 From bpb at openjdk.org Thu Nov 28 01:54:45 2024 From: bpb at openjdk.org (Brian Burkhalter) Date: Thu, 28 Nov 2024 01:54:45 GMT Subject: RFR: 8340728: Test vmTestbase/gc/memory/Nio/Nio.java is failing to allocate all direct buffer memory [v3] In-Reply-To: <-LvX-iXqsW7StJQ22bEcZjI6-_rrA7doVsvyo_GMmhI=.22850b66-edea-4196-9544-47c84069bbae@github.com> References: <-LvX-iXqsW7StJQ22bEcZjI6-_rrA7doVsvyo_GMmhI=.22850b66-edea-4196-9544-47c84069bbae@github.com> Message-ID: <3kxxYbetfRSStCwERmneR-No-qXFvkM1dBnAT7r2Xpc=.029ba73e-90cb-46f1-9352-7639265a9bda@github.com> On Mon, 28 Oct 2024 22:09:07 GMT, Brian Burkhalter wrote: >> First attempt to allocate `VM.maxDirectMemory()` bytes of direct buffer memory, decreasing by 1024 bytes for each `OutOfMemoryError` until allocation succeeds. > > Brian Burkhalter has updated the pull request incrementally with one additional commit since the last revision: > > 8340728: Fail if too much direct memory was allocated before this test is run This request is irrelevant now that #22339 is integrated. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21623#issuecomment-2505111530 From bpb at openjdk.org Thu Nov 28 01:54:46 2024 From: bpb at openjdk.org (Brian Burkhalter) Date: Thu, 28 Nov 2024 01:54:46 GMT Subject: Withdrawn: 8340728: Test vmTestbase/gc/memory/Nio/Nio.java is failing to allocate all direct buffer memory In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 18:22:03 GMT, Brian Burkhalter wrote: > First attempt to allocate `VM.maxDirectMemory()` bytes of direct buffer memory, decreasing by 1024 bytes for each `OutOfMemoryError` until allocation succeeds. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21623 From iwalulya at openjdk.org Thu Nov 28 10:57:39 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 28 Nov 2024 10:57:39 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v2] In-Reply-To: <5BFfQWo76ziDd0mT3TczoRhPzPm3bGmnBbS8f5dS63M=.3a130ddd-05de-4fd2-97e0-07c6f9fd3b24@github.com> References: <5BFfQWo76ziDd0mT3TczoRhPzPm3bGmnBbS8f5dS63M=.3a130ddd-05de-4fd2-97e0-07c6f9fd3b24@github.com> Message-ID: On Fri, 22 Nov 2024 11:00:52 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. 
Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > simplify pas allocator destruction and manager phase tracking src/hotspot/share/gc/shared/partialArrayState.hpp line 176: > 174: > 175: // Limit on the number of allocators this manager supports. > 176: uint _num_allocators; _max_num_allocators; the comment helps, but we can add max to the name. src/hotspot/share/gc/shared/partialArrayState.hpp line 185: > 183: // - low half: allocators constructed > 184: // - high half: allocators destructed (debug only) > 185: volatile CounterState _counters; Seems like a lot of code overhead to accommodate debugging! However, if there is no easier approach, then not a blocker for me.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1861960042 PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1861963894 From ayang at openjdk.org Thu Nov 28 15:28:12 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 28 Nov 2024 15:28:12 GMT Subject: RFR: 8345217: Parallel: Refactor PSParallelCompact::next_src_region Message-ID: Simply removing some unnecessary calculations in locating the next source-region during full-gc. Test: tier1-5 ------------- Commit messages: - cleanup Changes: https://git.openjdk.org/jdk/pull/22441/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22441&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345217 Stats: 10 lines in 1 file changed: 0 ins; 7 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22441.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22441/head:pull/22441 PR: https://git.openjdk.org/jdk/pull/22441 From ayang at openjdk.org Thu Nov 28 15:33:40 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 28 Nov 2024 15:33:40 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v2] In-Reply-To: References: <5BFfQWo76ziDd0mT3TczoRhPzPm3bGmnBbS8f5dS63M=.3a130ddd-05de-4fd2-97e0-07c6f9fd3b24@github.com> Message-ID: On Thu, 28 Nov 2024 10:50:51 GMT, Ivan Walulya wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> simplify pas allocator destruction and manager phase tracking > > src/hotspot/share/gc/shared/partialArrayState.hpp line 185: > >> 183: // - low half: allocators constructed >> 184: // - high half: allocators destructed (debug only) >> 185: volatile CounterState _counters; > > Seems like a lot of code overhead to accommodate debugging! However, if there is no easier approach, then not a blocker for me. I feel the motivation for this encoding is missing from this PR.
It's not immediately clear why these two counters need to be combined in this way. Kim outlined some rationale during our offline discussion, but for the benefit of other reviewers and future readers of this code, this rationale should be documented alongside the encoding. Having those arguments written down would help assess whether the additional complexity is justified. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1862392159 From ayang at openjdk.org Thu Nov 28 15:55:11 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 28 Nov 2024 15:55:11 GMT Subject: RFR: 8345220: Serial: Refactor TenuredGeneration::promotion_attempt_is_safe Message-ID: Trivial using MIN2 to replace `>=` and `||` for better readability. ------------- Commit messages: - trivial Changes: https://git.openjdk.org/jdk/pull/22444/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22444&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345220 Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22444.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22444/head:pull/22444 PR: https://git.openjdk.org/jdk/pull/22444 From tschatzl at openjdk.org Thu Nov 28 16:10:41 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 28 Nov 2024 16:10:41 GMT Subject: RFR: 8345220: Serial: Refactor TenuredGeneration::promotion_attempt_is_safe In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:50:20 GMT, Albert Mingkun Yang wrote: > Trivial using MIN2 to replace `>=` and `||` for better readability. lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). 
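The readability change in 8345220 rests on a simple identity: `x >= a || x >= b` holds exactly when `x >= min(a, b)`. The sketch below uses `std::min` as a stand-in for HotSpot's `MIN2`. The function and parameter names are hypothetical, because the actual operands in `TenuredGeneration::promotion_attempt_is_safe` are not shown in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical operand names, not the TenuredGeneration fields.
// Original shape: two comparisons joined by ||.
bool safe_two_checks(size_t available, size_t threshold_a, size_t threshold_b) {
  return available >= threshold_a || available >= threshold_b;
}

// Refactored shape: reaching either threshold is the same as reaching the
// smaller one, so a single comparison against the minimum suffices.
bool safe_min_check(size_t available, size_t threshold_a, size_t threshold_b) {
  return available >= std::min(threshold_a, threshold_b);
}
```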
PR Review: https://git.openjdk.org/jdk/pull/22444#pullrequestreview-2468519118 From duke at openjdk.org Thu Nov 28 16:10:42 2024 From: duke at openjdk.org (Abdelhak Zaaim) Date: Thu, 28 Nov 2024 16:10:42 GMT Subject: RFR: 8345220: Serial: Refactor TenuredGeneration::promotion_attempt_is_safe In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:50:20 GMT, Albert Mingkun Yang wrote: > Trivial using MIN2 to replace `>=` and `||` for better readability. Marked as reviewed by abdelhak-zaaim at github.com (no known OpenJDK username). ------------- PR Review: https://git.openjdk.org/jdk/pull/22444#pullrequestreview-2468522789 From tschatzl at openjdk.org Thu Nov 28 16:26:48 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 28 Nov 2024 16:26:48 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v2] In-Reply-To: <5BFfQWo76ziDd0mT3TczoRhPzPm3bGmnBbS8f5dS63M=.3a130ddd-05de-4fd2-97e0-07c6f9fd3b24@github.com> References: <5BFfQWo76ziDd0mT3TczoRhPzPm3bGmnBbS8f5dS63M=.3a130ddd-05de-4fd2-97e0-07c6f9fd3b24@github.com> Message-ID: On Fri, 22 Nov 2024 11:00:52 GMT, Kim Barrett wrote: >> This change splits the existing PartialArrayStateAllocator class into an >> allocator class and a manager class. The allocator class is per worker >> thread. The manager class provides the memory management context for a >> group of allocators. >> >> This change is in preparation for some other refactorings around partial array >> state handling. That work is intended to make it easier for various >> collections to adopt the use of that mechanism for chunking the processing of >> large objArrays. >> >> The new implementation for the memory management context is based on the >> existing one, with an Arena per worker, now controlled by the manager object. >> Potential improvements to that can be explored in the future. 
Some ideas >> include improvements to the Arena API or a single thread-safe Arena variant >> (trading slower arena allocation (which is the slow path) for less memory >> usage). >> >> G1 has a single manager, reused by each young/mixed GC. Associated state >> allocators are nested in the per-worker structures, so deleted at the end of >> the collection. The manager is reset at the end of the collection to allow the >> memory to be recycled. It is planned that the STW full collector will also use >> this manager when converted to use PartialArrayState. So it will be reused by >> all STW collections. >> >> ParallelGC has a single manager, reused by each young collection. Because the >> per-worker promotion managers are never destroyed, their nested state >> allocators are never destroyed. So the manager is not reset, instead leaving >> previously allocated states in the allocator free lists for use by the next >> collection. This means the full collector won't be able to use the same >> manager object as the young collectors. >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > simplify pas allocator destruction and manager phase tracking src/hotspot/share/gc/shared/partialArrayState.hpp line 162: > 160: // - releasing: When an allocator is destroyed the manager transitions to this > 161: // phase. It remains in this phase until all extent allocators associated with > 162: // this manager have been destroyed. During this phase, new allocators man not Suggestion: // this manager have been destroyed. 
During this phase, new allocators may not ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1862443150 From tschatzl at openjdk.org Thu Nov 28 16:26:48 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 28 Nov 2024 16:26:48 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v2] In-Reply-To: References: <5BFfQWo76ziDd0mT3TczoRhPzPm3bGmnBbS8f5dS63M=.3a130ddd-05de-4fd2-97e0-07c6f9fd3b24@github.com> Message-ID: On Thu, 28 Nov 2024 15:31:05 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/shared/partialArrayState.hpp line 185: >> >>> 183: // - low half: allocators constructed >>> 184: // - high half: allocators destructed (debug only) >>> 185: volatile CounterState _counters; >> >> Seems like a lot of code overhead to accommodate debugging! However, if there is no easier approach, then not a blocker for me. > > I feel the motivation for this encoding is missing from this PR. It's not immediately clear why these two counters need to be combined in this way. Kim outlined some rationale during our offline discussion, but for the benefit of other reviewers and future readers of this code, this rationale should be documented alongside the encoding. Having those arguments written down would help assess whether the additional complexity is justified. What about using a union/struct instead of all that manual masking and shifting? I agree that encoding them in this way warrants a documented rationale.
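To make the trade-off under discussion concrete, here is a minimal sketch of the manual masking-and-shifting approach: two 32-bit counters (constructed in the low half, destructed in the high half, as the quoted comment describes) packed into one 64-bit atomic. One plausible motivation, which is my assumption and not the rationale from the offline discussion, is that a single atomic load observes both counts at the same instant, so a check such as "every constructed allocator has been destroyed" cannot race between two separate loads. All names are hypothetical; this is not the HotSpot code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical illustration of two counters packed into one atomic word.
class PackedCounters {
  static constexpr int kShift = 32;
  static constexpr uint64_t kLowMask = (uint64_t(1) << kShift) - 1;

  std::atomic<uint64_t> _value{0};

public:
  static uint32_t constructed(uint64_t v) { return uint32_t(v & kLowMask); }
  static uint32_t destructed(uint64_t v) { return uint32_t(v >> kShift); }

  // Bumps the low half; returns how many allocators were registered before.
  uint32_t register_allocator() {
    return constructed(_value.fetch_add(1));
  }

  // Bumps the high half with a single fetch_add.
  void release_allocator() {
    uint64_t old = _value.fetch_add(uint64_t(1) << kShift);
    // Debug-style sanity check: cannot release more than were registered.
    assert(destructed(old) < constructed(old));
  }

  // One load sees both halves consistently.
  bool all_released() const {
    uint64_t v = _value.load();
    return constructed(v) == destructed(v);
  }

  uint64_t raw() const { return _value.load(); }
};
```

As long as fewer than 2^32 allocators are ever registered (a bounded maximum guarantees this), the low half never carries into the high half.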
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1862458445 From duke at openjdk.org Thu Nov 28 19:01:47 2024 From: duke at openjdk.org (duke) Date: Thu, 28 Nov 2024 19:01:47 GMT Subject: Withdrawn: 8339668: Parallel: Adopt PartialArrayState to consolidate marking stack in Full GC In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 14:01:39 GMT, Zhengyu Gu wrote: > Please review this patch that adopts `PartialArrayState` introduced by [JDK-8337709](https://bugs.openjdk.org/browse/JDK-8337709) to consolidate `_oop_task_queues` and `_objarray_task_queues` into a single `_marking_stacks`. > > The change mirrors Kim's [JDK-8311163](https://bugs.openjdk.org/browse/JDK-8311163) work; therefore, there are methods that can be consolidated and simplified, but I would like to defer that to a follow-up CR. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21089 From mli at openjdk.org Fri Nov 29 09:07:40 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 29 Nov 2024 09:07:40 GMT Subject: RFR: 8345220: Serial: Refactor TenuredGeneration::promotion_attempt_is_safe In-Reply-To: References: Message-ID: <6WU_ycsXzFjBc7R9QalLGFGUPoZ81KbeOdWhVV4842g=.f9a560a1-e631-4df2-aa57-617d615c44a9@github.com> On Thu, 28 Nov 2024 15:50:20 GMT, Albert Mingkun Yang wrote: > Trivial using MIN2 to replace `>=` and `||` for better readability. Looks good. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22444#pullrequestreview-2469424302 From tschatzl at openjdk.org Fri Nov 29 09:41:08 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 29 Nov 2024 09:41:08 GMT Subject: RFR: 8345173: BlockLocationPrinter::print_location misses a ResourceMark Message-ID: Hi all, please review this small change that adds a missing `ResourceMark` to `BlockLocationPrinter`; it can be called at very arbitrary places (e.g.
the stop() method of MacroAssembler), and without this change it might fail with a "Missing ResourceMark error - possible memory leak" error instead of providing the stop() output (and then failing). Testing: local testing, after the change the ResourceMark crash goes away, gha Thanks, Thomas ------------- Commit messages: - * add missing include - 8345173 Changes: https://git.openjdk.org/jdk/pull/22455/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22455&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345173 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22455/head:pull/22455 PR: https://git.openjdk.org/jdk/pull/22455 From sjohanss at openjdk.org Fri Nov 29 13:55:37 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 29 Nov 2024 13:55:37 GMT Subject: RFR: 8345173: BlockLocationPrinter::print_location misses a ResourceMark In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 09:36:16 GMT, Thomas Schatzl wrote: > Hi all, > > please review this small change that adds a missing `ResourceMark` to `BlockLocationPrinter`; it can be called at very arbitrary places (e.g. the stop() method of MacroAssembler), and without this change it might fail with a "Missing ResourceMark error - possible memory leak" error instead of providing the stop() output (and then failing). > > Testing: local testing, after the change the ResourceMark crash goes away, gha > > Thanks, > Thomas Marked as reviewed by sjohanss (Reviewer).
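The underlying pattern is RAII scoping of temporary allocations. The toy model below is a sketch under stated assumptions, not HotSpot's `ResourceArea`/`ResourceMark`: all class and function names are hypothetical, and the "missing mark" exception only mimics the debug-build error quoted above. A bump-pointer area refuses allocations when no mark is live, and a scoped mark rolls the area back on exit, so a printer that installs its own mark works no matter where it is called from.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy bump-pointer arena; a hypothetical stand-in, not HotSpot's ResourceArea.
class ToyResourceArea {
  friend class ToyResourceMark;
  std::vector<char> _buffer;
  size_t _top = 0;   // next free byte
  int _marks = 0;    // number of live marks on this area

public:
  explicit ToyResourceArea(size_t capacity) : _buffer(capacity) {}

  char* allocate(size_t bytes) {
    // Mimics a debug check: allocating with no enclosing mark would leak.
    if (_marks == 0) throw std::logic_error("Missing ResourceMark");
    if (bytes > _buffer.size() - _top) throw std::runtime_error("area exhausted");
    char* p = _buffer.data() + _top;
    _top += bytes;
    return p;
  }

  size_t used() const { return _top; }
};

// RAII mark: records the top on entry and rolls back to it on scope exit.
class ToyResourceMark {
  ToyResourceArea& _area;
  size_t _saved_top;

public:
  explicit ToyResourceMark(ToyResourceArea& area)
      : _area(area), _saved_top(area._top) {
    _area._marks++;
  }
  ~ToyResourceMark() {
    assert(_area._top >= _saved_top);  // inner allocations only grow the top
    _area._top = _saved_top;
    _area._marks--;
  }
  ToyResourceMark(const ToyResourceMark&) = delete;
  ToyResourceMark& operator=(const ToyResourceMark&) = delete;
};

// Hypothetical printer helper that needs scratch memory. The local mark is
// the analogue of the fix: callers no longer have to provide one.
size_t toy_print_location(ToyResourceArea& area) {
  ToyResourceMark rm(area);
  char* scratch = area.allocate(64);
  scratch[0] = '\0';
  return area.used();
}
```

Placing the mark inside the callee rather than relying on every caller is what makes the helper safe to call from arbitrary contexts; the cost is a save/restore of the area's top on each call.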
------------- PR Review: https://git.openjdk.org/jdk/pull/22455#pullrequestreview-2469968711 From iwalulya at openjdk.org Fri Nov 29 15:06:39 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 29 Nov 2024 15:06:39 GMT Subject: RFR: 8345173: BlockLocationPrinter::print_location misses a ResourceMark In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 09:36:16 GMT, Thomas Schatzl wrote: > Hi all, > > please review this small change that adds a missing `ResourceMark` to `BlockLocationPrinter`; it can be called at very arbitrary places (e.g. the stop() method of MacroAssembler), and without this change it might fail with a "Missing ResourceMark error - possible memory leak" error instead of providing the stop() output (and then failing). > > Testing: local testing, after the change the ResourceMark crash goes away, gha > > Thanks, > Thomas Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22455#pullrequestreview-2470119396 From kbarrett at openjdk.org Fri Nov 29 16:09:15 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 29 Nov 2024 16:09:15 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v3] In-Reply-To: References: Message-ID: > This change splits the existing PartialArrayStateAllocator class into an > allocator class and a manager class. The allocator class is per worker > thread. The manager class provides the memory management context for a > group of allocators. > > This change is in preparation for some other refactorings around partial array > state handling. That work is intended to make it easier for various > collections to adopt the use of that mechanism for chunking the processing of > large objArrays. > > The new implementation for the memory management context is based on the > existing one, with an Arena per worker, now controlled by the manager object. > Potential improvements to that can be explored in the future.
Some ideas > include improvements to the Arena API or a single thread-safe Arena variant > (trading slower arena allocation (which is the slow path) for less memory > usage). > > G1 has a single manager, reused by each young/mixed GC. Associated state > allocators are nested in the per-worker structures, so deleted at the end of > the collection. The manager is reset at the end of the collection to allow the > memory to be recycled. It is planned that the STW full collector will also use > this manager when converted to use PartialArrayState. So it will be reused by > all STW collections. > > ParallelGC has a single manager, reused by each young collection. Because the > per-worker promotion managers are never destroyed, their nested state > allocators are never destroyed. So the manager is not reset, instead leaving > previously allocated states in the allocator free lists for use by the next > collection. This means the full collector won't be able to use the same > manager object as the young collectors. 
> > Testing: mach5 tier1-5 Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: - num_allocators => max_allocators - fix comment typo - use struct/union instead of constants ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22287/files - new: https://git.openjdk.org/jdk/pull/22287/files/256b0021..f1a1be24 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22287&range=01-02 Stats: 75 lines in 2 files changed: 12 ins; 7 del; 56 mod Patch: https://git.openjdk.org/jdk/pull/22287.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22287/head:pull/22287 PR: https://git.openjdk.org/jdk/pull/22287 From kbarrett at openjdk.org Fri Nov 29 19:20:39 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 29 Nov 2024 19:20:39 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v2] In-Reply-To: References: <5BFfQWo76ziDd0mT3TczoRhPzPm3bGmnBbS8f5dS63M=.3a130ddd-05de-4fd2-97e0-07c6f9fd3b24@github.com> Message-ID: On Thu, 28 Nov 2024 16:09:36 GMT, Thomas Schatzl wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> simplify pas allocator destruction and manager phase tracking > > src/hotspot/share/gc/shared/partialArrayState.hpp line 162: > >> 160: // - releasing: When an allocator is destroyed the manager transitions to this >> 161: // phase. It remains in this phase until all extent allocators associated with >> 162: // this manager have been destroyed. During this phase, new allocators man not > > Suggestion: > > // this manager have been destroyed. During this phase, new allocators may not Fixed. 
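A single-threaded sketch of the allocator/manager split described in the PR text: a manager owns per-worker storage (a vector of owned objects standing in for a per-worker Arena), each worker's allocator keeps a free list for fast reuse, and the manager may only recycle its storage once every allocator has been destroyed. All class and member names are hypothetical, and a real implementation would need atomic counters and thread-safe registration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch; not HotSpot's PartialArrayStateManager API.
struct ToyState {
  int payload = 0;
  ToyState* next_free = nullptr;  // intrusive free-list link
};

class ToyStateManager {
  // One "arena" per worker: a vector that owns every state it handed out.
  std::vector<std::vector<std::unique_ptr<ToyState>>> _arenas;
  unsigned _max_allocators;
  unsigned _constructed = 0;
  unsigned _destructed = 0;

public:
  explicit ToyStateManager(unsigned max_allocators)
      : _arenas(max_allocators), _max_allocators(max_allocators) {}

  // Each new allocator gets the next worker slot.
  unsigned register_allocator() {
    assert(_constructed < _max_allocators);
    return _constructed++;
  }
  void release_allocator() { ++_destructed; }

  // Slow path: carve a fresh state out of the worker's arena.
  ToyState* allocate_in_arena(unsigned worker) {
    _arenas[worker].push_back(std::make_unique<ToyState>());
    return _arenas[worker].back().get();
  }

  // Recycling memory is only safe once no allocator (or free list) remains.
  void reset() {
    assert(_constructed == _destructed);
    for (auto& arena : _arenas) arena.clear();
    _constructed = _destructed = 0;
  }

  size_t arena_size(unsigned worker) const { return _arenas[worker].size(); }
};

class ToyStateAllocator {
  ToyStateManager& _manager;
  unsigned _worker;
  ToyState* _free_list = nullptr;

public:
  explicit ToyStateAllocator(ToyStateManager& m)
      : _manager(m), _worker(m.register_allocator()) {}
  ~ToyStateAllocator() { _manager.release_allocator(); }
  ToyStateAllocator(const ToyStateAllocator&) = delete;
  ToyStateAllocator& operator=(const ToyStateAllocator&) = delete;

  ToyState* allocate() {
    if (_free_list != nullptr) {  // fast path: reuse a released state
      ToyState* s = _free_list;
      _free_list = s->next_free;
      s->next_free = nullptr;
      return s;
    }
    return _manager.allocate_in_arena(_worker);
  }

  void release(ToyState* s) {
    s->next_free = _free_list;
    _free_list = s;
  }
};
```

A G1-like use scopes the allocators to one collection and calls `reset()` afterwards; a ParallelGC-like use keeps the allocators (and their free lists) alive across collections and never resets, matching the PR's description. The assertion in `reset()` guards against exactly the hazard of a surviving free list pointing into recycled storage.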
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1863876494 From kbarrett at openjdk.org Fri Nov 29 19:20:40 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 29 Nov 2024 19:20:40 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v2] In-Reply-To: References: <5BFfQWo76ziDd0mT3TczoRhPzPm3bGmnBbS8f5dS63M=.3a130ddd-05de-4fd2-97e0-07c6f9fd3b24@github.com> Message-ID: On Thu, 28 Nov 2024 10:48:13 GMT, Ivan Walulya wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> simplify pas allocator destruction and manager phase tracking > > src/hotspot/share/gc/shared/partialArrayState.hpp line 176: > >> 174: >> 175: // Limit on the number of allocators this manager supports. >> 176: uint _num_allocators; > > _max_num_allocators; the comment helps, but we can add max to the name. Changed num_allocators to max_allocators (with or without leading underscore) throughout. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1863876518 From kbarrett at openjdk.org Fri Nov 29 19:20:40 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 29 Nov 2024 19:20:40 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v2] In-Reply-To: References: <5BFfQWo76ziDd0mT3TczoRhPzPm3bGmnBbS8f5dS63M=.3a130ddd-05de-4fd2-97e0-07c6f9fd3b24@github.com> Message-ID: On Thu, 28 Nov 2024 16:22:26 GMT, Thomas Schatzl wrote: >> I feel the motivation for this encoding is missing from this PR. It's not immediately clear why these two counters need to be combined in this way. Kim outlined some rationale during our offline discussion, but for the benefit of other reviewers and future readers of this code, this rationale should be documented alongside the encoding. Having those arguments written down would help assess whether the additional complexity is justified. 
> > What about using a union/struct instead of all that manual masking and shifting? > > I agree that encoding them in this way should give a rationale for that. I tried using some combination of union/struct a couple of different ways, but the the result seemd significantly worse than what I published. But some simplifications since those attempts, plus an idea for a different way of structuring things, and I've now got something union/struct-based that seems better. It could be slightly further improved in syntax if I was willing to drop the CounterState::_cf member name and make it an anonymous struct. That's a C11 feature that is provided as a C++ extension by all of our supported compilers. You might not want to look at the difference between the two commits very much, but instead look at the updated complete change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1863876496 From kbarrett at openjdk.org Sat Nov 30 10:59:41 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 30 Nov 2024 10:59:41 GMT Subject: RFR: 8344665: Refactor PartialArrayState allocation for reuse [v2] In-Reply-To: References: <5BFfQWo76ziDd0mT3TczoRhPzPm3bGmnBbS8f5dS63M=.3a130ddd-05de-4fd2-97e0-07c6f9fd3b24@github.com> Message-ID: On Fri, 29 Nov 2024 19:17:16 GMT, Kim Barrett wrote: >> What about using a union/struct instead of all that manual masking and shifting? >> >> I agree that encoding them in this way should give a rationale for that. > > I tried using some combination of union/struct a couple of different ways, but > the the result seemd significantly worse than what I published. But some > simplifications since those attempts, plus an idea for a different way of > structuring things, and I've now got something union/struct-based that seems > better. > > It could be slightly further improved in syntax if I was willing to drop the > CounterState::_cf member name and make it an anonymous struct. 
That's a C11 > feature that is provided as a C++ extension by all of our supported compilers. > > You might not want to look at the difference between the two commits very > much, but instead look at the updated complete change. I also expanded commentary for `PartialArrayStateManager::_counters`. Maybe that will help? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22287#discussion_r1864216818
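For comparison with manual masking, a struct with named fields can also be updated atomically as a whole. This sketch is not the patch's approach: the thread says the patch uses a union/struct `CounterState` with a `_cf` member, whose exact layout is not shown here. Instead, this sketch uses `std::atomic` of a trivially copyable 8-byte struct with a compare-and-swap loop, which sidesteps reading a union through a member other than the one last written (something mainstream compilers tolerate but ISO C++ does not guarantee). Class and field names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical field layout; names are illustrative only.
struct CounterFields {
  uint32_t constructed;
  uint32_t destructed;
};

class StructCounters {
  std::atomic<CounterFields> _state{CounterFields{0, 0}};

public:
  // Applies `f` to a copy of the fields and publishes it with a CAS loop.
  // Returns the fields as they were just before the successful update.
  template <typename F>
  CounterFields update(F f) {
    CounterFields old_state = _state.load();
    CounterFields new_state;
    do {
      new_state = old_state;
      f(new_state);
    } while (!_state.compare_exchange_weak(old_state, new_state));
    return old_state;
  }

  uint32_t register_allocator() {
    return update([](CounterFields& c) { c.constructed++; }).constructed;
  }

  void release_allocator() {
    CounterFields old_state = update([](CounterFields& c) { c.destructed++; });
    // Debug-style sanity check: cannot release more than were registered.
    assert(old_state.destructed < old_state.constructed);
  }

  CounterFields snapshot() const { return _state.load(); }
};
```

The design trade-off: a CAS loop may retry under contention, whereas a single `fetch_add` on a packed 64-bit word always completes in one operation. A union keeps the `fetch_add` while still exposing named fields, at the cost of relying on compiler-specific behavior.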