From shade at openjdk.org Tue Jan 3 10:13:49 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jan 2023 10:13:49 GMT Subject: RFR: 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic In-Reply-To: References: Message-ID: <6S2QZ9-NMeIQe0FUCdZlCKuWG8h_dwdFs9_sqMbJ6Ng=.4fe5ec3c-d4fc-4efe-ae19-5b9caf64a316@github.com> On Tue, 20 Dec 2022 07:05:34 GMT, Erik Österlund wrote: > The current java_lang_ref_Reference::clear_referent implementation performs a raw reference clear. That doesn't work well with upcoming GC algorithms. It should be made GC agnostic by going through the normal access API. Looks fine. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.org/jdk/pull/11736 From eosterlund at openjdk.org Tue Jan 3 15:40:48 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 3 Jan 2023 15:40:48 GMT Subject: RFR: 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic In-Reply-To: <6S2QZ9-NMeIQe0FUCdZlCKuWG8h_dwdFs9_sqMbJ6Ng=.4fe5ec3c-d4fc-4efe-ae19-5b9caf64a316@github.com> References: <6S2QZ9-NMeIQe0FUCdZlCKuWG8h_dwdFs9_sqMbJ6Ng=.4fe5ec3c-d4fc-4efe-ae19-5b9caf64a316@github.com> Message-ID: On Tue, 3 Jan 2023 10:11:15 GMT, Aleksey Shipilev wrote: >> The current java_lang_ref_Reference::clear_referent implementation performs a raw reference clear. That doesn't work well with upcoming GC algorithms. It should be made GC agnostic by going through the normal access API. > > Looks fine. Thanks for the review @shipilev! 
------------- PR: https://git.openjdk.org/jdk/pull/11736 From mdoerr at openjdk.org Tue Jan 3 15:58:49 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Jan 2023 15:58:49 GMT Subject: RFR: 8299312: Clean up BarrierSetNMethod In-Reply-To: References: Message-ID: <1km8O--i4urmIjKeFK2AT3mO0d4DoTX5RcPYC1XdD-k=.d82590fe-35d2-406f-a502-fd5bb2c145f5@github.com> On Fri, 23 Dec 2022 12:00:46 GMT, Erik Österlund wrote: > The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it in to the shared code level. > We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter. With https://github.com/openjdk/jdk/commit/245f0cf4ac9dc655bfe2abb1c88c6ed1ddffd291, nmethod entry barriers are implemented on all platforms, now. The ARM32 parts should be added. (Also see failing pre-submit test.) ------------- PR: https://git.openjdk.org/jdk/pull/11774 From kdnilsen at openjdk.org Tue Jan 3 22:25:26 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 3 Jan 2023 22:25:26 GMT Subject: RFR: Allow heuristic trigger to increase capacity instead of running a collection In-Reply-To: References: Message-ID: On Fri, 30 Dec 2022 00:07:29 GMT, William Kemper wrote: > Before the adaptive heuristic starts a collection, it will attempt to increase the capacity of its generation. If the capacity is increased, the heuristic will re-evaluate the trigger criteria. > > There is also a change here to attempt to increase the size of the old generation in response to a promotion failure. 
Marked as reviewed by kdnilsen (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/190 From eosterlund at openjdk.org Wed Jan 4 14:50:20 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 4 Jan 2023 14:50:20 GMT Subject: RFR: 8299312: Clean up BarrierSetNMethod [v2] In-Reply-To: References: Message-ID: > The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it in to the shared code level. > We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter. Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - ARM support - Merge branch 'master' into 8299312_barrier_set_nmethod_cleanup - Fix Shenandoah build - 8299312: Clean up BarrierSetNMethod ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11774/files - new: https://git.openjdk.org/jdk/pull/11774/files/78afd161..e0b32db3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11774&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11774&range=00-01 Stats: 9893 lines in 672 files changed: 5058 ins; 2615 del; 2220 mod Patch: https://git.openjdk.org/jdk/pull/11774.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11774/head:pull/11774 PR: https://git.openjdk.org/jdk/pull/11774 From eosterlund at openjdk.org Wed Jan 4 14:53:02 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 4 Jan 2023 14:53:02 GMT Subject: Integrated: 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic In-Reply-To: References: Message-ID: On Tue, 20 Dec 2022 07:05:34 GMT, Erik Österlund wrote: > The current java_lang_ref_Reference::clear_referent implementation performs a raw reference clear. That doesn't work well with upcoming GC algorithms. It should be made GC agnostic by going through the normal access API. This pull request has now been integrated. 
Changeset: c32a34c2 Author: Erik Österlund URL: https://git.openjdk.org/jdk/commit/c32a34c2e534147bccf8320b095edda9e1088f5a Stats: 8 lines in 5 files changed: 5 ins; 0 del; 3 mod 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic Co-authored-by: Axel Boldt-Christmas Reviewed-by: dholmes, shade, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/11736 From mdoerr at openjdk.org Wed Jan 4 15:43:57 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 4 Jan 2023 15:43:57 GMT Subject: RFR: 8299312: Clean up BarrierSetNMethod [v2] In-Reply-To: References: Message-ID: <6kIYLSUVYIVvoKhLGGhYowSFyY09rWE07Tw4le5q2Bw=.90fed758-0136-4b5c-bd9f-73821c010930@github.com> On Wed, 4 Jan 2023 14:50:20 GMT, Erik Österlund wrote: >> The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it in to the shared code level. >> We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter. > > Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - ARM support > - Merge branch 'master' into 8299312_barrier_set_nmethod_cleanup > - Fix Shenandoah build > - 8299312: Clean up BarrierSetNMethod LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11774 From kdnilsen at openjdk.org Wed Jan 4 16:20:21 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 4 Jan 2023 16:20:21 GMT Subject: RFR: Enforce that generation sizes align with region sizes Message-ID: For correctness, the size of each generation should be a multiple of the region size. A recent change violated this requirement. ------------- Commit messages: - Enforce that generation sizes align with region sizes Changes: https://git.openjdk.org/shenandoah/pull/191/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=191&range=00 Stats: 5 lines in 2 files changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/191.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/191/head:pull/191 PR: https://git.openjdk.org/shenandoah/pull/191 From ysr at openjdk.org Wed Jan 4 16:41:29 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 4 Jan 2023 16:41:29 GMT Subject: RFR: Enforce that generation sizes align with region sizes In-Reply-To: References: Message-ID: On Wed, 4 Jan 2023 16:12:35 GMT, Kelvin Nilsen wrote: > For correctness, the size of each generation should be a multiple of the region size. > > A recent change violated this requirement. LGTM. Please feel free to include testing notes, if any. Thanks! ------------- Marked as reviewed by ysr (Author). PR: https://git.openjdk.org/shenandoah/pull/191 From wkemper at openjdk.org Wed Jan 4 16:54:24 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 4 Jan 2023 16:54:24 GMT Subject: Integrated: Allow heuristic trigger to increase capacity instead of running a collection In-Reply-To: References: Message-ID: On Fri, 30 Dec 2022 00:07:29 GMT, William Kemper wrote: > Before the adaptive heuristic starts a collection, it will attempt to increase the capacity of its generation. If the capacity is increased, the heuristic will re-evaluate the trigger criteria. 
> > There is also a change here to attempt to increase the size of the old generation in response to a promotion failure. This pull request has now been integrated. Changeset: ba808494 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/ba808494675509a7ed2d97d08a9fbc971dbc0900 Stats: 97 lines in 10 files changed: 73 ins; 9 del; 15 mod Allow heuristic trigger to increase capacity instead of running a collection Reviewed-by: kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/190 From ysr at openjdk.org Wed Jan 4 16:59:20 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 4 Jan 2023 16:59:20 GMT Subject: RFR: Allow heuristic trigger to increase capacity instead of running a collection In-Reply-To: References: Message-ID: On Fri, 30 Dec 2022 00:07:29 GMT, William Kemper wrote: > Before the adaptive heuristic starts a collection, it will attempt to increase the capacity of its generation. If the capacity is increased, the heuristic will re-evaluate the trigger criteria. > > There is also a change here to attempt to increase the size of the old generation in response to a promotion failure. The changes look ok. I did wonder though if one might (for odd situations) reduce the number of recursions through `should_start_gc()` by having some notion of error that we are trying to correct when we call `resize_and_evaluate(/* pass in error or size differential here */)` from `should_start_gc()`. Anyway, just a thought for you to think about. Reviewed! ------------- Marked as reviewed by ysr (Author). 
PR: https://git.openjdk.org/shenandoah/pull/190 From kdnilsen at openjdk.org Wed Jan 4 17:22:29 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 4 Jan 2023 17:22:29 GMT Subject: RFR: Enforce that generation sizes align with region sizes In-Reply-To: References: Message-ID: <7IIQ4-uabP7uLKg8mql3JyrUeBjOstgpsix37p3MI0o=.a67b3d86-0129-4b3d-a616-79157cb3858c@github.com> On Wed, 4 Jan 2023 16:12:35 GMT, Kelvin Nilsen wrote: > For correctness, the size of each generation should be a multiple of the region size. > > A recent change violated this requirement. I discovered this problem because of an assertion failure in a separate branch that was based on mainline. I added the same assertion into this branch and verified through our internal pipeline regression testing that the two corrections to existing implementation resolve the assertion failure and do not introduce any other regressions. ------------- PR: https://git.openjdk.org/shenandoah/pull/191 From wkemper at openjdk.org Wed Jan 4 21:56:26 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 4 Jan 2023 21:56:26 GMT Subject: RFR: Enforce that generation sizes align with region sizes In-Reply-To: References: Message-ID: On Wed, 4 Jan 2023 16:12:35 GMT, Kelvin Nilsen wrote: > For correctness, the size of each generation should be a multiple of the region size. > > A recent change violated this requirement. Marked as reviewed by wkemper (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/191 From wkemper at openjdk.org Wed Jan 4 22:29:27 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 4 Jan 2023 22:29:27 GMT Subject: RFR: Allow heuristic trigger to increase capacity instead of running a collection In-Reply-To: References: Message-ID: On Wed, 4 Jan 2023 16:56:54 GMT, Y. Srinivas Ramakrishna wrote: >> Before the adaptive heuristic starts a collection, it will attempt to increase the capacity of its generation. 
If the capacity is increased, the heuristic will re-evaluate the trigger criteria. >> >> There is also a change here to attempt to increase the size of the old generation in response to a promotion failure. > > The changes look ok. > > I did wonder though if one might (for odd situations) reduce the number of recursions through `should_start_gc()` by having some notion of error that we are trying to correct when we call `resize_and_evaluate(/* pass in error or size differential here */)` from `should_start_gc()`. Anyway, just a thought for you to think about. > > Reviewed! @ysramakrishna - changing the capacity of a generation will reset the `_gc_times_learned` field of the heuristics to zero. `resize_and_evaluate` will only resize (and recurse) if `_gc_times_learned` is not less than `ShenandoahLearningSteps`, so it will only really attempt to resize the generation once every `ShenandoahLearningSteps` number of cycles. ------------- PR: https://git.openjdk.org/shenandoah/pull/190 From kdnilsen at openjdk.org Wed Jan 4 23:26:17 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 4 Jan 2023 23:26:17 GMT Subject: Integrated: Enforce that generation sizes align with region sizes In-Reply-To: References: Message-ID: <_TCj5-a9ZmoQwaZDAVrcVZzyYcuvDUisnA5Ksrfejn0=.bc0ac1eb-6ca5-4e9d-857a-9311e65ea550@github.com> On Wed, 4 Jan 2023 16:12:35 GMT, Kelvin Nilsen wrote: > For correctness, the size of each generation should be a multiple of the region size. > > A recent change violated this requirement. This pull request has now been integrated. 
Changeset: 6daaa75a Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/6daaa75a34857998be5ac4dd53bdf0db289fd3a1 Stats: 5 lines in 2 files changed: 5 ins; 0 del; 0 mod Enforce that generation sizes align with region sizes Reviewed-by: ysr, wkemper ------------- PR: https://git.openjdk.org/shenandoah/pull/191 From kdnilsen at openjdk.org Thu Jan 5 01:01:50 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 01:01:50 GMT Subject: RFR: Fix allocate aligned Message-ID: An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. 
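As an illustration of the padding problem described above, here is a minimal sketch, not the actual Shenandoah code. The `Region` struct, the constants `kCardWords` and `kMinFillWords`, and the simplified `allocate_aligned` signature are all assumptions made for the example. The point it shows is that the bounds check has to use the final padding, including the extra card that is skipped when the gap is too small to hold a filler object.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical, simplified model of a heap region. All quantities are word
// offsets from the region bottom; this is not the real ShenandoahHeapRegion.
struct Region {
  std::size_t top;  // next free word
  std::size_t end;  // one past the last usable word
};

constexpr std::size_t kCardWords = 64;    // assumed card size in words
constexpr std::size_t kMinFillWords = 2;  // assumed minimum filler object size

// Padding needed so that top + pad is card-aligned and the gap is either
// empty or large enough to hold a filler object.
std::size_t alignment_padding(std::size_t top) {
  std::size_t pad = (kCardWords - top % kCardWords) % kCardWords;
  if (pad != 0 && pad < kMinFillWords) {
    pad += kCardWords;  // gap too small for a filler: skip one more card
  }
  return pad;
}

// Returns the start offset of the PLAB, or std::nullopt if it does not fit.
std::optional<std::size_t> allocate_aligned(Region& r, std::size_t size_words) {
  const std::size_t pad = alignment_padding(r.top);
  // Check against the region end only after the padding is final. Checking
  // with the pre-adjustment padding is the kind of error described above.
  if (r.top + pad + size_words > r.end) {
    return std::nullopt;
  }
  const std::size_t start = r.top + pad;
  assert(start % kCardWords == 0);
  r.top = start + size_words;
  return start;
}
```

With `top == 63` the raw padding is one word, which is below the assumed minimum filler size, so the PLAB starts at word 128 rather than 64. A region ending at word 191 therefore cannot hold a 64-word PLAB, even though a check based on the one-word padding would have accepted it.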
------------- Commit messages: - Merge remote-tracking branch 'GitFarmBranch/fix-allocate-aligned-rebase' into fix-allocate-aligned - Remove instrumentation - Force min and max generation sizes to align with region boundaries - Debug verification error in old-gen used - Fix computation of padding requirement - Fix allocate_aligned padding Changes: https://git.openjdk.org/shenandoah/pull/192/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=192&range=00 Stats: 64 lines in 4 files changed: 44 ins; 7 del; 13 mod Patch: https://git.openjdk.org/shenandoah/pull/192.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/192/head:pull/192 PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 01:16:42 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 01:16:42 GMT Subject: RFR: Fix allocate aligned [v2] In-Reply-To: References: Message-ID: > An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. > > In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Refinements during code review ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/192/files - new: https://git.openjdk.org/shenandoah/pull/192/files/e9c981a9..059f3ef5 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=192&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=192&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/192.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/192/head:pull/192 PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 01:16:43 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 01:16:43 GMT Subject: RFR: Fix allocate aligned [v2] In-Reply-To: References: Message-ID: On Thu, 5 Jan 2023 01:12:48 GMT, Kelvin Nilsen wrote: >> An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. >> >> In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. 
> > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Refinements during code review src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 66: > 64: // We don't need to register the PLAB. Its content will be registered as objects are allocated within it and/or > 65: // when the PLAB is retired. > 66: ShenandoahHeap::heap()->card_scan()->register_object(obj); In reviewing my own code, it looks like my implementation contradicts the comment. I'm going to retest without line 66. ------------- PR: https://git.openjdk.org/shenandoah/pull/192 From wkemper at openjdk.org Thu Jan 5 01:24:19 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 5 Jan 2023 01:24:19 GMT Subject: RFR: Fix allocate aligned [v2] In-Reply-To: References: Message-ID: On Thu, 5 Jan 2023 01:16:42 GMT, Kelvin Nilsen wrote: >> An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. >> >> In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Refinements during code review Changes requested by wkemper (Committer). 
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 319: > 317: assert(req.actual_size() % CardTable::card_size_in_words() == 0, "PLAB start must align with card boundary"); > 318: assert(((uintptr_t) result) % CardTable::card_size_in_words() == 0, "PLAB start must align with card boundary"); > 319: if (result != nullptr && free > usable_free) { Line 315 asserts that `result` cannot be `nullptr`, do we need to check for non-null again here? ------------- PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 01:41:45 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 01:41:45 GMT Subject: RFR: Fix allocate aligned [v3] In-Reply-To: References: Message-ID: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> > An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. > > In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Remove redundant test for result != nullptr ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/192/files - new: https://git.openjdk.org/shenandoah/pull/192/files/059f3ef5..2b43663d Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=192&range=02 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=192&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/192.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/192/head:pull/192 PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 01:41:48 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 01:41:48 GMT Subject: RFR: Fix allocate aligned [v2] In-Reply-To: References: Message-ID: On Thu, 5 Jan 2023 01:20:59 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Refinements during code review > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 319: > >> 317: assert(req.actual_size() % CardTable::card_size_in_words() == 0, "PLAB start must align with card boundary"); >> 318: assert(((uintptr_t) result) % CardTable::card_size_in_words() == 0, "PLAB start must align with card boundary"); >> 319: if (result != nullptr && free > usable_free) { > > Line 315 asserts that `result` cannot be `nullptr`, do we need to check for non-null again here? Thanks for this catch. Making this change and testing on pipeline before integration. 
------------- PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 01:41:49 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 01:41:49 GMT Subject: RFR: Fix allocate aligned [v3] In-Reply-To: References: Message-ID: On Thu, 5 Jan 2023 01:10:08 GMT, Kelvin Nilsen wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove redundant test for result != nullptr > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 66: > >> 64: // We don't need to register the PLAB. Its content will be registered as objects are allocated within it and/or >> 65: // when the PLAB is retired. >> 66: ShenandoahHeap::heap()->card_scan()->register_object(obj); > > In reviewing my own code, it looks like my implementation contradicts the comment. I'm going to retest without line 66. Making this change and testing on regression suite pipeline before integration. ------------- PR: https://git.openjdk.org/shenandoah/pull/192 From ysr at openjdk.org Thu Jan 5 08:14:17 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 5 Jan 2023 08:14:17 GMT Subject: RFR: Fix allocate aligned [v3] In-Reply-To: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> References: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> Message-ID: On Thu, 5 Jan 2023 01:41:45 GMT, Kelvin Nilsen wrote: >> An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. 
This code successfully runs our internal pipeline of tests without any regressions. >> >> In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Remove redundant test for result != nullptr Changes look fine. I'd like to understand the original rationale for making PLAB boundaries exactly card-aligned. Perhaps it's described/documented somewhere in the code? (Something to do with simplifying card-scanning concurrently with allocating out of PLABs?) src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 317: > 315: assert(result != nullptr, "Allocation cannot fail"); > 316: assert(r->top() <= r->end(), "Allocation cannot span end of region"); > 317: assert(req.actual_size() % CardTable::card_size_in_words() == 0, "PLAB start must align with card boundary"); "PLAB should be card size multiple" (the next assert checks alignment) ------------- Marked as reviewed by ysr (Author). PR: https://git.openjdk.org/shenandoah/pull/192 From eosterlund at openjdk.org Thu Jan 5 13:08:48 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 5 Jan 2023 13:08:48 GMT Subject: RFR: 8299312: Clean up BarrierSetNMethod In-Reply-To: <1km8O--i4urmIjKeFK2AT3mO0d4DoTX5RcPYC1XdD-k=.d82590fe-35d2-406f-a502-fd5bb2c145f5@github.com> References: <1km8O--i4urmIjKeFK2AT3mO0d4DoTX5RcPYC1XdD-k=.d82590fe-35d2-406f-a502-fd5bb2c145f5@github.com> Message-ID: <_ONOHzwnJtl_l9se_WVzP2nP6dE0EXl61lTOCOt9qFA=.88dcba7c-2ca2-4746-9d55-863cf0635717@github.com> On Tue, 3 Jan 2023 15:55:43 GMT, Martin Doerr wrote: >> The terminology in BarrierSetNMethod is not crisp. 
In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it in to the shared code level. >> We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter. > > With https://github.com/openjdk/jdk/commit/245f0cf4ac9dc655bfe2abb1c88c6ed1ddffd291, nmethod entry barriers are implemented on all platforms, now. The ARM32 parts should be added. (Also see failing pre-submit test.) Thanks for the review @TheRealMDoerr! ------------- PR: https://git.openjdk.org/jdk/pull/11774 From kdnilsen at openjdk.org Thu Jan 5 14:19:24 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 14:19:24 GMT Subject: RFR: Fix allocate aligned [v3] In-Reply-To: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> References: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> Message-ID: On Thu, 5 Jan 2023 01:41:45 GMT, Kelvin Nilsen wrote: >> An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. 
>> >> In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Remove redundant test for result != nullptr After confirming that the two fixes motivated by review do not introduce regressions on our CI pipelines, I will close this with integration. ------------- PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 14:19:25 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 14:19:25 GMT Subject: RFR: Fix allocate aligned [v3] In-Reply-To: References: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> Message-ID: On Thu, 5 Jan 2023 07:59:06 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove redundant test for result != nullptr > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 317: > >> 315: assert(result != nullptr, "Allocation cannot fail"); >> 316: assert(r->top() <= r->end(), "Allocation cannot span end of region"); >> 317: assert(req.actual_size() % CardTable::card_size_in_words() == 0, "PLAB start must align with card boundary"); > > "PLAB should be card size multiple" > > (the next assert checks alignment) This allows us to register objects in PLABs without acquiring a lock. Otherwise, we need a lock because two threads might be registering objects within the same card in parallel. 
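To make the locking argument concrete, here is a rough sketch under stated assumptions. `ObjectStartTable`, `Plab`, `fill_plab` and `fill_in_parallel` are hypothetical names invented for this example and are not the Shenandoah card-scan API, and the card size is an assumed constant. The idea shown is only that when every PLAB starts and ends on a card boundary, each thread writes a disjoint set of per-card entries, so registration needs no lock.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

constexpr std::size_t kCardWords = 64;  // assumed card size in words
constexpr std::size_t kNoObject = static_cast<std::size_t>(-1);

// Hypothetical per-card table recording the offset of the first object that
// starts within each card.
class ObjectStartTable {
 public:
  explicit ObjectStartTable(std::size_t cards) : first_(cards, kNoObject) {}

  // Unsynchronized on purpose: safe only if no other thread registers
  // objects in the same card at the same time.
  void register_object(std::size_t addr) {
    const std::size_t card = addr / kCardWords;
    const std::size_t offset = addr % kCardWords;
    if (first_[card] == kNoObject || offset < first_[card]) {
      first_[card] = offset;
    }
  }

  std::size_t first_start(std::size_t card) const { return first_[card]; }

 private:
  std::vector<std::size_t> first_;
};

// A PLAB described by word addresses [start, end), both card-aligned.
struct Plab {
  std::size_t start;
  std::size_t end;
};

// Bump-allocates fixed-size objects in the PLAB and registers each start.
void fill_plab(ObjectStartTable& table, Plab plab, std::size_t obj_words) {
  assert(plab.start % kCardWords == 0 && plab.end % kCardWords == 0);
  for (std::size_t a = plab.start; a + obj_words <= plab.end; a += obj_words) {
    table.register_object(a);
  }
}

// Two threads fill non-overlapping, card-aligned PLABs. They never touch the
// same table entry, so no mutex is taken.
void fill_in_parallel(ObjectStartTable& table, Plab a, Plab b,
                      std::size_t obj_words) {
  std::thread t1(fill_plab, std::ref(table), a, obj_words);
  std::thread t2(fill_plab, std::ref(table), b, obj_words);
  t1.join();
  t2.join();
}
```

If two PLABs were allowed to share a card, both threads could update the same entry in `register_object`, and the read-compare-write there would need a lock or an atomic update.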
------------- PR: https://git.openjdk.org/shenandoah/pull/192 From wkemper at openjdk.org Thu Jan 5 16:45:25 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 5 Jan 2023 16:45:25 GMT Subject: RFR: Fix allocate aligned [v3] In-Reply-To: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> References: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> Message-ID: On Thu, 5 Jan 2023 01:41:45 GMT, Kelvin Nilsen wrote: >> An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. >> >> In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Remove redundant test for result != nullptr Marked as reviewed by wkemper (Committer). 
------------- PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 17:04:28 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 17:04:28 GMT Subject: Integrated: Fix allocate aligned In-Reply-To: References: Message-ID: <7-Ephr0bVwLOAptVv7m_Pzv6KZZUcCO4hbsJeuIyhds=.dad9d2c8-9ea3-4edf-bc08-97aafa28c32e@github.com> On Thu, 5 Jan 2023 00:55:14 GMT, Kelvin Nilsen wrote: > An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. > > In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. This pull request has now been integrated. 
Changeset: 7e9a1d49 Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/7e9a1d49eae7bff80e2b678f2402a4ecdf6c748f Stats: 63 lines in 4 files changed: 43 ins; 7 del; 13 mod Fix allocate aligned Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/192 From wkemper at openjdk.org Thu Jan 5 22:51:22 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 5 Jan 2023 22:51:22 GMT Subject: RFR: Fix use of uninitialized double Message-ID: Member field was not initialized in constructor ------------- Commit messages: - Fix use of uninitialized double Changes: https://git.openjdk.org/shenandoah/pull/194/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=194&range=00 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/194.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/194/head:pull/194 PR: https://git.openjdk.org/shenandoah/pull/194 From kdnilsen at openjdk.org Thu Jan 5 22:58:31 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 22:58:31 GMT Subject: RFR: Fix use of uninitialized double In-Reply-To: References: Message-ID: <73DwV_svxjlyxYhcpnIwpeIWU5ibik3CSmLPbaPabnY=.56e9fe69-cc57-4372-9c97-ffdc7f7763cb@github.com> On Thu, 5 Jan 2023 22:43:38 GMT, William Kemper wrote: > Member field was not initialized in constructor Marked as reviewed by kdnilsen (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/194 From ysr at openjdk.org Thu Jan 5 23:01:33 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 5 Jan 2023 23:01:33 GMT Subject: RFR: Fix use of uninitialized double In-Reply-To: References: Message-ID: <2bCCxKH5djhzBOQjM5X2htQptGPKbfa0G6ek4yW2wLc=.fbd71380-0869-4ed4-9dcd-a860cb10070f@github.com> On Thu, 5 Jan 2023 22:43:38 GMT, William Kemper wrote: > Member field was not initialized in constructor Marked as reviewed by ysr (Author). 
-------------

PR: https://git.openjdk.org/shenandoah/pull/194

From wkemper at openjdk.org Thu Jan 5 23:05:22 2023
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 5 Jan 2023 23:05:22 GMT
Subject: Integrated: Fix use of uninitialized double
In-Reply-To:
References:
Message-ID:

On Thu, 5 Jan 2023 22:43:38 GMT, William Kemper wrote:

> Member field was not initialized in constructor

This pull request has now been integrated.

Changeset: cb70d299
Author: William Kemper
URL: https://git.openjdk.org/shenandoah/commit/cb70d299998937138c03a2a7558fe1f6f3cdba0e
Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod

Fix use of uninitialized double

Reviewed-by: kdnilsen, ysr

-------------

PR: https://git.openjdk.org/shenandoah/pull/194

From sviswanathan at openjdk.org Fri Jan 6 19:48:56 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 6 Jan 2023 19:48:56 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]
In-Reply-To:
References: <6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com>
Message-ID:

On Thu, 22 Dec 2022 13:10:02 GMT, Claes Redestad wrote:

>> @cl4es Thanks for passing the constant node through, the code looks much cleaner now. The attached patch should handle the signed bytes/shorts as well. Please take a look.
>> [signed.patch](https://github.com/openjdk/jdk/files/10273480/signed.patch)
>
> I ran tests and some quick microbenchmarking to validate @sviswa7's patch to activate vectorization for `short` and `byte` arrays and it looks good:
>
> Before:
>
> Benchmark                   (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes         10000  avgt    5  7845.586 ± 23.440  ns/op
> ArraysHashCode.chars         10000  avgt    5  1203.163 ± 11.995  ns/op
> ArraysHashCode.ints          10000  avgt    5  1131.915 ±  7.843  ns/op
> ArraysHashCode.multibytes    10000  avgt    5  4136.487 ±  5.790  ns/op
> ArraysHashCode.multichars    10000  avgt    5   671.328 ± 17.629  ns/op
> ArraysHashCode.multiints     10000  avgt    5   699.051 ±  8.135  ns/op
> ArraysHashCode.multishorts   10000  avgt    5  4139.300 ± 10.633  ns/op
> ArraysHashCode.shorts        10000  avgt    5  7844.019 ± 26.071  ns/op
>
>
> After:
>
> Benchmark                   (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes         10000  avgt    5  1193.208 ±  1.965  ns/op
> ArraysHashCode.chars         10000  avgt    5  1193.311 ±  5.941  ns/op
> ArraysHashCode.ints          10000  avgt    5  1132.592 ± 10.410  ns/op
> ArraysHashCode.multibytes    10000  avgt    5   657.343 ± 25.343  ns/op
> ArraysHashCode.multichars    10000  avgt    5   672.668 ±  5.229  ns/op
> ArraysHashCode.multiints     10000  avgt    5   697.143 ±  3.929  ns/op
> ArraysHashCode.multishorts   10000  avgt    5   666.738 ± 12.236  ns/op
> ArraysHashCode.shorts        10000  avgt    5  1193.563 ±  5.449  ns/op

@cl4es There seems to be a failure in the windows-x64 pre-submit tests. Could you please take a look?

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From kdnilsen at openjdk.org Fri Jan 6 19:56:38 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Fri, 6 Jan 2023 19:56:38 GMT
Subject: RFR: Plab fallback to minsize
Message-ID:

If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions.

This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions.

Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads.

Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated.

x86 results:

+49.22% extremem-phased/jhiccup_max_pause
  Control: 0.560ms (+/-467.41ms) 104600
  Test: 0.835ms (+/-466.68ms) 12291
+28.58% hyperalloc_a2048_o1536/context_switch_count
  Control: 3945.242 (+/-1228.11 ) 82
  Test: 5072.625 (+/-427.15 ) 10
+20.21% extremem-phased/cpu_user
  Control: 2323.752s (+/-592.18s ) 85
  Test: 2793.378s (+/- 74.48s ) 10
+19.23% hyperalloc_a2048_o1536/jhiccup_max_pause
  Control: 0.613ms (+/- 1.32ms) 9840
  Test: 0.731ms (+/- 1.40ms) 1200
+16.37% hyperalloc_a2048_o1536/cpu_user
  Control: 372.456s (+/- 67.33s ) 82
  Test: 433.434s (+/- 10.54s ) 10
+13.72% extremem/worker_objects
  Control: 228174.968 (+/-274656.55 ) 47085
  Test: 259476.534 (+/-300553.34 ) 3012
+12.87% xalan/concurrent_marking
  Control: 9.055ms (+/- 6.82ms) 7640
  Test: 10.221ms (+/- 6.55ms) 561
+11.57% specjbb2015/pause_degenerated_gc_n
  Control: 921.116ms (+/-627.42ms) 6423
  Test: 1.028s (+/-654.58ms) 782
+11.56% specjbb2015/pause_degenerated_gc_g
  Control: 923.790ms (+/-629.01ms) 6423
  Test: 1.031s (+/-656.16ms) 782
-206.91% extremem/mutator_evacuated
  Control: 128640.328 (+/-1870500.79 ) 47085
  Test: 41914.998 (+/-2374222.03 ) 3012
-148.23% hyperalloc_a2048_o1536/mutator_evacuated
  Control: 1755.404 (+/-3710823.53 ) 18356
  Test: 707.164 (+/-3055167.84 ) 3374
-120.26% hyperalloc_a2048_o1536/mutator_objects
  Control: 6.337 (+/-14949.35 ) 18356
  Test: 2.877 (+/-12293.11 ) 3374
-83.34% extremem/mutator_objects
  Control: 1834.425 (+/-7838.57 ) 47085
  Test: 1000.564 (+/-9461.24 ) 3012
-62.39% extremem/concurrent_thread_roots
  Control: 3.298ms (+/- 5.49ms) 3408
  Test: 2.031ms (+/- 4.06ms) 650
-60.85% hyperalloc_a2048_o1536/concurrent_evacuation
  Control: 11.389ms (+/- 33.55ms) 5943
  Test: 7.081ms (+/- 30.34ms) 986
-59.21% hyperalloc_a3072_o1536/concurrent_evacuation
  Control: 6.744ms (+/- 29.91ms) 9195
  Test: 4.236ms (+/- 27.88ms) 1279
-55.74% hyperalloc_a3072_o1536/mutator_evacuated
  Control: 995.822 (+/-3008122.97 ) 31761
  Test: 639.419 (+/-2714463.89 ) 4564
-48.55% hyperalloc_a3072_o1536/mutator_objects
  Control: 4.018 (+/-12124.35 ) 31761
  Test: 2.705 (+/-10918.88 ) 4564
-47.30% extremem/concurrent_mark_roots
  Control: 3.114ms (+/- 5.37ms) 4065
  Test: 2.114ms (+/- 4.21ms) 674

aarch64 results:

+20.28% extremem-phased/jhiccup_max_pause
  Control: 0.391ms (+/-778.03ms) 101414
  Test: 0.471ms (+/-506.86ms) 12270
+16.56% xalan/jhiccup_max_pause
  Control: 2.806ms (+/- 3.57ms) 4943
  Test: 3.271ms (+/- 3.60ms) 586
+12.43% hyperalloc_a2048_o1536/cpu_user
  Control: 387.646s (+/- 62.67s ) 81
  Test: 435.835s (+/- 10.43s ) 10
+11.45% specjbb2015/pause_degenerated_gc_n
  Control: 1.305s (+/-855.27ms) 6504
  Test: 1.455s (+/-874.27ms) 840
+11.43% specjbb2015/pause_degenerated_gc_g
  Control: 1.309s (+/-857.67ms) 6504
  Test: 1.459s (+/-876.46ms) 840
+10.42% extremem-phased/cpu_user
  Control: 3285.986s (+/-822.03s ) 85
  Test: 3628.380s (+/- 99.52s ) 10
-206.42% extremem/mutator_evacuated
  Control: 215404.352 (+/-1220778.32 ) 24514
  Test: 70298.169 (+/-909463.44 ) 3011
-156.88% extremem/mutator_objects
  Control: 4341.422 (+/-15179.61 ) 24514
  Test: 1690.048 (+/-8677.87 ) 3011
-117.99% hyperalloc_a2048_o1536/mutator_evacuated
  Control: 1086.398 (+/-2790294.87 ) 18643
  Test: 498.373 (+/-2253567.56 ) 3318
-101.55% hyperalloc_a2048_o1536/mutator_objects
  Control: 4.146 (+/-11259.63 ) 18643
  Test: 2.057 (+/-9112.94 ) 3318
-70.56% hyperalloc_a3072_o1536/concurrent_evacuation
  Control: 6.078ms (+/- 24.65ms) 8977
  Test: 3.563ms (+/- 22.93ms) 1258
-65.62% extremem-phased/reconstruct_remembered_set
  Control: 190.987ms (+/-136.41ms) 1305
  Test: 115.314ms (+/-154.77ms) 96
-56.62% hyperalloc_a3072_o1536/mutator_evacuated
  Control: 628.225 (+/-2364040.76 ) 31963
  Test: 401.114 (+/-2138629.77 ) 4629
-51.32% hyperalloc_a3072_o1536/mutator_objects
  Control: 2.598 (+/-9519.49 ) 31963
  Test: 1.717 (+/-8639.65 ) 4629
-48.41% extremem/concurrent_update_thread_roots
  Control: 4.723ms (+/- 10.75ms) 5357
  Test: 3.182ms (+/- 8.55ms) 658
-48.04% hyperalloc_a2048_o1536/concurrent_evacuation
  Control: 9.702ms (+/- 27.76ms) 6218
  Test: 6.553ms (+/- 24.74ms) 969

-------------

Commit messages:
 - Remove instrumentation and fix miscalculations in allocate_aligned
 - Fix bugs when downsizing PLAB allocation request
 - allocate_aligned tries smaller size if insufficient memory for full size

Changes: https://git.openjdk.org/shenandoah/pull/195/files
Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=195&range=00
Stats: 66 lines in 3 files changed: 38 ins; 9 del; 19 mod
Patch: https://git.openjdk.org/shenandoah/pull/195.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/195/head:pull/195

PR: https://git.openjdk.org/shenandoah/pull/195

From wkemper at openjdk.org Fri Jan 6 21:13:31 2023
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 6 Jan 2023 21:13:31 GMT
Subject: RFR: Plab fallback to minsize
In-Reply-To:
References:
Message-ID: <5Sgt8OMFtOTyAmy6pmSkm-wBSFwJE3spNvnVRM-Qnr4=.06834440-7772-4ccc-9ddb-e020de0c295b@github.com>

On Fri, 6 Jan 2023 19:45:14 GMT, Kelvin Nilsen wrote:

> If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions.
>
> This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions.
>
> Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads.
>
> Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated.

The evacuation metrics between mutators and gc workers are fairly unstable - likely because it depends so much on when and which threads get scheduled.
I've been thinking of masking them in the reports for this reason. ------------- PR: https://git.openjdk.org/shenandoah/pull/195 From wkemper at openjdk.org Fri Jan 6 21:29:25 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 6 Jan 2023 21:29:25 GMT Subject: RFR: Plab fallback to minsize In-Reply-To: References: Message-ID: On Fri, 6 Jan 2023 19:45:14 GMT, Kelvin Nilsen wrote: > If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions. > > This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions. > > Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads. > > Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated. Marked as reviewed by wkemper (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/195 From ysr at openjdk.org Fri Jan 6 21:29:25 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 6 Jan 2023 21:29:25 GMT Subject: RFR: Plab fallback to minsize In-Reply-To: References: Message-ID: <4yM1KvsDrl2tZq_vsc4Yu64lidm2Tuxf1wWOBrkYfhY=.2657559b-6fa7-4ba4-abd9-eddfff4ec546@github.com> On Fri, 6 Jan 2023 19:45:14 GMT, Kelvin Nilsen wrote: > If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions. 
>
> This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions.
>
> Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads.
>
> Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated.

Changes look good, modulo general comments for longer-term consideration.

Generally looks good, but I wonder if one should more carefully define the pre- and post-conditions of the two allocate methods to avoid duplicated computation between them (especially wrt minimum size etc.). One way to achieve that would be to have more specialized allocate methods that are called by subsets of clients. Having a leaf method called by several could lead to such duplication of checks. E.g. I see a bunch of "result != null" checks for values returned from a method that does checks and trimming of its own. If so, the checks in the leaf method for that caller may be wasteful. This is a general comment, but I'll look more carefully at the code to understand this better.

One question: do PLAB requests that give you smaller PLABs slow down all subsequent PLAB requests in that region? Does this then result in a downsizing of PLAB requests in the same cycle or subsequent ones? (I guess I am asking how often / in what cycle PLAB resizing happens.)

-------------

Marked as reviewed by ysr (Author).
PR: https://git.openjdk.org/shenandoah/pull/195 From kdnilsen at openjdk.org Fri Jan 6 21:40:18 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 6 Jan 2023 21:40:18 GMT Subject: RFR: Plab fallback to minsize In-Reply-To: References: Message-ID: On Fri, 6 Jan 2023 19:45:14 GMT, Kelvin Nilsen wrote: > If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions. > > This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions. > > Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads. > > Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated. The code for sizing PLAB requests is in allocate_from_plab_slow(). In general, each thread starts out with PLABs of size PLAB::min_size(). Each time the thread exhausts its existing PLAB, it tries to allocate a new PLAB that's twice as large as its previously preferred PLAB size, even if its previous PLAB is smaller than its previously preferred PLAB size. The consequence of downsizing a particular PLAB is that the thread will end up depleting the downsized PLAB more quickly than normal, which will result in this thread subsequently receiving an even larger PLAB sooner. 
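The sizing policy described above, combined with the fallback from this PR, might be sketched roughly as follows. This is a hypothetical sketch, not the code in allocate_from_plab_slow(): the `PlabSizer` type, its member names, the `max_size` cap, and treating the minimum size as an inclusive lower bound are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical per-thread PLAB sizing state (all names are illustrative).
struct PlabSizer {
  std::size_t min_size;   // stands in for PLAB::min_size()
  std::size_t max_size;   // assumed upper bound; not stated in the thread
  std::size_t preferred;  // the size this thread last asked for

  PlabSizer(std::size_t min_sz, std::size_t max_sz)
    : min_size(min_sz), max_size(max_sz), preferred(min_sz) {}

  // Called when the current PLAB is exhausted: double the previously
  // preferred size, regardless of how large the last granted PLAB was.
  std::size_t next_request() {
    preferred = std::min(preferred * 2, max_size);
    return preferred;
  }

  // Fallback: grant as much as the region has left, as long as that is
  // not below the minimum PLAB size. Returns 0 if the request fails.
  std::size_t grant(std::size_t requested, std::size_t available) const {
    std::size_t size = std::min(requested, available);
    return size >= min_size ? size : 0;
  }
};
```

In this sketch a downsized grant does not reduce `preferred`, so the next request still doubles from the size that was asked for, matching the behaviour described in the message.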
------------- PR: https://git.openjdk.org/shenandoah/pull/195 From kdnilsen at openjdk.org Fri Jan 6 21:46:32 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 6 Jan 2023 21:46:32 GMT Subject: RFR: Plab fallback to minsize In-Reply-To: References: Message-ID: On Fri, 6 Jan 2023 19:45:14 GMT, Kelvin Nilsen wrote: > If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions. > > This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions. > > Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads. > > Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated. You make a good observation about the redundant checking between the caller and callee functions here. I agree that it would be good to eventually tighten up the API specs so that we don't need this redundancy. In the meanwhile, I note that allocate_aligned() is generally in-lined into the caller's context. This allows the compiler to optimize away at least some of the redundant checks. 
------------- PR: https://git.openjdk.org/shenandoah/pull/195 From kdnilsen at openjdk.org Fri Jan 6 21:46:32 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 6 Jan 2023 21:46:32 GMT Subject: Integrated: Plab fallback to minsize In-Reply-To: References: Message-ID: <7OKSCFQrZqCDERPfO7t6Y85oCVmEpUC_8a9ZSrikcjg=.8b141e13-7815-4cf8-9d1d-e1d4035d341b@github.com> On Fri, 6 Jan 2023 19:45:14 GMT, Kelvin Nilsen wrote: > If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions. > > This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions. > > Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads. > > Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated. This pull request has now been integrated. 
Changeset: c5774a09 Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/c5774a09b35a01c5ab52831a63072d0b753afd64 Stats: 66 lines in 3 files changed: 38 ins; 9 del; 19 mod Plab fallback to minsize Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/195 From wkemper at openjdk.org Sat Jan 7 00:22:41 2023 From: wkemper at openjdk.org (William Kemper) Date: Sat, 7 Jan 2023 00:22:41 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: <4-PjqKekTb5CuO6T_TRabJOAnG172XlsQqjsRSL16Rw=.811f7125-57ce-4551-94cf-945e2e3d939a@github.com> Merges tag jdk-21+4 ------------- Commit messages: - Merge tag 'jdk-21+4' into merge-jdk-21-4 - 8299439: java/text/Format/NumberFormat/CurrencyFormat.java fails for hr_HR - 8299563: Fix typos - 8219810: javac throws NullPointerException - 8200610: Compiling fails with java.nio.file.ReadOnlyFileSystemException - Merge - 8299476: PPC64 Zero build fails after JDK-8286302 - 8293824: gc/whitebox/TestConcMarkCycleWB.java failed "RuntimeException: assertTrue: expected true, was false" - 8299483: ProblemList java/text/Format/NumberFormat/CurrencyFormat.java - 8298324: Unable to run shell test with make - ... 
and 72 more: https://git.openjdk.org/shenandoah/compare/c5774a09...55fe3430

The webrevs contain the adjustments done while merging with regards to each parent branch:
 - master: https://webrevs.openjdk.org/?repo=shenandoah&pr=196&range=00.0
 - openjdk/jdk:master: https://webrevs.openjdk.org/?repo=shenandoah&pr=196&range=00.1

Changes: https://git.openjdk.org/shenandoah/pull/196/files
Stats: 5313 lines in 488 files changed: 2252 ins; 1955 del; 1106 mod
Patch: https://git.openjdk.org/shenandoah/pull/196.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/196/head:pull/196

PR: https://git.openjdk.org/shenandoah/pull/196

From fyang at openjdk.org Sat Jan 7 10:11:53 2023
From: fyang at openjdk.org (Fei Yang)
Date: Sat, 7 Jan 2023 10:11:53 GMT
Subject: RFR: 8299312: Clean up BarrierSetNMethod [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 4 Jan 2023 14:50:20 GMT, Erik Österlund wrote:

>> The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it in to the shared code level.
>> We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter.
>
> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
> - ARM support
> - Merge branch 'master' into 8299312_barrier_set_nmethod_cleanup
> - Fix Shenandoah build
> - 8299312: Clean up BarrierSetNMethod

Looks good to me.

src/hotspot/share/runtime/thread.hpp line 118:

> 116: // On AArch64, the high order 32 bits are used by a "patching epoch" number
> 117: // which reflects if this thread has executed the required fences, after
> 118: // an nmethod gets disarmed. The low order 32 bit denote the disarmed value.

Nit:
I think this should be:
"The low order 32 bits denote the disarmed value."
instead of:
"The low order 32 bit denote the disarmed value."

-------------

Marked as reviewed by fyang (Reviewer).

PR: https://git.openjdk.org/jdk/pull/11774

From eosterlund at openjdk.org Mon Jan 9 09:49:55 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Mon, 9 Jan 2023 09:49:55 GMT
Subject: RFR: 8299312: Clean up BarrierSetNMethod [v2]
In-Reply-To:
References:
Message-ID:

On Sat, 7 Jan 2023 10:08:36 GMT, Fei Yang wrote:

>> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>>
>> - ARM support
>> - Merge branch 'master' into 8299312_barrier_set_nmethod_cleanup
>> - Fix Shenandoah build
>> - 8299312: Clean up BarrierSetNMethod
>
> Looks good to me.

Thanks for the review @RealFYang!

> src/hotspot/share/runtime/thread.hpp line 118:
>
>> 116: // On AArch64, the high order 32 bits are used by a "patching epoch" number
>> 117: // which reflects if this thread has executed the required fences, after
>> 118: // an nmethod gets disarmed. The low order 32 bit denote the disarmed value.
>
> Nit:
> I think this should be:
> "The low order 32 bits denote the disarmed value."
> instead of:
> "The low order 32 bit denote the disarmed value."

Yes, you are right, thanks.
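As a purely illustrative sketch of the layout that the quoted comment describes (a 64-bit word with the patching epoch in the high 32 bits and the disarmed value in the low 32 bits), something like the following could be written. The function names are hypothetical and this is not how HotSpot declares or accesses the field.

```cpp
#include <cstdint>

// Combine a patching epoch (high 32 bits) and a disarmed value (low 32 bits).
inline std::uint64_t pack_guard_word(std::uint32_t patching_epoch,
                                     std::uint32_t disarmed_value) {
  return (static_cast<std::uint64_t>(patching_epoch) << 32) | disarmed_value;
}

// Extract the high-order 32 bits.
inline std::uint32_t patching_epoch_of(std::uint64_t word) {
  return static_cast<std::uint32_t>(word >> 32);
}

// Extract the low-order 32 bits.
inline std::uint32_t disarmed_value_of(std::uint64_t word) {
  return static_cast<std::uint32_t>(word & 0xFFFFFFFFu);
}
```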

-------------

PR: https://git.openjdk.org/jdk/pull/11774

From eosterlund at openjdk.org Mon Jan 9 09:54:12 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Mon, 9 Jan 2023 09:54:12 GMT
Subject: RFR: 8299312: Clean up BarrierSetNMethod [v3]
In-Reply-To:
References:
Message-ID:

> The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it in to the shared code level.
> We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter.

Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:

  Nit

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/11774/files
 - new: https://git.openjdk.org/jdk/pull/11774/files/e0b32db3..08a1fb25

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11774&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11774&range=01-02

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/11774.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/11774/head:pull/11774

PR: https://git.openjdk.org/jdk/pull/11774

From eosterlund at openjdk.org Mon Jan 9 13:38:00 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Mon, 9 Jan 2023 13:38:00 GMT
Subject: Integrated: 8299312: Clean up BarrierSetNMethod
In-Reply-To:
References:
Message-ID:

On Fri, 23 Dec 2022 12:00:46 GMT, Erik Österlund wrote:

> The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it in to the shared code level.
> We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter.

This pull request has now been integrated.

Changeset: 4ba81221
Author: Erik Österlund
URL: https://git.openjdk.org/jdk/commit/4ba8122197e912db4894ed7fe395a8842268fbef
Stats: 175 lines in 29 files changed: 10 ins; 82 del; 83 mod

8299312: Clean up BarrierSetNMethod

Reviewed-by: mdoerr, fyang

-------------

PR: https://git.openjdk.org/jdk/pull/11774

From redestad at openjdk.org Mon Jan 9 15:23:58 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 9 Jan 2023 15:23:58 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]
In-Reply-To:
References: <6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com>
Message-ID:

On Thu, 22 Dec 2022 13:10:02 GMT, Claes Redestad wrote:

>> @cl4es Thanks for passing the constant node through, the code looks much cleaner now. The attached patch should handle the signed bytes/shorts as well. Please take a look.
>> [signed.patch](https://github.com/openjdk/jdk/files/10273480/signed.patch)
>
> I ran tests and some quick microbenchmarking to validate @sviswa7's patch to activate vectorization for `short` and `byte` arrays and it looks good:
>
> Before:
>
> Benchmark                   (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes         10000  avgt    5  7845.586 ± 23.440  ns/op
> ArraysHashCode.chars         10000  avgt    5  1203.163 ± 11.995  ns/op
> ArraysHashCode.ints          10000  avgt    5  1131.915 ±  7.843  ns/op
> ArraysHashCode.multibytes    10000  avgt    5  4136.487 ±  5.790  ns/op
> ArraysHashCode.multichars    10000  avgt    5   671.328 ± 17.629  ns/op
> ArraysHashCode.multiints     10000  avgt    5   699.051 ±  8.135  ns/op
> ArraysHashCode.multishorts   10000  avgt    5  4139.300 ± 10.633  ns/op
> ArraysHashCode.shorts        10000  avgt    5  7844.019 ± 26.071  ns/op
>
>
> After:
>
> Benchmark                   (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes         10000  avgt    5  1193.208 ±  1.965  ns/op
> ArraysHashCode.chars         10000  avgt    5  1193.311 ±  5.941  ns/op
> ArraysHashCode.ints          10000  avgt    5  1132.592 ± 10.410  ns/op
> ArraysHashCode.multibytes    10000  avgt    5   657.343 ± 25.343  ns/op
> ArraysHashCode.multichars    10000  avgt    5   672.668 ±  5.229  ns/op
> ArraysHashCode.multiints     10000  avgt    5   697.143 ±  3.929  ns/op
> ArraysHashCode.multishorts   10000  avgt    5   666.738 ± 12.236  ns/op
> ArraysHashCode.shorts        10000  avgt    5  1193.563 ±  5.449  ns/op

> @cl4es There seems to be a failure in the windows-x64 pre-submit tests. Could you please take a look?

It looks like the `as_Address(ExternalAddress(StubRoutines::x86::arrays_hashcode_powers_of_31() + ...)` trick is running into some reachability issue on Windows, hitting the `assert(reachable(adr), "must be");` in `macroAssembler_x86.cpp`. Might be related to ASLR or some quirk of the VS compiler. I'll investigate.

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Mon Jan 9 15:00:48 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 9 Jan 2023 15:00:48 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v17]
In-Reply-To:
References:
Message-ID:

> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>
> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic.
Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ? 
0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 68 commits:

 - Merge branch 'master' into 8282664-polyhash
 - Treat Op_VectorizedHashCode as other similar Ops in split_unique_types
 - Handle signed subword arrays, contributed by @sviswa7
 - @sviswa7 comments
 - Pass the constant mode node through, removing need for all but one instruct declarations
 - FLAG_SET_DEFAULT
 - Merge branch 'master' into 8282664-polyhash
 - Merge branch 'master' into 8282664-polyhash
 - Missing & 0xff in StringLatin1::hashCode
 - Qualified guess on shenandoahSupport fix-up
 - ... and 58 more: https://git.openjdk.org/jdk/compare/66db0bb6...71297615

-------------

Changes: https://git.openjdk.org/jdk/pull/10847/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=16
Stats: 1052 lines in 33 files changed: 992 ins; 8 del; 52 mod
Patch: https://git.openjdk.org/jdk/pull/10847.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Mon Jan 9 16:49:25 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 9 Jan 2023 16:49:25 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18]
In-Reply-To:
References:
Message-ID:

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Explicitly lea external address ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10847/files - new: https://git.openjdk.org/jdk/pull/10847/files/71297615..c8c58f4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=16-17 Stats: 11 lines in 1 file changed: 6 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From wkemper at openjdk.org Mon Jan 9 17:41:37 2023 From: wkemper at openjdk.org (William Kemper) Date: Mon, 9 Jan 2023 17:41:37 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: <4-PjqKekTb5CuO6T_TRabJOAnG172XlsQqjsRSL16Rw=.811f7125-57ce-4551-94cf-945e2e3d939a@github.com> References: <4-PjqKekTb5CuO6T_TRabJOAnG172XlsQqjsRSL16Rw=.811f7125-57ce-4551-94cf-945e2e3d939a@github.com> Message-ID: On Sat, 7 Jan 2023 00:13:06 GMT, William Kemper wrote: > Merges tag jdk-21+4 This pull request has now been integrated. Changeset: bbd39940 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/bbd3994084271e4c2bca41987f9f6ab644bc754f Stats: 5313 lines in 488 files changed: 2252 ins; 1955 del; 1106 mod Merge openjdk/jdk:master ------------- PR: https://git.openjdk.org/shenandoah/pull/196 From kdnilsen at openjdk.org Mon Jan 9 22:18:26 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 9 Jan 2023 22:18:26 GMT Subject: RFR: Fix verification of remembered set at mark start Message-ID: All objects residing between TAMS and top() within each old region are examined independent of the marking context. 
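The statement above about TAMS (top-at-mark-start) can be pictured with a toy model. The following is only a sketch under stated assumptions, not the Shenandoah verifier: `TamsWalkSketch`, `examined`, the integer offsets, and the `BitSet` standing in for the marking context are all hypothetical. It further assumes that objects below TAMS are filtered by the marking context, while objects in the range from TAMS up to `top` are examined unconditionally, as the description says.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class TamsWalkSketch {
    // Returns the start offsets of the objects a verifier would examine in one
    // region. `objectStarts` holds ascending object start offsets; `marked` is
    // a hypothetical stand-in for the marking context.
    static List<Integer> examined(int[] objectStarts, int tams, int top, BitSet marked) {
        List<Integer> out = new ArrayList<>();
        for (int start : objectStarts) {
            if (start >= top) {
                break; // nothing is allocated at or above top
            }
            // At or above TAMS: examined regardless of the marking context.
            // Below TAMS (assumption of this sketch): only if marked.
            if (start >= tams || marked.get(start)) {
                out.add(start);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet marked = new BitSet();
        marked.set(0); // only the object at offset 0 is marked
        int[] starts = {0, 8, 16, 24, 32};
        System.out.println(examined(starts, 16, 32, marked));
    }
}
```

In this model the object at offset 8 is skipped because it is unmarked and below TAMS, while the objects at 16 and 24 are visited even though no mark bit is set for them.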
-------------

Commit messages:
 - Fix verification of remembered set at mark start

Changes: https://git.openjdk.org/shenandoah/pull/197/files
Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=197&range=00
Stats: 46 lines in 1 file changed: 29 ins; 7 del; 10 mod
Patch: https://git.openjdk.org/shenandoah/pull/197.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/197/head:pull/197

PR: https://git.openjdk.org/shenandoah/pull/197

From redestad at openjdk.org Mon Jan 9 23:17:00 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 9 Jan 2023 23:17:00 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18]
In-Reply-To:
References:
Message-ID:

On Mon, 9 Jan 2023 16:49:25 GMT, Claes Redestad wrote:

> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Explicitly lea external address

Explicitly loading the address to a register seems to do the trick, avoiding the pitfalls of `as_Address(AddressLiteral)` - which apparently only works (portably) when we know for certain the address is in some allowed range. There's no measurable difference on microbenchmarks (there might be a couple of extra lea instructions on the vectorized paths, but that disappears in the noise).

Thanks @fisk for the suggestion!
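As background for this thread, the polynomial hash being unrolled can be sketched in plain Java. This is an illustration only, not the code from the PR: the names `PolyHashSketch`, `scalar` and `unrolled4` are made up here, and the unroll factor of 4 is an assumption chosen for brevity. It shows how precomputed powers of 31 let several elements be folded in per iteration, which is the property that unrolling and vectorization rely on.

```java
import java.util.Arrays;

public class PolyHashSketch {
    // Reference loop: h = 31 * h + a[i], starting from `initial`
    // (1 for Arrays.hashCode, 0 for String.hashCode).
    static int scalar(int initial, int[] a) {
        int h = initial;
        for (int v : a) {
            h = 31 * h + v;
        }
        return h;
    }

    // Hand-unrolled by 4: four steps of the recurrence are folded into one
    // expression using 31^4, 31^3, 31^2 and 31. int arithmetic wraps on
    // overflow, so the result matches the scalar loop.
    static int unrolled4(int initial, int[] a) {
        int h = initial;
        int i = 0;
        for (; i + 3 < a.length; i += 4) {
            h = 923521 * h          // 31^4
              + 29791 * a[i]        // 31^3
              + 961 * a[i + 1]      // 31^2
              + 31 * a[i + 2]
              + a[i + 3];
        }
        for (; i < a.length; i++) { // remaining 0..3 elements
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = new int[1003];
        for (int i = 0; i < data.length; i++) {
            data[i] = i * 7919 - 12345;
        }
        System.out.println(scalar(1, data) == Arrays.hashCode(data));
        System.out.println(unrolled4(1, data) == Arrays.hashCode(data));
    }
}
```

Both comparisons in `main` check the sketch against `Arrays.hashCode(int[])`, which uses the same recurrence with a start value of 1.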
------------- PR: https://git.openjdk.org/jdk/pull/10847 From sviswanathan at openjdk.org Tue Jan 10 00:28:58 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 10 Jan 2023 00:28:58 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18] In-Reply-To: References: Message-ID: On Mon, 9 Jan 2023 23:13:29 GMT, Claes Redestad wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Explicitly lea external address > > Explicitly loading the address to a register seems to do the trick, avoiding the pitfalls of `as_Address(AddressLiteral)` - which apparently only works (portably) when we know for certain the address is in some allowed range. There's no measurable difference on microbenchmarks (there might be a couple of extra lea instructions on the vectorized paths, but that disappears in the noise). Thanks @fisk for the suggestion! Thanks @cl4es for fixing this issue. Changes look good to me. 
------------- PR: https://git.openjdk.org/jdk/pull/10847 From andrew at openjdk.org Tue Jan 10 01:50:25 2023 From: andrew at openjdk.org (Andrew John Hughes) Date: Tue, 10 Jan 2023 01:50:25 GMT Subject: git: openjdk/shenandoah-jdk8u: Added tag jdk8u332-b03 for changeset 12528bb4 Message-ID: <1273a435-5368-434f-bf7f-bfb8cabf183b@openjdk.org> Tagged by: Andrew John Hughes Date: 2022-02-23 01:58:09 +0000 Changeset: 12528bb4 Author: Sergey Bylokhov Date: 2022-02-16 21:06:29 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/12528bb4d331ed2ec9630db0ee3f2bfeea44b632 From andrew at openjdk.org Tue Jan 10 01:50:29 2023 From: andrew at openjdk.org (Andrew John Hughes) Date: Tue, 10 Jan 2023 01:50:29 GMT Subject: git: openjdk/shenandoah-jdk8u: Added tag shenandoah8u332-b03 for changeset 207cbfb2 Message-ID: <9e09187e-c3c9-4ac3-a52a-71bf5226d025@openjdk.org> Tagged by: Andrew John Hughes Date: 2023-01-10 01:48:05 +0000 Added tag shenandoah8u332-b03 for changeset 207cbfb2fce Changeset: 207cbfb2 Author: Andrew John Hughes Date: 2022-12-16 00:23:54 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/207cbfb2fce33f98095a9144546dfb8e2007483b From andrew at openjdk.org Tue Jan 10 01:50:51 2023 From: andrew at openjdk.org (Andrew John Hughes) Date: Tue, 10 Jan 2023 01:50:51 GMT Subject: git: openjdk/shenandoah-jdk8u: master: 5 new changesets Message-ID: Changeset: 26e70339 Author: Andrew John Hughes Date: 2022-02-08 16:47:38 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/26e70339509fc180a34714329a5e2e7c3750dbb5 Added tag jdk8u332-b02 for changeset 4eff168ecdd9 ! .hgtags Changeset: 054b85b1 Author: Erik Joelsson Date: 2018-09-07 14:54:15 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/054b85b1f65254b2d3d2a1d343e14d8eabd1af40 8210283: Support git as an SCM alternative in the build Removes forest handling of SCM ids Reviewed-by: andrew + .gitignore ! common/autoconf/basics.m4 ! common/autoconf/generated-configure.sh ! common/autoconf/spec.gmk.in ! 
make/common/MakeBase.gmk Changeset: 53bb5f63 Author: David Li Committer: David Li Date: 2014-04-15 10:36:23 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/53bb5f635cbf5eb46f687e275a4343862bdfc8db 8037259: xerces update: xpointer update Reviewed-by: lancea, phh ! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/ElementSchemePointer.java ! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/ShortHandPointer.java ! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/XPointerErrorHandler.java ! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/XPointerHandler.java ! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/XPointerMessageFormatter.java ! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/XPointerPart.java ! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/XPointerProcessor.java Changeset: 12528bb4 Author: Sergey Bylokhov Date: 2022-02-16 21:06:29 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/12528bb4d331ed2ec9630db0ee3f2bfeea44b632 8280060: The sun/rmi/server/Activation.java class use Thread.dumpStack() Reviewed-by: phh ! jdk/src/share/classes/sun/rmi/server/Activation.java Changeset: 207cbfb2 Author: Andrew John Hughes Date: 2022-12-16 00:23:54 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/207cbfb2fce33f98095a9144546dfb8e2007483b Merge jdk8u332-b03 From andrew at openjdk.org Tue Jan 10 01:52:50 2023 From: andrew at openjdk.org (Andrew John Hughes) Date: Tue, 10 Jan 2023 01:52:50 GMT Subject: RFR: Merge jdk8u:master [v2] In-Reply-To: <7AiNiAh1EJtQcsRIhBIiBESeY4i2w22rAotd7dky4-g=.0c73f373-3815-49c6-a2ae-9fda87352e53@github.com> References: <7AiNiAh1EJtQcsRIhBIiBESeY4i2w22rAotd7dky4-g=.0c73f373-3815-49c6-a2ae-9fda87352e53@github.com> Message-ID: > Mere jdk8u332-b03 Andrew John Hughes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
-------------

Changes:
  - all: https://git.openjdk.org/shenandoah-jdk8u/pull/8/files
  - new: https://git.openjdk.org/shenandoah-jdk8u/pull/8/files/207cbfb2..207cbfb2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk8u&pr=8&range=01
 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk8u&pr=8&range=00-01

Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/shenandoah-jdk8u/pull/8.diff
Fetch: git fetch https://git.openjdk.org/shenandoah-jdk8u pull/8/head:pull/8

PR: https://git.openjdk.org/shenandoah-jdk8u/pull/8

From iris at openjdk.org Tue Jan 10 01:52:51 2023
From: iris at openjdk.org (Iris Clark)
Date: Tue, 10 Jan 2023 01:52:51 GMT
Subject: Withdrawn: Merge jdk8u:master
In-Reply-To: <7AiNiAh1EJtQcsRIhBIiBESeY4i2w22rAotd7dky4-g=.0c73f373-3815-49c6-a2ae-9fda87352e53@github.com>
References: <7AiNiAh1EJtQcsRIhBIiBESeY4i2w22rAotd7dky4-g=.0c73f373-3815-49c6-a2ae-9fda87352e53@github.com>
Message-ID:

On Fri, 16 Dec 2022 00:32:06 GMT, Andrew John Hughes wrote:

> Mere jdk8u332-b03

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/shenandoah-jdk8u/pull/8

From eosterlund at openjdk.org Tue Jan 10 10:12:51 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Tue, 10 Jan 2023 10:12:51 GMT
Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication
Message-ID:

When raw char* String contents are exposed to JNI code, we

1. Load the string.value and pin it
2. Run native code
3. Load the string.value and unpin it

Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced.

The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that.
Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns. It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently. ------------- Commit messages: - More Kim feedback - Feedback from Kim - 8299673: Simplify object pinning interactions with string deduplication Changes: https://git.openjdk.org/jdk/pull/11923/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11923&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8299673 Stats: 162 lines in 14 files changed: 66 ins; 68 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/11923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11923/head:pull/11923 PR: https://git.openjdk.org/jdk/pull/11923 From kdnilsen at openjdk.org Wed Jan 11 01:39:28 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 11 Jan 2023 01:39:28 GMT Subject: RFR: Fix verification of remembered set at mark start [v2] In-Reply-To: References: Message-ID: > All objects residing between TAMS and top() within each old region are examined independent of the marking context. Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Simplify rem-set verification code at init mark The code as originally written was mostly correct. Use that implementation with just a few refinements to properly handle promotions that occur during concurrent old-gen marking. - Simplify the fix to rem-set verifier Just remove the offending assert(). The code as originally written should work ok. 
-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/197/files
  - new: https://git.openjdk.org/shenandoah/pull/197/files/e30a9aac..7a659985

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=197&range=01
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=197&range=00-01

Stats: 47 lines in 1 file changed: 8 ins; 29 del; 10 mod
Patch: https://git.openjdk.org/shenandoah/pull/197.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/197/head:pull/197

PR: https://git.openjdk.org/shenandoah/pull/197

From kbarrett at openjdk.org Wed Jan 11 04:46:10 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 11 Jan 2023 04:46:10 GMT
Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication
In-Reply-To:
References:
Message-ID:

On Tue, 10 Jan 2023 10:04:48 GMT, Erik Österlund wrote:

> When raw char* String contents are exposed to JNI code, we
>
> 1. Load the string.value and pin it
> 2. Run native code
> 3. Load the string.value and unpin it
>
> Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced.
>
> The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that. Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns.
>
> It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently.

Looks good.
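The balance problem in the quoted description can be illustrated with a small toy model. This is a sketch under stated assumptions, not HotSpot code: `PinBalanceSketch`, `ToyString`, `pin`, `unpin` and the identity-keyed pin-count map are hypothetical stand-ins, and a local variable plays the role of "passing in the same value" at step 3.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class PinBalanceSketch {
    // Toy stand-in for a heap's pin bookkeeping: pin counts keyed by identity.
    static final Map<Object, Integer> pinCounts = new IdentityHashMap<>();

    static void pin(Object o) {
        pinCounts.merge(o, 1, Integer::sum);
    }

    static void unpin(Object o) {
        Integer c = pinCounts.get(o);
        if (c == null) {
            throw new IllegalStateException("unpin of an object that was never pinned");
        }
        if (c == 1) {
            pinCounts.remove(o);
        } else {
            pinCounts.put(o, c - 1);
        }
    }

    // Toy stand-in for a String whose backing array deduplication may swap.
    static final class ToyString {
        byte[] value;
        ToyString(byte[] value) { this.value = value; }
    }

    // Old shape of step 3: re-load s.value when unpinning. Returns false if
    // the pin and unpin ended up on different objects.
    static boolean releaseByReload(ToyString s, Runnable nativeCode) {
        pin(s.value);
        nativeCode.run();
        try {
            unpin(s.value);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // New shape of step 3: unpin exactly the object that was pinned.
    static boolean releaseSameValue(ToyString s, Runnable nativeCode) {
        byte[] pinned = s.value;
        pin(pinned);
        nativeCode.run();
        unpin(pinned);
        return true;
    }

    public static void main(String[] args) {
        ToyString s = new ToyString(new byte[]{1, 2, 3});
        // Simulates deduplication swapping in an equal array between steps 1 and 3.
        Runnable dedup = () -> s.value = new byte[]{1, 2, 3};
        System.out.println(releaseByReload(s, dedup));
        pinCounts.clear();
        System.out.println(releaseSameValue(s, dedup) && pinCounts.isEmpty());
    }
}
```

In the toy model the reload variant leaves the originally pinned array pinned forever once the value is swapped, whereas the same-value variant stays balanced without knowing anything about deduplication.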
-------------

Marked as reviewed by kbarrett (Reviewer).

PR: https://git.openjdk.org/jdk/pull/11923

From stefank at openjdk.org Wed Jan 11 09:25:15 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 11 Jan 2023 09:25:15 GMT
Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication
In-Reply-To:
References:
Message-ID:

On Tue, 10 Jan 2023 10:04:48 GMT, Erik Österlund wrote:

> When raw char* String contents are exposed to JNI code, we
>
> 1. Load the string.value and pin it
> 2. Run native code
> 3. Load the string.value and unpin it
>
> Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced.
>
> The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that. Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns.
>
> It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently.

Marked as reviewed by stefank (Reviewer).
src/hotspot/share/gc/z/zCollectedHeap.cpp line 27:

> 25: #include "classfile/classLoaderData.hpp"
> 26: #include "gc/shared/gcLocker.inline.hpp"
> 27: #include "gc/shared/gcHeapSummary.hpp"

Sort order

-------------

PR: https://git.openjdk.org/jdk/pull/11923

From redestad at openjdk.org Wed Jan 11 12:19:21 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Wed, 11 Jan 2023 12:19:21 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18]
In-Reply-To:
References:
Message-ID: <3XcxGKxGGuk9z2Zz5qx32DcWsv5edlNMISuEw0lVawE=.fdc71f3d-ddee-485b-b6b5-c56ef6380368@github.com>

On Mon, 9 Jan 2023 16:49:25 GMT, Claes Redestad wrote:

> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Explicitly lea external address

I'll do another round of internal testing (tier1-4). Unless I hear any objections I plan to integrate this once all testing looks satisfactory.

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From kdnilsen at openjdk.org Wed Jan 11 15:03:06 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Wed, 11 Jan 2023 15:03:06 GMT
Subject: RFR: Fix verification of remembered set at mark start [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 11 Jan 2023 01:39:28 GMT, Kelvin Nilsen wrote:

>> All objects residing between TAMS and top() within each old region are examined independent of the marking context.
>
> Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision:
>
> - Simplify rem-set verification code at init mark
>
> The code as originally written was mostly correct.
Use that > implementation with just a few refinements to properly handle promotions > that occur during concurrent old-gen marking. > - Simplify the fix to rem-set verifier > > Just remove the offending assert(). The code as originally written > should work ok. This version of the code has passed our internal pipeline regression tests. ------------- PR: https://git.openjdk.org/shenandoah/pull/197 From wkemper at openjdk.org Wed Jan 11 16:13:05 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 11 Jan 2023 16:13:05 GMT Subject: RFR: Fix verification of remembered set at mark start [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jan 2023 01:39:28 GMT, Kelvin Nilsen wrote: >> All objects residing between TAMS and top() within each old region are examined independent of the marking context. > > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - Simplify rem-set verification code at init mark > > The code as originally written was mostly correct. Use that > implementation with just a few refinements to properly handle promotions > that occur during concurrent old-gen marking. > - Simplify the fix to rem-set verifier > > Just remove the offending assert(). The code as originally written > should work ok. Marked as reviewed by wkemper (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/197 From ysr at openjdk.org Wed Jan 11 16:40:58 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 11 Jan 2023 16:40:58 GMT Subject: RFR: Fix verification of remembered set at mark start [v2] In-Reply-To: References: Message-ID: On Wed, 11 Jan 2023 01:39:28 GMT, Kelvin Nilsen wrote: >> All objects residing between TAMS and top() within each old region are examined independent of the marking context. 
>
> Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Simplify rem-set verification code at init mark
>
>    The code as originally written was mostly correct. Use that
>    implementation with just a few refinements to properly handle promotions
>    that occur during concurrent old-gen marking.
>  - Simplify the fix to rem-set verifier
>
>    Just remove the offending assert(). The code as originally written
>    should work ok.

Marked as reviewed by ysr (Author).

-------------

PR: https://git.openjdk.org/shenandoah/pull/197

From kdnilsen at openjdk.org Wed Jan 11 16:44:50 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Wed, 11 Jan 2023 16:44:50 GMT
Subject: Integrated: Fix verification of remembered set at mark start
In-Reply-To:
References:
Message-ID:

On Mon, 9 Jan 2023 22:12:08 GMT, Kelvin Nilsen wrote:

> All objects residing between TAMS and top() within each old region are examined independent of the marking context.

This pull request has now been integrated.

Changeset: aca12fcb
Author:    Kelvin Nilsen
URL:       https://git.openjdk.org/shenandoah/commit/aca12fcb017524fc3107aa65e8f1566fc2e044fa
Stats:     4 lines in 1 file changed: 1 ins; 0 del; 3 mod

Fix verification of remembered set at mark start

Reviewed-by: wkemper, ysr

-------------

PR: https://git.openjdk.org/shenandoah/pull/197

From kdnilsen at openjdk.org Wed Jan 11 16:48:19 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Wed, 11 Jan 2023 16:48:19 GMT
Subject: RFR: Broaden plab region search
Message-ID:

Recent testing revealed that old-gen heap regions were being ignored in the search to satisfy new PLAB allocation requests. It was discovered that many of these regions are found within ranges of the ShenandoahFreeSet that are not considered to be "is_collector_free()".

This is a first step in a two-step change. In this patch, we broaden the search for old-gen heap regions that have available memory to include regions that are not is_collector_free. In a second step of this improvement, we intend to restructure the implementation of the ShenandoahFreeSet to better distinguish which ranges of regions hold young-gen survivors, which ranges hold old-gen, and which ranges are intended to serve mutator allocations.

On an Extremem workload that allocates roughly 626 M/s in a total heap size of 49G with an old-gen usage of 18.7G, we saw significant improvements compared to the mainline generational Shenandoah implementation:

1. Concurrent GC passes decreased from 606 to 493 (19% improvement)
2. Degenerated GC passes decreased from 17 to 3 (82% improvement)
3. Full GCs decreased from 15 to 3 (80% improvement)
4. P50 latency for Customer Preparation Processing (CPP) improved from 1798 us to 1735 us (3.5%)
5. P100 latency for CPP improved from 25_636_285 us to 9_148_580 us (64% improvement)

Across a broad assortment of performance related CI tests, we also see benefits on x86:

-74.24% extremem-phased/do_nothing_p99 p=0.00061
  Control: 2.318s (+/-941.25ms) 80
  Test:    1.330s (+/- 1.01s ) 15

-15.70% extremem-phased/context_switch_count p=0.02032
  Control: 28188.234 (+/-5868.23 ) 80
  Test:    24362.538 (+/-4260.19 ) 15

-6.26% extremem-phased/do_nothing_p50 p=0.00246
  Control: 603.203us (+/- 38.32us) 80
  Test:    567.692us (+/- 50.34us) 15

And on aarch64:

+22.92% specjbb2015/sla_10000_jops p=0.01104
  Control: 2607.153 (+/-799.74 ) 90
  Test:    3204.615 (+/-592.15 ) 15

-5.85% extremem-phased/do_nothing_p50 p=0.00675
  Control: 608.153us (+/- 44.52us) 90
  Test:    574.538us (+/- 47.49us) 15

-------------

Commit messages:
 - Fix white space
 - Remove instrumentation
 - Fix my fix limiting find-next-marked-object
 - Fix request to find next marked
 - Enhance log messages for generations at end of gc
 - Allow the search for old-gen PLAB to see regions not collector-free

Changes: https://git.openjdk.org/shenandoah/pull/198/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=198&range=00
  Stats: 66 lines in 6 files changed: 54 ins; 4 del; 8 mod
  Patch: https://git.openjdk.org/shenandoah/pull/198.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/198/head:pull/198

PR: https://git.openjdk.org/shenandoah/pull/198

From wkemper at openjdk.org Wed Jan 11 16:58:53 2023
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 11 Jan 2023 16:58:53 GMT
Subject: RFR: Broaden plab region search
In-Reply-To:
References:
Message-ID:

On Wed, 11 Jan 2023 16:33:09 GMT, Kelvin Nilsen wrote:

> Recent testing revealed that old-gen heap regions were being ignored in the search to satisfy new PLAB allocation requests. It was discovered that many of these regions are found within ranges of the ShenandoahFreeSet that are not considered to be "is_collector_free()".
>
> [...]

The workaround makes sense. Consider consolidating some of the log messages by reusing `ShenandoahGeneration::log_status`.

src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 245:

> 243: heap->reset_old_evac_expended();
> 244: heap->set_promoted_reserve(0);
> 245: log_info(gc, ergo)("At end of Concurrent GC, old_available: " SIZE_FORMAT "%s out of total: " SIZE_FORMAT "%s,"

This looks a lot like the implementation of `ShenandoahGeneration::log_status`. We could consolidate these messages and avoid logging duplicate information.

src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 58:

> 56: vmop_degenerated();
> 57: ShenandoahHeap* heap = ShenandoahHeap::heap();
> 58: if (heap->mode()->is_generational()) {

As above, consider using `ShenandoahGeneration::log_status`.

src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 179:

> 177: size_t old_available = heap->old_generation()->available();
> 178: size_t young_available = heap->young_generation()->available();
> 179: log_info(gc, ergo)("At end of Full GC, old_available: " SIZE_FORMAT "%s out of total: " SIZE_FORMAT "%s,"

Consider `ShenandoahGeneration::log_status`.
src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 3194:

> 3192: } else {
> 3193: // This object is not live so we don't verify dirty cards contained therein
> 3194: assert(tams != nullptr, "If object is not live, ctx and tams should be non-null");

Might need to rebase these changes after integrating #197.

-------------

Changes requested by wkemper (Committer).

PR: https://git.openjdk.org/shenandoah/pull/198

From vlivanov at openjdk.org Wed Jan 11 19:09:36 2023
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 11 Jan 2023 19:09:36 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18]
In-Reply-To:
References:
Message-ID: <_vJ-5zerpDnHng8O_QZ5LEfVb09knfCRIrWfHRB1eTQ=.f01389ce-82e9-4073-86e3-08b70219cf0b@github.com>

On Mon, 9 Jan 2023 16:49:25 GMT, Claes Redestad wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>>
>> [...]
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Explicitly lea external address

Before the patch goes in, I'd like to see a plan for how the code will be refactored later. At the very least, I expect the `is_string_hashcode`-related logic to go away and the intrinsic logic to be guided solely by the basic type of the elements. If not in the initial version, then shortly after as a follow-up enhancement.

Another thing I want to see is for the `VectorizedHashCode` node to go away and be replaced with a stub call.

-------------

Changes requested by vlivanov (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Wed Jan 11 21:26:21 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Wed, 11 Jan 2023 21:26:21 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18]
In-Reply-To:
References:
Message-ID:

On Mon, 9 Jan 2023 16:49:25 GMT, Claes Redestad wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>>
>> [...]
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Explicitly lea external address

I'm not convinced using basic types - with `T_BOOLEAN` for unsigned byte - improves this much. It'd of course be nice to have something more canonical than a set of ad hoc constants strewn in here to steer this, but maybe we should pass element size and signedness. I think this would be a reasonable cleanup.

I'd also be willing to spend time rewriting this as a stub call if someone could give me some pointers on how to best do that. This might be straightforward and simplify the implementation, but making a stub call could have noticeable overheads for small strings. As this is the common case, the stub call overhead *could* be prohibitively expensive. An alternative is to keep the new node but extract the vectorized path as a stub routine and call it from inside the inlined intrinsic - similarly to what's done in `C2_MacroAssembler::string_compare` on aarch64.
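
For readers following the thread, the hand-unrolling being discussed can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the PR: the class and method names (`PolyHashSketch`, `simple`, `unrolled`) are hypothetical, the unroll factor of four is arbitrary, and elements are treated as unsigned bytes, as they would be for a Latin-1 string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not the intrinsic or library code from the PR.
final class PolyHashSketch {

    // Reference loop: h = 31 * h + element, one dependent multiply per element.
    static int simple(byte[] a, int initial) {
        int h = initial;
        for (byte b : a) {
            h = 31 * h + (b & 0xff); // treat each byte as unsigned (Latin-1)
        }
        return h;
    }

    // Same polynomial, four elements per iteration. The powers of 31 are
    // compile-time constants, so only one multiply per iteration depends
    // on the previous value of h.
    static int unrolled(byte[] a, int initial) {
        int h = initial;
        int i = 0;
        int limit = a.length & ~3; // largest multiple of 4 <= length
        for (; i < limit; i += 4) {
            h = 31 * 31 * 31 * 31 * h
              + 31 * 31 * 31 * (a[i] & 0xff)
              + 31 * 31 * (a[i + 1] & 0xff)
              + 31 * (a[i + 2] & 0xff)
              + (a[i + 3] & 0xff);
        }
        for (; i < a.length; i++) { // tail: remaining 0..3 elements
            h = 31 * h + (a[i] & 0xff);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hello, world";
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        // Start value zero corresponds to the String case described above.
        System.out.println(unrolled(latin1, 0) == s.hashCode()); // prints "true"
        // A start value of one gives the same result as the simple loop.
        System.out.println(unrolled(latin1, 1) == simple(latin1, 1)); // prints "true"
    }
}
```

Because `int` arithmetic wraps consistently, the unrolled form returns exactly the same value as the simple loop. The factor and start value are passed or written explicitly here only to mirror the "always 31, start value one or zero" description in the quoted text.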

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From shade at openjdk.org Thu Jan 12 16:24:32 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 12 Jan 2023 16:24:32 GMT
Subject: RFR: 8300053: Shenandoah: Handle more GCCauses in ShenandoahControlThread::request_gc
Message-ID:

```
$ CONF=linux-x86_64-server-fastdebug make test TEST=jdk/internal/vm/Continuation/BasicExt.java TEST_VM_OPTS="-XX:+UseShenandoahGC"

# Internal Error (/home/shade/trunks/jdk/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp:479), pid=406430, tid=406562
# assert(GCCause::is_user_requested_gc(cause) || GCCause::is_serviceability_requested_gc(cause) || cause == GCCause::_metadata_GC_clear_soft_refs || cause == GCCause::_codecache_GC_aggressive || cause == GCCause::_codecache_GC_threshold || cause == GCCause::_full_gc_alot || cause == GCCause::_wb_full_gc || cause == GCCause::_wb_breakpoint || cause == GCCause::_scavenge_alot) failed: only requested GCs here: WhiteBox Initiated Young GC
#
```

Added the missing cause to the assert. The test now passes.

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.org/jdk/pull/11970/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11970&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8300053
  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/11970.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11970/head:pull/11970

PR: https://git.openjdk.org/jdk/pull/11970

From wkemper at openjdk.org Thu Jan 12 17:33:12 2023
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 12 Jan 2023 17:33:12 GMT
Subject: RFR: 8300053: Shenandoah: Handle more GCCauses in ShenandoahControlThread::request_gc
In-Reply-To:
References:
Message-ID:

On Thu, 12 Jan 2023 16:15:08 GMT, Aleksey Shipilev wrote:

> ```
> $ CONF=linux-x86_64-server-fastdebug make test TEST=jdk/internal/vm/Continuation/BasicExt.java TEST_VM_OPTS="-XX:+UseShenandoahGC"
>
> # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp:479), pid=406430, tid=406562
> # assert(GCCause::is_user_requested_gc(cause) || GCCause::is_serviceability_requested_gc(cause) || cause == GCCause::_metadata_GC_clear_soft_refs || cause == GCCause::_codecache_GC_aggressive || cause == GCCause::_codecache_GC_threshold || cause == GCCause::_full_gc_alot || cause == GCCause::_wb_full_gc || cause == GCCause::_wb_breakpoint || cause == GCCause::_scavenge_alot) failed: only requested GCs here: WhiteBox Initiated Young GC
> #
> ```
>
> Added a missing cause into the assert. The test starts to pass.

We will need to handle `wb_young_gc` appropriately for Shenandoah's generational mode.

-------------

Marked as reviewed by wkemper (no project role).

PR: https://git.openjdk.org/jdk/pull/11970

From kdnilsen at openjdk.org Thu Jan 12 21:39:36 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Thu, 12 Jan 2023 21:39:36 GMT
Subject: RFR: Broaden plab region search [v2]
In-Reply-To:
References:
Message-ID:

> Recent testing revealed that old-gen heap regions were being ignored in the search to satisfy new PLAB allocation requests. It was discovered that many of these regions are found within ranges of the ShenandoahFreeSet that are not considered to be "is_collector_free()".
>
> [...]

Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:

  Unify GC heap status logging

-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/198/files
  - new: https://git.openjdk.org/shenandoah/pull/198/files/8a46a669..ac3e14c0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=198&range=01
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=198&range=00-01

  Stats: 75 lines in 9 files changed: 28 ins; 37 del; 10 mod
  Patch: https://git.openjdk.org/shenandoah/pull/198.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/198/head:pull/198

PR: https://git.openjdk.org/shenandoah/pull/198

From kdnilsen at openjdk.org Thu Jan 12 21:39:38 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Thu, 12 Jan 2023 21:39:38 GMT
Subject: RFR: Broaden plab region search [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 11 Jan 2023 16:43:31 GMT, William Kemper wrote:

>> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Unify GC heap status logging
>
> src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 179:
>
>> 177: size_t old_available = heap->old_generation()->available();
>> 178: size_t young_available = heap->young_generation()->available();
>> 179: log_info(gc, ergo)("At end of Full GC, old_available: " SIZE_FORMAT "%s out of total: " SIZE_FORMAT "%s,"
>
> Consider `ShenandoahGeneration::log_status`.

I've integrated the two heap-status logging approaches. Let me know what you think... Thanks for the suggestion.

> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 3194:
>
>> 3192: } else {
>> 3193: // This object is not live so we don't verify dirty cards contained therein
>> 3194: assert(tams != nullptr, "If object is not live, ctx and tams should be non-null");
>
> Might need to rebase these changes after integrating #197 .

So far, git reports no conflicts. This code is identical to what I delivered in PR #197.

-------------

PR: https://git.openjdk.org/shenandoah/pull/198

From wkemper at openjdk.org Thu Jan 12 21:44:03 2023
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 12 Jan 2023 21:44:03 GMT
Subject: RFR: Broaden plab region search [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 12 Jan 2023 21:39:36 GMT, Kelvin Nilsen wrote:

>> Recent testing revealed that old-gen heap regions were being ignored in the search to satisfy new PLAB allocation requests. It was discovered that many of these regions are found within ranges of the ShenandoahFreeSet that are not considered to be "is_collector_free()".
>>
>> [...]
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Unify GC heap status logging

Thank you for making those changes to the logging.

-------------

Marked as reviewed by wkemper (Committer).
PR: https://git.openjdk.org/shenandoah/pull/198 From kdnilsen at openjdk.org Thu Jan 12 21:49:17 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 12 Jan 2023 21:49:17 GMT Subject: Integrated: Broaden plab region search In-Reply-To: References: Message-ID: <9ay8LOiRxOdMmjsqrkpQmlmSjf0xTDJIWCJyHNBTzks=.4893404d-9d88-488f-ab26-e3758006716b@github.com> On Wed, 11 Jan 2023 16:33:09 GMT, Kelvin Nilsen wrote: > Recent testing revealed that old-gen heap regions were being ignored in the search to satisfy new PLAB allocation requests. It was discovered that many of these regions are found within ranges of the ShenandoahFreeSet that are not considered to be "is_collector_free()". > > This is a first step in a two-step change. In this patch, we broaden the search for old-gen heap regions that have available memory to include regions that are not is_collector_free. In a second step of this improvement, we intend to restructure the implementation of the ShenandoahFreeSet to better distinguish ranges of regions that hold young-gen survivors, which ranges hold old-gen, and which ranges are intended to serve as mutator allocations. > > On an Extremem workload that allocates roughly 626 M/s in a total heap size of 49G with an old-gen usage of 18.7G, we saw significant improvements compared to mainline generational Shenandoah implementation: > > 1. Concurrent GC passes decreased from 606 to 493 (19% improvement) > 2. Degenerated GC passes decreased from 17 to 3 (82% improvement) > 3. Full GCs decreased from 15 to 3 (80% improvement) > 4. P50 latency for Customer Preparation Processing (CPP) improved from 1798 us to 1735 us (3.5%) > 5. 
P100 latency for CPP improved from 25_636_285 us to 9_148_580 us (64% improvement)
>
> Across a broad assortment of performance-related CI tests, we also see benefits on x86:
>
> -74.24% extremem-phased/do_nothing_p99 p=0.00061
>   Control: 2.318s (+/-941.25ms) 80
>   Test: 1.330s (+/- 1.01s ) 15
>
> -15.70% extremem-phased/context_switch_count p=0.02032
>   Control: 28188.234 (+/-5868.23 ) 80
>   Test: 24362.538 (+/-4260.19 ) 15
>
> -6.26% extremem-phased/do_nothing_p50 p=0.00246
>   Control: 603.203us (+/- 38.32us) 80
>   Test: 567.692us (+/- 50.34us) 15
>
> And on aarch64:
>
> +22.92% specjbb2015/sla_10000_jops p=0.01104
>   Control: 2607.153 (+/-799.74 ) 90
>   Test: 3204.615 (+/-592.15 ) 15
>
> -5.85% extremem-phased/do_nothing_p50 p=0.00675
>   Control: 608.153us (+/- 44.52us) 90
>   Test: 574.538us (+/- 47.49us) 15

This pull request has now been integrated.

Changeset: 0be422bf
Author:    Kelvin Nilsen
URL:       https://git.openjdk.org/shenandoah/commit/0be422bf31db81b640ab0911a327a65e5c56381a
Stats:     97 lines in 11 files changed: 63 ins; 23 del; 11 mod

Broaden plab region search

Reviewed-by: wkemper

-------------

PR: https://git.openjdk.org/shenandoah/pull/198

From wkemper at openjdk.org  Fri Jan 13 04:12:45 2023
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 13 Jan 2023 04:12:45 GMT
Subject: RFR: Use whole number of regions when resizing generations
Message-ID:

This avoids overflowing calculations when using a 32 bit word - it also simplifies some of the operations. All of the GitHub actions are succeeding now.
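To illustrate the overflow concern, here is a minimal sketch, not taken from the patch. The 4 MB region size, the `Generation` struct, and both helper functions are hypothetical stand-ins; the actual change works on the generation sizing code in Shenandoah.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical region size; real Shenandoah region sizes are chosen at startup.
constexpr uint32_t kRegionSizeBytes = 4u * 1024 * 1024;

// Byte-based bookkeeping: with a 32 bit word the product wraps modulo 2^32,
// so 1024 regions of 4 MB (exactly 4 GB) come out as 0 bytes.
uint32_t capacity_in_bytes(uint32_t regions) {
  return regions * kRegionSizeBytes;
}

// Region-based bookkeeping: moving capacity between generations is plain
// counting, so no intermediate value gets anywhere near the word limit.
struct Generation {
  uint32_t regions;
};

bool transfer_regions(Generation& from, Generation& to, uint32_t count) {
  if (count > from.regions) {
    return false;  // not enough regions to give away
  }
  from.regions -= count;
  to.regions += count;
  return true;
}
```

Under these assumptions the byte-based helper silently loses capacity once a generation reaches 4 GB, while the region-based transfer stays exact.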
-------------

Commit messages:
 - Merge branch 'openjdk:master' into use-regions-for-sizing
 - Use region count rather than bytes count to avoid overflow with 32 bit words

Changes: https://git.openjdk.org/shenandoah/pull/199/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=199&range=00
  Stats: 76 lines in 2 files changed: 15 ins; 5 del; 56 mod
  Patch: https://git.openjdk.org/shenandoah/pull/199.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/199/head:pull/199

PR: https://git.openjdk.org/shenandoah/pull/199

From eosterlund at openjdk.org  Fri Jan 13 12:52:06 2023
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Fri, 13 Jan 2023 12:52:06 GMT
Subject: RFR: 8299879: CollectedHeap hierarchy should use override
Message-ID:

The CollectedHeap class declares a bunch of virtual methods that are overridden in its subclasses. Now that we can, we should convert this code to use "override" consistently, to help make the code more robust.

-------------

Commit messages:
 - 8299879: CollectedHeap hierarchy should use override

Changes: https://git.openjdk.org/jdk/pull/11937/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11937&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8299879
  Stats: 235 lines in 6 files changed: 2 ins; 5 del; 228 mod
  Patch: https://git.openjdk.org/jdk/pull/11937.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11937/head:pull/11937

PR: https://git.openjdk.org/jdk/pull/11937

From stefank at openjdk.org  Fri Jan 13 12:52:07 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 13 Jan 2023 12:52:07 GMT
Subject: RFR: 8299879: CollectedHeap hierarchy should use override
In-Reply-To:
References:
Message-ID: <27-qTaPTyaq4-REu5ZIwgRQBD6PzYXuyNBUciN2ytyE=.c1816ddf-a0f6-453d-af77-0e2ccce1230d@github.com>

On Wed, 11 Jan 2023 09:05:44 GMT, Erik Österlund wrote:

> The CollectedHeap class declares a bunch of virtual methods that are overridden in its subclasses. Now that we can, we should convert this code to use "override" consistently, to help make the code more robust.

Marked as reviewed by stefank (Reviewer).

-------------

PR: https://git.openjdk.org/jdk/pull/11937

From tschatzl at openjdk.org  Fri Jan 13 12:52:08 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Fri, 13 Jan 2023 12:52:08 GMT
Subject: RFR: 8299879: CollectedHeap hierarchy should use override
In-Reply-To:
References:
Message-ID:

On Wed, 11 Jan 2023 09:05:44 GMT, Erik Österlund wrote:

> The CollectedHeap class declares a bunch of virtual methods that are overridden in its subclasses. Now that we can, we should convert this code to use "override" consistently, to help make the code more robust.

Marked as reviewed by tschatzl (Reviewer).

-------------

PR: https://git.openjdk.org/jdk/pull/11937

From eosterlund at openjdk.org  Fri Jan 13 12:52:09 2023
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Fri, 13 Jan 2023 12:52:09 GMT
Subject: RFR: 8299879: CollectedHeap hierarchy should use override
In-Reply-To: <27-qTaPTyaq4-REu5ZIwgRQBD6PzYXuyNBUciN2ytyE=.c1816ddf-a0f6-453d-af77-0e2ccce1230d@github.com>
References: <27-qTaPTyaq4-REu5ZIwgRQBD6PzYXuyNBUciN2ytyE=.c1816ddf-a0f6-453d-af77-0e2ccce1230d@github.com>
Message-ID:

On Thu, 12 Jan 2023 10:58:52 GMT, Stefan Karlsson wrote:

>> The CollectedHeap class declares a bunch of virtual methods that are overridden in its subclasses. Now that we can, we should convert this code to use "override" consistently, to help make the code more robust.
>
> Marked as reviewed by stefank (Reviewer).

Thank you for the reviews, @stefank and @tschatzl!
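For readers less familiar with the keyword, a small sketch of what "override" buys. The `Heap` and `RegionHeap` classes below are made up for illustration and are not the CollectedHeap declarations.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature hierarchy, standing in for a heap base class.
class Heap {
 public:
  virtual ~Heap() = default;
  virtual std::string name() const { return "Heap"; }
  virtual bool supports_pinning() const { return false; }
};

class RegionHeap : public Heap {
 public:
  // "override" makes the compiler verify that a matching virtual
  // function exists in a base class.
  std::string name() const override { return "RegionHeap"; }
  bool supports_pinning() const override { return true; }

  // The declaration below would be rejected at compile time: without
  // "const" it matches nothing in Heap, so it cannot be an override.
  // bool supports_pinning() override { return true; }
};
```

Without the keyword, the commented-out declaration would compile and quietly introduce a new, unrelated function, which is the kind of mistake consistent use of "override" rules out.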
------------- PR: https://git.openjdk.org/jdk/pull/11937 From eosterlund at openjdk.org Fri Jan 13 16:22:17 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 13 Jan 2023 16:22:17 GMT Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication In-Reply-To: References: Message-ID: On Wed, 11 Jan 2023 04:43:54 GMT, Kim Barrett wrote: >> When raw char* String contents are exposed to JNI code, we >> >> 1. Load the string.value and pin it >> 2. Run native code >> 3. Load the string.value and unpin it >> >> Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced. >> >> The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that. Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns. >> >> It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently. > > Looks good. Thank you for the reviews, @kimbarrett and @stefank! ------------- PR: https://git.openjdk.org/jdk/pull/11923 From kdnilsen at openjdk.org Fri Jan 13 17:35:50 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 13 Jan 2023 17:35:50 GMT Subject: RFR: Fix fullgc assertion Message-ID: Relax generation sizing assertions during Full GC because of the sequencing of operations that occur during Full GC. Enforce the assertion constraints at the end of Full GC. 
------------- Commit messages: - Fix white space - Merge remote-tracking branch 'GitFarmBranch/fix-fullgc-assertion-error' into fix-fullgc-assertion - Fix assertion failure during Full GC Changes: https://git.openjdk.org/shenandoah/pull/200/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=200&range=00 Stats: 15 lines in 2 files changed: 13 ins; 0 del; 2 mod Patch: https://git.openjdk.org/shenandoah/pull/200.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/200/head:pull/200 PR: https://git.openjdk.org/shenandoah/pull/200 From ysr at openjdk.org Fri Jan 13 17:35:50 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 13 Jan 2023 17:35:50 GMT Subject: RFR: Fix fullgc assertion In-Reply-To: References: Message-ID: On Fri, 13 Jan 2023 16:28:03 GMT, Kelvin Nilsen wrote: > Relax generation sizing assertions during Full GC because of the sequencing of operations that occur during Full GC. > > Enforce the assertion constraints at the end of Full GC. Marked as reviewed by ysr (Author). ------------- PR: https://git.openjdk.org/shenandoah/pull/200 From kdnilsen at openjdk.org Fri Jan 13 17:54:20 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 13 Jan 2023 17:54:20 GMT Subject: RFR: Use whole number of regions when resizing generations In-Reply-To: References: Message-ID: On Fri, 13 Jan 2023 00:45:38 GMT, William Kemper wrote: > This avoids overflowing calculations when using a 32 bit word - it also simplifies some of the operations. All of the github actions are succeeding now. Thanks. ------------- Marked as reviewed by kdnilsen (Committer). 
PR: https://git.openjdk.org/shenandoah/pull/199 From wkemper at openjdk.org Fri Jan 13 17:55:16 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 13 Jan 2023 17:55:16 GMT Subject: RFR: Fix fullgc assertion In-Reply-To: References: Message-ID: On Fri, 13 Jan 2023 16:28:03 GMT, Kelvin Nilsen wrote: > Relax generation sizing assertions during Full GC because of the sequencing of operations that occur during Full GC. > > Enforce the assertion constraints at the end of Full GC. Marked as reviewed by wkemper (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/200 From kdnilsen at openjdk.org Fri Jan 13 17:59:24 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 13 Jan 2023 17:59:24 GMT Subject: Integrated: Fix fullgc assertion In-Reply-To: References: Message-ID: On Fri, 13 Jan 2023 16:28:03 GMT, Kelvin Nilsen wrote: > Relax generation sizing assertions during Full GC because of the sequencing of operations that occur during Full GC. > > Enforce the assertion constraints at the end of Full GC. This pull request has now been integrated. Changeset: 0e15cb6d Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/0e15cb6dfcd97a816f4213cd38ffdd5f402536b9 Stats: 15 lines in 2 files changed: 13 ins; 0 del; 2 mod Fix fullgc assertion Reviewed-by: ysr, wkemper ------------- PR: https://git.openjdk.org/shenandoah/pull/200 From wkemper at openjdk.org Fri Jan 13 18:29:22 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 13 Jan 2023 18:29:22 GMT Subject: Integrated: Use whole number of regions when resizing generations In-Reply-To: References: Message-ID: On Fri, 13 Jan 2023 00:45:38 GMT, William Kemper wrote: > This avoids overflowing calculations when using a 32 bit word - it also simplifies some of the operations. All of the github actions are succeeding now. This pull request has now been integrated. 
Changeset: ec3e5ef1 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/ec3e5ef150e1ff7b1e35a450653d8bf0bb1ee6c9 Stats: 76 lines in 2 files changed: 15 ins; 5 del; 56 mod Use whole number of regions when resizing generations Reviewed-by: kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/199 From wkemper at openjdk.org Fri Jan 13 22:47:24 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 13 Jan 2023 22:47:24 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: Merge tag jdk-21+5 ------------- Commit messages: - Merge tag 'jdk-21+5' into merge-jdk21-5 - Merge - 8299862: OfAddress setter should disallow heap segments - 8299849: Revert JDK-8294461: wrong effectively final determination by javac - 8299227: host `exif.org` not found in link in doc comment - 8299715: IR test: VectorGatherScatterTest.java fails with SVE randomly - 8294744: AArch64: applications/kitchensink/Kitchensink.java crashed: assert(oopDesc::is_oop(obj)) failed: not an oop - 8299733: AArch64: "unexpected literal addressing mode" assertion failure with -XX:+PrintC1Statistics - 8299693: Change to Xcode12.4+1.1 devkit for building on macOS at Oracle - 8300001: ProblemList test java/security/Policy/Root/Root.java - ... 
and 97 more: https://git.openjdk.org/shenandoah/compare/ec3e5ef1...06c44b37 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=shenandoah&pr=201&range=00.0 - openjdk/jdk:master: https://webrevs.openjdk.org/?repo=shenandoah&pr=201&range=00.1 Changes: https://git.openjdk.org/shenandoah/pull/201/files Stats: 7201 lines in 404 files changed: 4284 ins; 1621 del; 1296 mod Patch: https://git.openjdk.org/shenandoah/pull/201.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/201/head:pull/201 PR: https://git.openjdk.org/shenandoah/pull/201 From wkemper at openjdk.org Sat Jan 14 00:18:48 2023 From: wkemper at openjdk.org (William Kemper) Date: Sat, 14 Jan 2023 00:18:48 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 13 Jan 2023 22:39:56 GMT, William Kemper wrote: > Merge tag jdk-21+5 This pull request has now been integrated. Changeset: bfeccbdf Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/bfeccbdfcc57c9c98925eebec0d5ed965974cd93 Stats: 7201 lines in 404 files changed: 4284 ins; 1621 del; 1296 mod Merge openjdk/jdk:master ------------- PR: https://git.openjdk.org/shenandoah/pull/201 From redestad at openjdk.org Sun Jan 15 23:24:18 2023 From: redestad at openjdk.org (Claes Redestad) Date: Sun, 15 Jan 2023 23:24:18 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18] In-Reply-To: References: Message-ID: On Mon, 9 Jan 2023 16:49:25 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. 
>>
>> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>>
>> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>>
>> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>>
>> Baseline:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>>
>> I.e. no measurable overhead compared to baseline even for `size == 1`.
>>
>> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>>
>> Benchmark for `Arrays.hashCode`:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
>> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
>> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
>> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
>> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
>> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
>> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
>> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
>> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
>> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
>> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
>> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>>
>> Baseline:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
>> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
>> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
>> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
>> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
>> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
>> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
>> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
>> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
>> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
>> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
>> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>>
>>
>> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Explicitly lea external address

FWIW I prototyped a follow-up to use basic types and extracted the String special-casing from the code. To do so a few things unraveled, such as needing to pass the initial value, but arguably it all ended up a bit neater.

I've put this experiment in another branch for now (https://github.com/openjdk/jdk/compare/pr/10847...cl4es:jdk:8282664-type-cleanup?expand=1) since I need to test it thoroughly, both functionally and to ensure there's no obvious performance impact (I did some quick sanity testing on micros, which look perfectly neutral).

@iwanowww does this make you a bit happier? I think of it as an immediate follow-up - but if there's strong preference I can merge it into this PR.

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From rkennke at openjdk.org  Mon Jan 16 09:26:10 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 16 Jan 2023 09:26:10 GMT
Subject: RFR: 8300053: Shenandoah: Handle more GCCauses in ShenandoahControlThread::request_gc
In-Reply-To:
References:
Message-ID: <62bujnK6t3dStlI1cJkfcnvkddm91PSBsf5rw36i6ME=.19ea2979-f448-46d9-8a20-9f05264c69da@github.com>

On Thu, 12 Jan 2023 16:15:08 GMT, Aleksey Shipilev wrote:

> $ CONF=linux-x86_64-server-fastdebug make test TEST=jdk/internal/vm/Continuation/BasicExt.java TEST_VM_OPTS="-XX:+UseShenandoahGC"
>
> # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp:479), pid=406430, tid=406562
> # assert(GCCause::is_user_requested_gc(cause) || GCCause::is_serviceability_requested_gc(cause) || cause == GCCause::_metadata_GC_clear_soft_refs || cause == GCCause::_codecache_GC_aggressive || cause == GCCause::_codecache_GC_threshold || cause == GCCause::_full_gc_alot || cause == GCCause::_wb_full_gc || cause == GCCause::_wb_breakpoint || cause == GCCause::_scavenge_alot) failed: only requested GCs
here: WhiteBox Initiated Young GC > # > ``` > > Added a missing cause into the assert. The test starts to pass. Looks good to me, thank you! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.org/jdk/pull/11970 From shade at openjdk.org Mon Jan 16 09:32:15 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Jan 2023 09:32:15 GMT Subject: RFR: 8300053: Shenandoah: Handle more GCCauses in ShenandoahControlThread::request_gc In-Reply-To: References: Message-ID: <3DZXGR1saHnahcRv3W44iKXj4fU7Pw0pplbxopG2-vQ=.ddf46f05-b778-4d1e-a728-aebd36a1f809@github.com> On Thu, 12 Jan 2023 16:15:08 GMT, Aleksey Shipilev wrote: > $ CONF=linux-x86_64-server-fastdebug make test TEST=jdk/internal/vm/Continuation/BasicExt.java TEST_VM_OPTS="-XX:+UseShenandoahGC" > > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp:479), pid=406430, tid=406562 > # assert(GCCause::is_user_requested_gc(cause) || GCCause::is_serviceability_requested_gc(cause) || cause == GCCause::_metadata_GC_clear_soft_refs || cause == GCCause::_codecache_GC_aggressive || cause == GCCause::_codecache_GC_threshold || cause == GCCause::_full_gc_alot || cause == GCCause::_wb_full_gc || cause == GCCause::_wb_breakpoint || cause == GCCause::_scavenge_alot) failed: only requested GCs here: WhiteBox Initiated Young GC > # > ``` > > Added a missing cause into the assert. The test starts to pass. Thank you! 
------------- PR: https://git.openjdk.org/jdk/pull/11970 From shade at openjdk.org Mon Jan 16 09:35:20 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Jan 2023 09:35:20 GMT Subject: Integrated: 8300053: Shenandoah: Handle more GCCauses in ShenandoahControlThread::request_gc In-Reply-To: References: Message-ID: On Thu, 12 Jan 2023 16:15:08 GMT, Aleksey Shipilev wrote: > $ CONF=linux-x86_64-server-fastdebug make test TEST=jdk/internal/vm/Continuation/BasicExt.java TEST_VM_OPTS="-XX:+UseShenandoahGC" > > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp:479), pid=406430, tid=406562 > # assert(GCCause::is_user_requested_gc(cause) || GCCause::is_serviceability_requested_gc(cause) || cause == GCCause::_metadata_GC_clear_soft_refs || cause == GCCause::_codecache_GC_aggressive || cause == GCCause::_codecache_GC_threshold || cause == GCCause::_full_gc_alot || cause == GCCause::_wb_full_gc || cause == GCCause::_wb_breakpoint || cause == GCCause::_scavenge_alot) failed: only requested GCs here: WhiteBox Initiated Young GC > # > ``` > > Added a missing cause into the assert. The test starts to pass. This pull request has now been integrated. 
Changeset: cac72a60
Author:    Aleksey Shipilev
URL:       https://git.openjdk.org/jdk/commit/cac72a60181d3570562f8534c691528d06e40cb8
Stats:     2 lines in 1 file changed: 1 ins; 0 del; 1 mod

8300053: Shenandoah: Handle more GCCauses in ShenandoahControlThread::request_gc

Reviewed-by: wkemper, rkennke

-------------

PR: https://git.openjdk.org/jdk/pull/11970

From eosterlund at openjdk.org  Mon Jan 16 10:57:16 2023
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Mon, 16 Jan 2023 10:57:16 GMT
Subject: Integrated: 8299879: CollectedHeap hierarchy should use override
In-Reply-To:
References:
Message-ID: <8qoRlJqwylrfMVs8E8Z23r9nM603_yyBTNsGphtC8Gw=.5b8d9e85-953f-48bc-876c-42dd782dfe48@github.com>

On Wed, 11 Jan 2023 09:05:44 GMT, Erik Österlund wrote:

> The CollectedHeap class declares a bunch of virtual methods that are overridden in its subclasses. Now that we can, we should convert this code to use "override" consistently, to help make the code more robust.

This pull request has now been integrated.

Changeset: a7342853
Author:    Erik Österlund
URL:       https://git.openjdk.org/jdk/commit/a734285314a34ed61583132f2fc6be9d9c861af4
Stats:     235 lines in 6 files changed: 2 ins; 5 del; 228 mod

8299879: CollectedHeap hierarchy should use override

Reviewed-by: stefank, tschatzl

-------------

PR: https://git.openjdk.org/jdk/pull/11937

From eosterlund at openjdk.org  Mon Jan 16 11:32:52 2023
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Mon, 16 Jan 2023 11:32:52 GMT
Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication [v2]
In-Reply-To:
References:
Message-ID:

> When raw char* String contents are exposed to JNI code, we
>
> 1. Load the string.value and pin it
> 2. Run native code
> 3. Load the string.value and unpin it
>
> Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced.
>
> The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that. Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns.
>
> It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently.

Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:

 - Include sorting order
 - Merge branch 'master' into 8299673_pin_dedup
 - More Kim feedback
 - Feedback from Kim
 - 8299673: Simplify object pinning interactions with string deduplication

-------------

Changes: https://git.openjdk.org/jdk/pull/11923/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11923&range=01
  Stats: 153 lines in 14 files changed: 65 ins; 68 del; 20 mod
  Patch: https://git.openjdk.org/jdk/pull/11923.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11923/head:pull/11923

PR: https://git.openjdk.org/jdk/pull/11923

From smonteith at openjdk.org  Mon Jan 16 21:48:24 2023
From: smonteith at openjdk.org (Stuart Monteith)
Date: Mon, 16 Jan 2023 21:48:24 GMT
Subject: RFR: 8298647: GenShen require heap size 2MB granularity
Message-ID:

Generational Shenandoah requires the heap size to have 2MB granularity in order for card tables to cover the allocated heap. Each byte in a page of card table represents 512 heap bytes. As card tables are allocated 4KB at a time, 4KB * 512 = 2MB. There is a circular dependency between the region calculations and the heap size calculations.
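The card table arithmetic can be checked with a small sketch. The constant and function names below are hypothetical and the code is not from the patch; only the 4KB and 512-byte figures come from the description above.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t K = 1024;
constexpr size_t M = K * K;

// Figures from the description: card tables grow 4KB at a time, and each
// card table byte covers 512 bytes of heap.
constexpr size_t kCardTableAllocUnit = 4 * K;
constexpr size_t kHeapBytesPerCard = 512;
constexpr size_t kHeapGranularity = kCardTableAllocUnit * kHeapBytesPerCard;

static_assert(kHeapGranularity == 2 * M, "4KB * 512 = 2MB");

// Round a requested heap size up to the next multiple of the granularity.
constexpr size_t align_heap_size(size_t requested) {
  return (requested + kHeapGranularity - 1) / kHeapGranularity * kHeapGranularity;
}
```

For example, a 495 MB request comes out as 496 MB, while a 496 MB request is left unchanged.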
This unconditionally rounds up the heap size to a multiple of 2MB. It might be preferable to do this only when generational mode is enabled.

Running with:

    java -Xlog:gc*=trace -XX:+UseShenandoahGC -mx495m \
         -XX:ShenandoahGCMode=generational -version

on a debug build is sufficient to reproduce this problem.

-------------

Commit messages:
 - 8298647: GenShen require heap size 2MB granularity

Changes: https://git.openjdk.org/shenandoah/pull/202/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=202&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8298647
  Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/shenandoah/pull/202.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/202/head:pull/202

PR: https://git.openjdk.org/shenandoah/pull/202

From redestad at openjdk.org  Mon Jan 16 23:19:49 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 16 Jan 2023 23:19:49 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v19]
In-Reply-To:
References:
Message-ID:

> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>
> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>
> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>
> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>
> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>
> Baseline:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>
> I.e. no measurable overhead compared to baseline even for `size == 1`.
>
> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>
> Benchmark for `Arrays.hashCode`:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>
> Baseline:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>
>
> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
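As background for the quoted description, here is a rough sketch of the polynomial hash and a four-way hand-unrolled form of the same computation. It is written in C++ with wrapping 32-bit arithmetic and unsigned byte inputs (as for Latin1 strings); the function names are made up, and this is neither the library code nor the intrinsic from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference loop: h = 31 * h + a[i], with arithmetic wrapping modulo 2^32.
uint32_t hash_simple(const std::vector<uint8_t>& a, uint32_t initial) {
  uint32_t h = initial;
  for (uint8_t v : a) {
    h = 31u * h + v;
  }
  return h;
}

// Same polynomial, four elements per iteration. Expanding the recurrence
// four times gives coefficients 31^4, 31^3, 31^2 and 31.
uint32_t hash_unrolled(const std::vector<uint8_t>& a, uint32_t initial) {
  uint32_t h = initial;
  size_t i = 0;
  for (; i + 4 <= a.size(); i += 4) {
    h = 923521u * h + 29791u * a[i] + 961u * a[i + 1] + 31u * a[i + 2] + a[i + 3];
  }
  for (; i < a.size(); ++i) {  // leftover elements
    h = 31u * h + a[i];
  }
  return h;
}
```

The unrolled form breaks the serial dependency on `h` inside each group of four, which is the property that makes the loop amenable to vectorization. An initial value of 0 corresponds to the `String` case and 1 to the `Arrays` case mentioned in the description.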

Claes Redestad has updated the pull request incrementally with three additional commits since the last revision:

 - Change signature to offset + length, add sanity test
 - Adapt end input to len (fix latent bug with sub-ranges
 - Clean-up types, simplify, hoist special-casing of String variants from arrays_hashcode, add initial value and range to intrinsic

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10847/files
  - new: https://git.openjdk.org/jdk/pull/10847/files/c8c58f4a..59e179c5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=18
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=17-18

  Stats: 210 lines in 13 files changed: 41 ins; 61 del; 108 mod
  Patch: https://git.openjdk.org/jdk/pull/10847.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Mon Jan 16 23:28:37 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 16 Jan 2023 23:28:37 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v20]
In-Reply-To:
References:
Message-ID:

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  trailing ws

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10847/files
  - new: https://git.openjdk.org/jdk/pull/10847/files/59e179c5..ffe5b66d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=19
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=18-19

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10847.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Mon Jan 16 23:32:13 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 16 Jan 2023 23:32:13 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]
In-Reply-To:
References: <6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com>
Message-ID:

On Mon, 14 Nov 2022 18:28:53 GMT, Vladimir Ivanov wrote:

>>> Also, I'd like to note that C2 auto-vectorization support is not too far away from being able to optimize hash code computations. At some point, I was able to achieve some promising results with modest tweaking of SuperWord pass: https://github.com/iwanowww/jdk/blob/superword/notes.txt http://cr.openjdk.java.net/~vlivanov/superword.reduction/webrev.00/
>>
>> Intriguing. How far off is this - and do you think it'll be able to match the efficiency we see here with a memoized coefficient table etc?
>>
>> If we turn this intrinsic into a stub we might also be able to reuse the optimization in other places, including from within the VM (calculating String hashCodes happen in a couple of places, including String deduplication). So I think there are still a few compelling reasons to go the manual route and continue on this path.
>
>> How far off is this ...?
>
> Back then it looked way too constrained (tight constraints on code shapes). But I considered it as a generally applicable optimization.
>
>> ... do you think it'll be able to match the efficiency we see here with a memoized coefficient table etc?
>
> Yes, it is able to build the constant table at runtime when folding multiplications of constant coefficients produced during loop unrolling and then packing scalars into a constant vector.
>
> Moreover, briefly looking at the code shape, the vectorizer would produce a more optimal loop shape (pre-loop would align vector accesses and would use 512-bit vectors when available; vector post-loop could help as well).

I've opted to include the changes spurred by @iwanowww's comments since they led to a number of revisions to the intrinsified method API, and it would be strange to introduce an intrinsified method just to change the API drastically in a follow-up.

Basically `ArraysSupport.vectorizedHashCode` has been changed to take an offset + length, an initial value and the logical basic type of the array elements, which means any necessary scaling of index and length needs to be taken care of before calling the intrinsic. This makes the implementation more flexible at no measurable performance cost. Overall the refactoring might have reduced complexity a bit.

Reviewers might observe that nothing is currently passing anything but `0` and `length` to `vectorizedHashCode` outside of the simple sanity test I've added, but I've verified this feature can be used to some effect elsewhere in the JDK, e.g.: https://github.com/openjdk/jdk/compare/pr/10847...cl4es:jdk:zipcoder-hashcode?expand=1 (which improves speed of opening `ZipFile` by a small percentage in microbenchmarks).
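
To make the offset + length + initial value + basic type shape described above concrete, here is a minimal scalar sketch. It is not the JDK implementation: the class `RangeHashSketch` and the method `rangeHash` are hypothetical names, the signature is an assumption modelled only on the description in this mail, and only three element types are handled. The type codes are the `newarray` codes from the JVM specification, the same table that the quoted `T_BOOLEAN = 4` comes from.

```java
import java.util.Arrays;

final class RangeHashSketch {
    // Type codes from the JVMS `newarray` table (the table that also defines T_BOOLEAN = 4).
    static final int T_CHAR = 5;
    static final int T_BYTE = 8;
    static final int T_INT = 10;

    // Hypothetical scalar stand-in for the intrinsic; NOT the JDK method or its exact signature.
    // Computes h = 31 * h + element over [fromIndex, fromIndex + length), starting from initialValue.
    static int rangeHash(Object array, int fromIndex, int length, int initialValue, int basicType) {
        int h = initialValue;
        int end = fromIndex + length;
        switch (basicType) {
            case T_BYTE: {
                byte[] a = (byte[]) array;
                for (int i = fromIndex; i < end; i++) h = 31 * h + a[i];
                break;
            }
            case T_CHAR: {
                char[] a = (char[]) array;
                for (int i = fromIndex; i < end; i++) h = 31 * h + a[i];
                break;
            }
            case T_INT: {
                int[] a = (int[]) array;
                for (int i = fromIndex; i < end; i++) h = 31 * h + a[i];
                break;
            }
            default:
                throw new IllegalArgumentException("unsupported basic type: " + basicType);
        }
        return h;
    }

    public static void main(String[] args) {
        char[] chars = "hello".toCharArray();
        int[] ints = {1, 2, 3};
        // String-style hash: initial value 0.
        System.out.println(rangeHash(chars, 0, chars.length, 0, T_CHAR) == "hello".hashCode());
        // Arrays-style hash: initial value 1.
        System.out.println(rangeHash(ints, 0, ints.length, 1, T_INT) == Arrays.hashCode(ints));
        // Sub-range: hashing chars[1..4) matches the hash of the substring "ell".
        System.out.println(rangeHash(chars, 1, 3, 0, T_CHAR) == "ell".hashCode());
    }
}
```

Each line prints `true`. The sub-range case is the kind of use the offset + length parameters allow without first copying the range into a new array.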

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From dholmes at openjdk.org Tue Jan 17 02:05:11 2023
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 17 Jan 2023 02:05:11 GMT
Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 16 Jan 2023 11:32:52 GMT, Erik Österlund wrote:

>> When raw char* String contents are exposed to JNI code, we
>>
>> 1. Load the string.value and pin it
>> 2. Run native code
>> 3. Load the string.value and unpin it
>>
>> Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced.
>>
>> The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that. Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns.
>>
>> It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently.
>
> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>
>  - Include sorting order
>  - Merge branch 'master' into 8299673_pin_dedup
>  - More Kim feedback
>  - Feedback from Kim
>  - 8299673: Simplify object pinning interactions with string deduplication

Initially I was a bit unsure about the conceptual model here, as I was thinking that pinning is a very general concept, where in fact it only relates to these JNI "critical" functions. So in that sense every GC must support pinning as required by those functions, so this simplification looks very neat. Thanks.

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.org/jdk/pull/11923

From eosterlund at openjdk.org Tue Jan 17 07:58:12 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Tue, 17 Jan 2023 07:58:12 GMT
Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 17 Jan 2023 02:02:47 GMT, David Holmes wrote:

>> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>>
>>  - Include sorting order
>>  - Merge branch 'master' into 8299673_pin_dedup
>>  - More Kim feedback
>>  - Feedback from Kim
>>  - 8299673: Simplify object pinning interactions with string deduplication
>
> Initially I was a bit unsure about the conceptual model here, as I was thinking that pinning is a very general concept, where in fact it only relates to these JNI "critical" functions. So in that sense every GC must support pinning as required by those functions, so this simplification looks very neat. Thanks.

Thanks for the review, @dholmes-ora!
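
The balanced pin/unpin idea from the PR description quoted above can be illustrated with a toy model. This is only a sketch in plain Java, not HotSpot code: `PinBalanceSketch`, `Str`, `pin` and `unpin` are hypothetical names, and the identity-keyed counter is an assumed stand-in for whatever bookkeeping a GC really uses.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class PinBalanceSketch {
    // Toy stand-in for a GC's pin bookkeeping: pin counts keyed by object identity.
    static final Map<Object, Integer> pinCounts = new IdentityHashMap<>();

    static void pin(Object o) {
        pinCounts.merge(o, 1, Integer::sum);
    }

    static void unpin(Object o) {
        Integer count = pinCounts.get(o);
        if (count == null) {
            throw new IllegalStateException("unpin of an object that was never pinned");
        }
        if (count == 1) {
            pinCounts.remove(o);
        } else {
            pinCounts.put(o, count - 1);
        }
    }

    // Toy stand-in for a string whose backing array may be swapped by deduplication.
    static final class Str {
        byte[] value;
        Str(byte[] value) { this.value = value; }
    }

    public static void main(String[] args) {
        Str s = new Str(new byte[] {'a', 'b'});

        byte[] pinned = s.value;          // step 1: load the value and pin it
        pin(pinned);

        s.value = new byte[] {'a', 'b'};  // step 2: "deduplication" swaps in an equal array

        // step 3: unpin the value captured in step 1 instead of reloading s.value,
        // so the pin and the unpin always refer to the same object.
        unpin(pinned);

        System.out.println(pinCounts.isEmpty());
    }
}
```

In this model, reloading `s.value` in step 3 would try to unpin an array that was never pinned and would leave the original one pinned, which is the imbalance the description talks about.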

-------------

PR: https://git.openjdk.org/jdk/pull/11923

From eosterlund at openjdk.org Tue Jan 17 08:04:16 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Tue, 17 Jan 2023 08:04:16 GMT
Subject: Integrated: 8299673: Simplify object pinning interactions with string deduplication
In-Reply-To:
References:
Message-ID:

On Tue, 10 Jan 2023 10:04:48 GMT, Erik Österlund wrote:

> When raw char* String contents are exposed to JNI code, we
>
> 1. Load the string.value and pin it
> 2. Run native code
> 3. Load the string.value and unpin it
>
> Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced.
>
> The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that. Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns.
>
> It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently.

This pull request has now been integrated.

Changeset: 9a36f8aa
Author: Erik Österlund
URL: https://git.openjdk.org/jdk/commit/9a36f8aadb08f8ade578530c70d9abe38f1826c6
Stats: 153 lines in 14 files changed: 65 ins; 68 del; 20 mod

8299673: Simplify object pinning interactions with string deduplication

Co-authored-by: Stefan Karlsson
Co-authored-by: Axel Boldt-Christmas
Reviewed-by: kbarrett, stefank, dholmes

-------------

PR: https://git.openjdk.org/jdk/pull/11923

From vlivanov at openjdk.org Tue Jan 17 18:59:51 2023
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 17 Jan 2023 18:59:51 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v20]
In-Reply-To:
References:
Message-ID:

On Mon, 16 Jan 2023 23:28:37 GMT, Claes Redestad wrote:

> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
> trailing ws

Thanks, Claes. Looks good.

Please, file an RFE for the follow-up work.

src/hotspot/share/opto/machnode.cpp line 211:

> 209: opcnt++; // Bump operand count
> 210: assert( opcnt < numopnds, "Accessing non-existent operand" );
> 211:

A leftover from a previous change?

src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 168:

> 166: // See https://docs.oracle.com/javase/specs/jvms/se9/html/jvms-6.html#jvms-6.5.newarray.
> 167:
> 168: public static final int T_BOOLEAN = 4;

As an idea for a follow-up enhancement, unless there are plans to implement runtime dispatching between different stubs, the basic type can be coded as a Class and on compiler side the corresponding basic type extracted with `java_lang_Class::as_BasicType()`.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10847

From wkemper at openjdk.org Tue Jan 17 19:42:39 2023
From: wkemper at openjdk.org (William Kemper)
Date: Tue, 17 Jan 2023 19:42:39 GMT
Subject: RFR: Do not reset learning cycles after resizing
Message-ID:

Also, require more than 10 GC cycles before the trigger can resize generations. The behavior on the extreme 'phased' workload looks much better:

[1334.381s][info][gc,stats ] 66 Successful Concurrent GCs
[1334.381s][info][gc,stats ]   0 invoked explicitly
[1334.381s][info][gc,stats ]   0 invoked implicitly
[1334.381s][info][gc,stats ]
[1334.381s][info][gc,stats ] 7 Completed Old GCs
[1334.381s][info][gc,stats ]   0 mixed
[1334.381s][info][gc,stats ]   0 interruptions
[1334.381s][info][gc,stats ]
[1334.381s][info][gc,stats ] 1 Degenerated GCs
[1334.381s][info][gc,stats ]   1 caused by allocation failure
[1334.381s][info][gc,stats ]   1 happened at Mark
[1334.381s][info][gc,stats ]   1 upgraded to Full GC
[1334.381s][info][gc,stats ]
[1334.381s][info][gc,stats ] 0 Abbreviated GCs
[1334.381s][info][gc,stats ]
[1334.381s][info][gc,stats ] 1 Full GCs
[1334.381s][info][gc,stats ]   0 invoked explicitly
[1334.381s][info][gc,stats ]   0 invoked implicitly
[1334.381s][info][gc,stats ]   0 caused by allocation failure
[1334.381s][info][gc,stats ]   1 upgraded from Degenerated GC

The full cycle here was the first cycle after the last of the initial learning cycles.

-------------

Commit messages:
 - Require more than 10 gc cycles before trigger can resize generations
 - Merge branch 'shenandoah-master' into generation-sizing-refinements
 - Do not reset learning cycles after resizing

Changes: https://git.openjdk.org/shenandoah/pull/203/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=203&range=00
  Stats: 22 lines in 3 files changed: 16 ins; 3 del; 3 mod
  Patch: https://git.openjdk.org/shenandoah/pull/203.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/203/head:pull/203

PR: https://git.openjdk.org/shenandoah/pull/203

From wkemper at openjdk.org Tue Jan 17 20:14:13 2023
From: wkemper at openjdk.org (William Kemper)
Date: Tue, 17 Jan 2023 20:14:13 GMT
Subject: RFR: 8298647: GenShen require heap size 2MB granularity
In-Reply-To:
References:
Message-ID:

On Mon, 16 Jan 2023 21:41:35 GMT, Stuart Monteith wrote:

> Generational Shenandoah requires 2MB granularity in order for card tables to cover the allocated heap. Each byte in a page of card table represents 512 heap bytes. As card tables are allocated 4KB at a time, 4KB * 512 = 2MB.
>
> There is a circular dependency between the region calculations and the heap size calculations. This unconditionally rounds up the heap size to 2MB. It might be preferable to do this only when generational mode is enabled.
>
> Running with:
> java -Xlog:gc*=trace -XX:+UseShenandoahGC -mx495m \
> -XX:ShenandoahGCMode=generational -version
>
> on a debug build is sufficient to reproduce this problem.

Changes requested by wkemper (Committer).

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 728:

> 726: }
> 727:
> 728: // Generational Shenandoah needs this alignment for card tables.

Thank you for this fix! It would be nice if this constraint were only applied in generational mode, but these sizes are computed quite early during startup. You'd need to factor the code out of `ShenandoahHeap::initialize_heuristics` to know whether the constraint is required at this point.
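
The granularity arithmetic in the quoted description (512 heap bytes per card-table byte, card table allocated 4KB at a time) can be checked with a small sketch. The class and constant names below are hypothetical and the code is plain Java, not the HotSpot change itself; it only shows why the `-mx495m` reproducer ends up needing a 496MB heap once the size is rounded up.

```java
final class CardTableAlignmentSketch {
    // Figures taken from the quoted description; the names are made up for this sketch.
    static final long HEAP_BYTES_PER_CARD = 512;           // heap bytes covered by one card-table byte
    static final long CARD_TABLE_CHUNK_BYTES = 4 * 1024;   // card table is allocated 4KB at a time
    static final long HEAP_GRANULARITY = HEAP_BYTES_PER_CARD * CARD_TABLE_CHUNK_BYTES; // 2MB

    // Rounds size up to the next multiple of alignment (alignment must be positive).
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        long requested = 495 * mb;                         // -mx495m from the reproducer
        long aligned = alignUp(requested, HEAP_GRANULARITY);
        System.out.println(HEAP_GRANULARITY / mb + " MB granularity");  // prints "2 MB granularity"
        System.out.println(aligned / mb + " MB heap");                  // prints "496 MB heap"
    }
}
```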

-------------

PR: https://git.openjdk.org/shenandoah/pull/202

From redestad at openjdk.org Tue Jan 17 20:55:08 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Tue, 17 Jan 2023 20:55:08 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v21]
In-Reply-To:
References:
Message-ID:

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Remove spurious newline

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10847/files
  - new: https://git.openjdk.org/jdk/pull/10847/files/ffe5b66d..48c068bb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=20
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=19-20

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/10847.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Tue Jan 17 20:55:12 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Tue, 17 Jan 2023 20:55:12 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v20]
In-Reply-To:
References:
Message-ID:

On Tue, 17 Jan 2023 18:46:00 GMT, Vladimir Ivanov wrote:

>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>>
>> trailing ws
>
> src/hotspot/share/opto/machnode.cpp line 211:
>
>> 209: opcnt++; // Bump operand count
>> 210: assert( opcnt < numopnds, "Accessing non-existent operand" );
>> 211:
>
> A leftover from a previous change?

Fixed

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Tue Jan 17 21:06:01 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Tue, 17 Jan 2023 21:06:01 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v22]
In-Reply-To:
References:
Message-ID:

Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 76 commits:

 - Copyrights
 - Merge branch 'master' into 8282664-polyhash
 - Remove spurious newline
 - trailing ws
 - Change signature to offset + length, add sanity test
 - Adapt end input to len (fix latent bug with sub-ranges
 - Clean-up types, simplify, hoist special-casing of String variants from arrays_hashcode, add initial value and range to intrinsic
 - Explicitly lea external address
 - Merge branch 'master' into 8282664-polyhash
 - Treat Op_VectorizedHashCode as other similar Ops in split_unique_types
 - ...
and 66 more: https://git.openjdk.org/jdk/compare/ade08e19...7e6080b6

-------------

Changes: https://git.openjdk.org/jdk/pull/10847/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=21
  Stats: 1062 lines in 33 files changed: 975 ins; 9 del; 78 mod
  Patch: https://git.openjdk.org/jdk/pull/10847.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Tue Jan 17 21:09:56 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Tue, 17 Jan 2023 21:09:56 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v21]
In-Reply-To:
References:
Message-ID:

On Tue, 17 Jan 2023 20:55:08 GMT, Claes Redestad wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove spurious newline

Thanks for your patience and reviews.

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Tue Jan 17 21:09:58 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Tue, 17 Jan 2023 21:09:58 GMT
Subject: Integrated: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops
In-Reply-To:
References:
Message-ID:

On Tue, 25 Oct 2022 10:37:40 GMT, Claes Redestad wrote:

> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.

This pull request has now been integrated.
Changeset: e37078f5
Author:    Claes Redestad
URL:       https://git.openjdk.org/jdk/commit/e37078f5bb626c7ce0348a38bb86ca2ca62ba915
Stats:     1062 lines in 33 files changed: 975 ins; 9 del; 78 mod

8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops

Co-authored-by: Sandhya Viswanathan
Co-authored-by: Ludovic Henry
Co-authored-by: Claes Redestad
Reviewed-by: vlivanov, sviswanathan, luhenry

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From ysr at openjdk.org Tue Jan 17 23:36:45 2023
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Tue, 17 Jan 2023 23:36:45 GMT
Subject: RFR: Do not reset learning cycles after resizing
In-Reply-To:
References:
Message-ID:

On Tue, 17 Jan 2023 19:35:40 GMT, William Kemper wrote:

> Also, require more than 10 gc cycles before trigger can resize generations. The behavior on the extremem 'phased' workload looks much better:
>
> [1334.381s][info][gc,stats ] 66 Successful Concurrent GCs
> [1334.381s][info][gc,stats ]   0 invoked explicitly
> [1334.381s][info][gc,stats ]   0 invoked implicitly
> [1334.381s][info][gc,stats ]
> [1334.381s][info][gc,stats ] 7 Completed Old GCs
> [1334.381s][info][gc,stats ]   0 mixed
> [1334.381s][info][gc,stats ]   0 interruptions
> [1334.381s][info][gc,stats ]
> [1334.381s][info][gc,stats ] 1 Degenerated GCs
> [1334.381s][info][gc,stats ]   1 caused by allocation failure
> [1334.381s][info][gc,stats ]     1 happened at Mark
> [1334.381s][info][gc,stats ]   1 upgraded to Full GC
> [1334.381s][info][gc,stats ]
> [1334.381s][info][gc,stats ] 0 Abbreviated GCs
> [1334.381s][info][gc,stats ]
> [1334.381s][info][gc,stats ] 1 Full GCs
> [1334.381s][info][gc,stats ]   0 invoked explicitly
> [1334.381s][info][gc,stats ]   0 invoked implicitly
> [1334.381s][info][gc,stats ]   0 caused by allocation failure
> [1334.381s][info][gc,stats ]   1 upgraded from Degenerated GC
>
> The full cycle here was the first cycle after the last of the initial learning cycles.

LGTM, reviewed. But curious if you noticed any difference wrt, e.g., specjbb.

Some more thoughts:
I wonder if the # of cycles to wait would be proportional to the long-run ratio of minor to major collection cycles.

I agree though that waiting about 10 cycles between resizing decisions would have the salubrious effect of smoothing out any temporary spikes. Somewhat relatedly, and something that I only vaguely paid attention to before: What's the default decay factor for the MMU decaying average, and what constitutes an MMU sample: the occurrence of a GC (minor or major), or just a synchronous 5-second sample of both (which might very quickly decay to 100%, losing almost all the information in the signal after 6-7 samples, i.e. 30-35 seconds in this case, unless GCs were happening at a fast clip).

-------------

Marked as reviewed by ysr (Author).

PR: https://git.openjdk.org/shenandoah/pull/203

From kdnilsen at openjdk.org Tue Jan 17 23:46:29 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Tue, 17 Jan 2023 23:46:29 GMT
Subject: RFR: Do not reset learning cycles after resizing
In-Reply-To:
References:
Message-ID:

On Tue, 17 Jan 2023 19:35:40 GMT, William Kemper wrote:

> Also, require more than 10 gc cycles before trigger can resize generations.

Thanks. This looks like a very good improvement. (I assume we are separately looking into the case where the MMU triggers cause young to expand even when a more enlightened perspective would instead expand OLD.)

-------------

Marked as reviewed by kdnilsen (Committer).

PR: https://git.openjdk.org/shenandoah/pull/203

From wkemper at openjdk.org Wed Jan 18 00:06:29 2023
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 18 Jan 2023 00:06:29 GMT
Subject: RFR: Do not reset learning cycles after resizing
In-Reply-To:
References:
Message-ID:

On Tue, 17 Jan 2023 23:27:24 GMT, Y. Srinivas Ramakrishna wrote:

> LGTM, reviewed. But curious if you noticed any difference wrt, e.g., specjbb.

@ysramakrishna, I didn't notice the trouble with these learning cycles originally because on specjbb, the heuristic quickly maxes out the young generation size and keeps it there. The decay factor is set by `ShenandoahAdaptiveDecayFactor` (default is 0.5) and the MMU is updated every `GCPauseIntervalMillis` (default is 5 seconds).

@kdnilsen, I will look for workloads that defeat the heuristic. It's much better behaved now on the extremem 'phased' workload. Perhaps heapothesys with a high occupancy rate?

-------------

PR: https://git.openjdk.org/shenandoah/pull/203

From wkemper at openjdk.org Wed Jan 18 00:09:35 2023
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 18 Jan 2023 00:09:35 GMT
Subject: Integrated: Do not reset learning cycles after resizing
In-Reply-To:
References:
Message-ID:

On Tue, 17 Jan 2023 19:35:40 GMT, William Kemper wrote:

> Also, require more than 10 gc cycles before trigger can resize generations.

This pull request has now been integrated.

Changeset: 2cab4e76
Author:    William Kemper
URL:       https://git.openjdk.org/shenandoah/commit/2cab4e76083743f6eabf427bfc5ec8c4b3eaf081
Stats:     22 lines in 3 files changed: 16 ins; 3 del; 3 mod

Do not reset learning cycles after resizing

Reviewed-by: ysr, kdnilsen

-------------

PR: https://git.openjdk.org/shenandoah/pull/203

From luhenry at openjdk.org Wed Jan 18 08:37:46 2023
From: luhenry at openjdk.org (Ludovic Henry)
Date: Wed, 18 Jan 2023 08:37:46 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v22]
In-Reply-To:
References:
Message-ID:

On Tue, 17 Jan 2023 21:06:01 GMT, Claes Redestad wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>
> Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 76 commits:
>
>  - Copyrights
>  - Merge branch 'master' into 8282664-polyhash
>  - Remove spurious newline
>  - trailing ws
>  - Change signature to offset + length, add sanity test
>  - Adapt end input to len (fix latent bug with sub-ranges
>  - Clean-up types, simplify, hoist special-casing of String variants from arrays_hashcode, add initial value and range to intrinsic
>  - Explicitly lea external address
>  - Merge branch 'master' into 8282664-polyhash
>  - Treat Op_VectorizedHashCode as other similar Ops in split_unique_types
>  - ... and 66 more: https://git.openjdk.org/jdk/compare/ade08e19...7e6080b6

Thanks for pushing it all the way!
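
For readers following the thread, here is a minimal sketch of what "unrolling a polynomial hash loop" means. It is not the code from the PR: the class and method names are hypothetical, and only the factor 31 and the start values (one for `Arrays`, zero for `String`) come from the description above. The unrolled variant consumes four elements per iteration by expanding the recurrence `h = 31 * h + a[i]` four times, which removes the loop-carried multiply chain between neighbouring elements and is the kind of shape a vectorized intrinsic can exploit.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the HotSpot intrinsic or the library code from the PR.
class PolyHashSketch {

    // Reference loop: h = 31 * h + a[i], starting from the given initial value.
    static int hashSimple(int[] a, int initial) {
        int h = initial;
        for (int v : a) {
            h = 31 * h + v;
        }
        return h;
    }

    // Same polynomial, four elements per iteration.
    // The constants are 31^4 = 923521, 31^3 = 29791 and 31^2 = 961.
    // int overflow wraps modulo 2^32, so the result matches the simple loop.
    static int hashUnrolled(int[] a, int initial) {
        int h = initial;
        int i = 0;
        for (; i + 3 < a.length; i += 4) {
            h = 923521 * h
              + 29791 * a[i]
              + 961 * a[i + 1]
              + 31 * a[i + 2]
              + a[i + 3];
        }
        // Tail: fewer than four elements remain.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};

        // Arrays.hashCode starts from 1.
        System.out.println(hashUnrolled(data, 1) == Arrays.hashCode(data));

        // String.hashCode starts from 0 and hashes the char values.
        String s = "polynomial";
        int[] chars = s.chars().toArray();
        System.out.println(hashUnrolled(chars, 0) == s.hashCode());
    }
}
```

Both comparisons in `main` print `true`, since the unrolled form is algebraically identical to the one-element-at-a-time loop.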

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From smonteith at openjdk.org Wed Jan 18 09:22:35 2023
From: smonteith at openjdk.org (Stuart Monteith)
Date: Wed, 18 Jan 2023 09:22:35 GMT
Subject: RFR: 8298647: GenShen require heap size 2MB granularity
In-Reply-To:
References:
Message-ID:

On Tue, 17 Jan 2023 20:11:42 GMT, William Kemper wrote:

>> Generational Shenandoah requires 2MB granularity in order for card tables to cover the allocated heap. Each byte in a page of card table represents 512 heap bytes. As card tables are allocated 4KB at a time, 4KB * 512 = 2MB.
>>
>> There is a circular dependency between the region calculations and the heap size calculations. This unconditionally rounds up the heap size to 2MB. It might be preferable to do this only when generational mode is enabled.
>>
>> Running with:
>>     java -Xlog:gc*=trace -XX:+UseShenandoahGC -mx495m \
>>         -XX:ShenandoahGCMode=generational -version
>>
>> on a debug build is sufficient to reproduce this problem.
>
> src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 728:
>
>> 726:   }
>> 727:
>> 728:   // Generational Shenandoah needs this alignment for card tables.
>
> Thank you for this fix! It would be nice if this constraint were only applied for generational mode, but these sizes are computed quite early during startup. You'd need to factor the code out of `ShenandoahHeap::initialize_heuristics` to know whether the constraint is required at this point.

Yes, I thought I'd start with a simple fix to highlight the problem first. I experimented with exactly what you suggested, parsing ShenandoahGCMode. I can update this if the 2MB alignment isn't desired unconditionally.

-------------

PR: https://git.openjdk.org/shenandoah/pull/202

From redestad at openjdk.org Wed Jan 18 09:16:47 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Wed, 18 Jan 2023 09:16:47 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v22]
In-Reply-To:
References:
Message-ID:

On Tue, 17 Jan 2023 21:06:01 GMT, Claes Redestad wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>
> Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 76 commits:
>
>  - Copyrights
>  - Merge branch 'master' into 8282664-polyhash
>  - Remove spurious newline
>  - trailing ws
>  - Change signature to offset + length, add sanity test
>  - Adapt end input to len (fix latent bug with sub-ranges
>  - Clean-up types, simplify, hoist special-casing of String variants from arrays_hashcode, add initial value and range to intrinsic
>  - Explicitly lea external address
>  - Merge branch 'master' into 8282664-polyhash
>  - Treat Op_VectorizedHashCode as other similar Ops in split_unique_types
>  - ... and 66 more: https://git.openjdk.org/jdk/compare/ade08e19...7e6080b6

Filed https://bugs.openjdk.org/browse/JDK-8300448 to follow-up and rewrite part of or all of the inlined code as a stub call.

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From smonteith at openjdk.org Wed Jan 18 10:40:29 2023
From: smonteith at openjdk.org (Stuart Monteith)
Date: Wed, 18 Jan 2023 10:40:29 GMT
Subject: RFR: 8298647: GenShen require heap size 2MB granularity [v2]
In-Reply-To:
References:
Message-ID:

> Generational Shenandoah requires 2MB granularity in order for card tables to cover the allocated heap.
Each byte in a page of card table represents 512 heap bytes. As card tables are allocated 4KB at a time, 4KB * 512 = 2MB. > > There is a circular dependency between the region calculations and the heap size calculations. This unconditionally rounds up the heap size to 2MB. It might be preferable to do this only when generational mode is enabled. > > Running with: > java -Xlog:gc*=trace -XX:+UseShenandoahGC -mx495m \ > -XX:ShenandoahGCMode=generational -version > > on a debug build is sufficient to reproduce this problem. Stuart Monteith has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8298647 - 8298647: GenShen require heap size 2MB granularity Generational Shenandoah requires 2MB granularity in order for card tables to cover the allocated heap. Each byte in a page of card table represents 512 heap bytes. As card tables are allocated 4KB at a time, 4KB * 512 = 2MB. There is a circular dependency between the region calculations and the heap size calculations. This unconditionally rounds up the heap size to 2MB. It might be preferable to do this only when generational mode is enabled. Running with: java -Xlog:gc*=trace -XX:+UseShenandoahGC -mx495m \ -XX:ShenandoahGCMode=generational -version on a debug build is sufficient to reproduce this problem. 
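
The granularity arithmetic in the message above can be checked with a small standalone sketch. The names used here (`kHeapBytesPerCard`, `kCardTablePageBytes`, `kHeapGranularity`, `align_up_pow2`) are illustrative assumptions and not HotSpot identifiers; the only facts taken from the message are the 512 heap bytes per card table byte and the 4KB card table allocation unit.

```cpp
#include <cassert>
#include <cstddef>

// Values quoted in the message above; the constant names are hypothetical.
constexpr std::size_t kHeapBytesPerCard   = 512;       // heap bytes covered by one card table byte
constexpr std::size_t kCardTablePageBytes = 4 * 1024;  // card table is allocated 4KB at a time

// Heap covered by one 4KB page of card table: 4KB * 512 = 2MB.
constexpr std::size_t kHeapGranularity = kCardTablePageBytes * kHeapBytesPerCard;
static_assert(kHeapGranularity == 2 * 1024 * 1024, "4KB * 512 should be 2MB");

// Round `size` up to the next multiple of `alignment`.
// `alignment` must be a non-zero power of two.
inline std::size_t align_up_pow2(std::size_t size, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}
```

With the reproducer's `-mx495m`, 495MB is not a multiple of 2MB (495 / 2 = 247.5), so rounding up with `align_up_pow2(495MB, kHeapGranularity)` gives 496MB in this sketch.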
-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/202/files
  - new: https://git.openjdk.org/shenandoah/pull/202/files/571fe68a..96d4347d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=202&range=01
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=202&range=00-01

  Stats: 7407 lines in 420 files changed: 4387 ins; 1648 del; 1372 mod
  Patch: https://git.openjdk.org/shenandoah/pull/202.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/202/head:pull/202

PR: https://git.openjdk.org/shenandoah/pull/202

From wkemper at openjdk.org Wed Jan 18 17:20:06 2023
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 18 Jan 2023 17:20:06 GMT
Subject: RFR: 8298647: GenShen require heap size 2MB granularity [v2]
In-Reply-To:
References:
Message-ID: <5DZUQXosMLpyLUKK3X6eQcqG7ibx3DEKdFTOQoriuag=.e334d978-20d4-49a2-9f29-4b10d9820af4@github.com>

On Wed, 18 Jan 2023 09:19:40 GMT, Stuart Monteith wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 728:
>>
>>> 726:   }
>>> 727:
>>> 728:   // Generational Shenandoah needs this alignment for card tables.
>>
>> Thank you for this fix! It would be nice if this constraint were only applied for generational mode, but these sizes are computed quite early during startup. You'd need to factor the code out of `ShenandoahHeap::initialize_heuristics` to know whether the constraint is required at this point.
>
> Yes, I thought I'd start with a simple fix to highlight the problem first. I experimented with exactly what you suggested, parsing ShenandoahGCMode. I can update this if the 2MB alignment isn't desired unconditionally.

Yes please. We've tried to limit the impact of the generational mode on Shenandoah's other modes.
-------------

PR: https://git.openjdk.org/shenandoah/pull/202

From eosterlund at openjdk.org Thu Jan 19 10:35:12 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Thu, 19 Jan 2023 10:35:12 GMT
Subject: RFR: 8300644: Remove gc/shenandoah/jni/TestStringCriticalWithDedup.java
Message-ID:

The gc/shenandoah/jni/TestStringCriticalWithDedup.java test was designed to catch failure to pin strings being passed out to JNI critical users, because that used to be dangerous. After [JDK-8299673](https://bugs.openjdk.org/browse/JDK-8299673) that is not dangerous any longer. Conversely, now we kind of want deduplication to proceed regardless of JNI critical, which defeats the purpose of this test. It should be removed.

-------------

Commit messages:
 - 8300644: Remove gc/shenandoah/jni/TestStringCriticalWithDedup.java

Changes: https://git.openjdk.org/jdk/pull/12089/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12089&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8300644
  Stats: 167 lines in 1 file changed: 0 ins; 167 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/12089.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12089/head:pull/12089

PR: https://git.openjdk.org/jdk/pull/12089

From duke at openjdk.org Thu Jan 19 11:06:46 2023
From: duke at openjdk.org (Afshin Zafari)
Date: Thu, 19 Jan 2023 11:06:46 GMT
Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values
Message-ID:

### Description
os::allocation_granularity/page_size and friends return signed values

### Patch
- Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t`
- Initial value of them changed from -1 to 0.
- In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1).
- Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed.
- All `(size_t)` casting of getters removed.
- In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain. - Explicitly casted to `(int)` where `jint` needed. - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately. - `"%d"` format-flags replaced with `SIZE_FORMAT`. - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`. ### Test tier1-5: all green, except an unrelated fail for whom a bug is already created. job-id: afshin-8151413-20230117-1255-40910454 ------------- Commit messages: - 8151413: os::allocation_granularity/page_size and friends return signed values Changes: https://git.openjdk.org/jdk/pull/12091/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8151413 Stats: 129 lines in 62 files changed: 0 ins; 0 del; 129 mod Patch: https://git.openjdk.org/jdk/pull/12091.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12091/head:pull/12091 PR: https://git.openjdk.org/jdk/pull/12091 From stefank at openjdk.org Thu Jan 19 11:43:34 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Jan 2023 11:43:34 GMT Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values In-Reply-To: References: Message-ID: <3NiiB066CB1zaeStjXuFQNdH1ZHdiubvmNkjoWqqgLg=.49dab42f-062a-4c42-b0e2-7b36f6c55fc9@github.com> On Thu, 19 Jan 2023 10:59:02 GMT, Afshin Zafari wrote: > ### Description > os::allocation_granularity/page_size and friends return signed values > > ### Patch > - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t` > - Initial value of them changed from -1 to 0. > - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). 
> - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed.
> - All `(size_t)` casting of getters removed.
> - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain.
> - Explicitly casted to `(int)` where `jint` needed.
> - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately.
> - `"%d"` format-flags replaced with `SIZE_FORMAT`.
> - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`.
>
> ### Test
> tier1-5: all green, except an unrelated fail for whom a bug is already created.
> job-id: afshin-8151413-20230117-1255-40910454

I think this mostly looks good. I've one thing I'd like to get checked.

src/hotspot/os/linux/os_linux.cpp line 4285:

> 4283:   clock_tics_per_sec = sysconf(_SC_CLK_TCK);
> 4284:
> 4285:   size_t page_size = (size_t) sysconf(_SC_PAGESIZE);

This cast voids the check for negative return values below. Maybe check this value first, then cast it to a size_t?

-------------

Changes requested by stefank (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12091

From duke at openjdk.org Thu Jan 19 12:46:42 2023
From: duke at openjdk.org (Afshin Zafari)
Date: Thu, 19 Jan 2023 12:46:42 GMT
Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v2]
In-Reply-To:
References:
Message-ID:

> ### Description
> os::allocation_granularity/page_size and friends return signed values
>
> ### Patch
> - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t`
> - Initial value of them changed from -1 to 0.
> - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). Also, checking the argument be positive is removed.
> - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed. > - All `(size_t)` casting of getters removed. > - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain. > - Explicitly casted to `(int)` where `jint` needed. > - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately. > - `"%d"` format-flags replaced with `SIZE_FORMAT`. > - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`. > > ### Test > tier1-5: all green, except an unrelated fail for whom a bug is already created. > job-id: afshin-8151413-20230117-1255-40910454 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8151413: os::allocation_granularity/page_size and friends return signed values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12091/files - new: https://git.openjdk.org/jdk/pull/12091/files/97670c79..74c859b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=00-01 Stats: 10 lines in 2 files changed: 4 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12091.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12091/head:pull/12091 PR: https://git.openjdk.org/jdk/pull/12091 From stefank at openjdk.org Thu Jan 19 13:16:02 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Jan 2023 13:16:02 GMT Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v2] In-Reply-To: References: Message-ID: On Thu, 19 Jan 2023 12:46:42 GMT, Afshin Zafari wrote: >> ### Description >> os::allocation_granularity/page_size and friends return signed values >> >> ### Patch >> - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their 
wrappers in `os` class changed to `size_t` >> - Initial value of them changed from -1 to 0. >> - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). Also, checking the argument be positive is removed. >> - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed. >> - All `(size_t)` casting of getters removed. >> - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain. >> - Explicitly casted to `(int)` where `jint` needed. >> - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately. >> - `"%d"` format-flags replaced with `SIZE_FORMAT`. >> - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`. >> >> ### Test >> tier1-5: all green, except an unrelated fail for whom a bug is already created. >> job-id: afshin-8151413-20230117-1255-40910454 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8151413: os::allocation_granularity/page_size and friends return signed values Marked as reviewed by stefank (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12091 From duke at openjdk.org Thu Jan 19 19:00:30 2023 From: duke at openjdk.org (Afshin Zafari) Date: Thu, 19 Jan 2023 19:00:30 GMT Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v3] In-Reply-To: References: Message-ID: <5ZV3GHTMWBLxQ1UHVC3hT4rhjMx2clU_QbaPFORdTSM=.8ac9810a-48e8-4639-8616-b5ce7f340f17@github.com> > ### Description > os::allocation_granularity/page_size and friends return signed values > > ### Patch > - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t` > - Initial value of them changed from -1 to 0. 
> - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). Also, checking the argument be positive is removed. > - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed. > - All `(size_t)` casting of getters removed. > - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain. > - Explicitly casted to `(int)` where `jint` needed. > - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately. > - `"%d"` format-flags replaced with `SIZE_FORMAT`. > - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`. > > ### Test > tier1-5: all green, except an unrelated fail for whom a bug is already created. > job-id: afshin-8151413-20230117-1255-40910454 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8151413: os::allocation_granularity/page_size and friends return signed values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12091/files - new: https://git.openjdk.org/jdk/pull/12091/files/74c859b7..7b9c0361 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12091.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12091/head:pull/12091 PR: https://git.openjdk.org/jdk/pull/12091 From ccheung at openjdk.org Thu Jan 19 19:32:13 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 19 Jan 2023 19:32:13 GMT Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v3] In-Reply-To: <5ZV3GHTMWBLxQ1UHVC3hT4rhjMx2clU_QbaPFORdTSM=.8ac9810a-48e8-4639-8616-b5ce7f340f17@github.com> References: 
<5ZV3GHTMWBLxQ1UHVC3hT4rhjMx2clU_QbaPFORdTSM=.8ac9810a-48e8-4639-8616-b5ce7f340f17@github.com>
Message-ID:

On Thu, 19 Jan 2023 19:00:30 GMT, Afshin Zafari wrote:

>> ### Description
>> os::allocation_granularity/page_size and friends return signed values
>>
>> ### Patch
>> - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t`
>> - Initial value of them changed from -1 to 0.
>> - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). Also, checking the argument be positive is removed.
>> - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed.
>> - All `(size_t)` casting of getters removed.
>> - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain.
>> - Explicitly casted to `(int)` where `jint` needed.
>> - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately.
>> - `"%d"` format-flags replaced with `SIZE_FORMAT`.
>> - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`.
>>
>> ### Test
>> tier1-5: all green, except an unrelated fail for whom a bug is already created.
>> job-id: afshin-8151413-20230117-1255-40910454
>
> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision:
>
>   8151413: os::allocation_granularity/page_size and friends return signed values

I have a suggestion below. Also, you may want to update the copyright year of some of the files. The `make/scripts/update_copyright_year.sh` can help on that.

src/hotspot/share/prims/whitebox.cpp line 160:

> 158:
> 159: WB_ENTRY(jint, WB_GetVMPageSize(JNIEnv* env, jobject o))
> 160:   return (int)os::vm_page_size();

Maybe typecast it to `jint`?
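
The two cast questions raised in this review thread (validating a signed `sysconf`-style result before converting it to `size_t`, and narrowing a `size_t` page size to a 32-bit `jint`) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the PR: `checked_page_size` and `page_size_as_jint` are hypothetical helpers rather than HotSpot functions, and `std::int32_t` stands in for `jint`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: validate a signed result (such as the long returned by
// sysconf(_SC_PAGESIZE)) BEFORE converting it to size_t. Casting first would
// turn -1 into a very large unsigned value and hide the failure.
inline bool checked_page_size(long raw, std::size_t* out) {
  if (raw <= 0) {
    return false;  // error or nonsensical value; *out is left untouched
  }
  *out = static_cast<std::size_t>(raw);
  return true;
}

// Hypothetical helper: narrow a size_t page size to a 32-bit signed integer.
// std::int32_t is an assumed stand-in for jint in this sketch.
inline std::int32_t page_size_as_jint(std::size_t page_size) {
  assert(page_size <= static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max()));
  return static_cast<std::int32_t>(page_size);
}
```

A caller in this sketch would pass the raw `sysconf` result to `checked_page_size` and treat a `false` return as an initialization failure, so the unsigned value is only ever produced from a checked positive input.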
------------- PR: https://git.openjdk.org/jdk/pull/12091 From ysr at openjdk.org Thu Jan 19 22:54:27 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 19 Jan 2023 22:54:27 GMT Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v3] In-Reply-To: <5ZV3GHTMWBLxQ1UHVC3hT4rhjMx2clU_QbaPFORdTSM=.8ac9810a-48e8-4639-8616-b5ce7f340f17@github.com> References: <5ZV3GHTMWBLxQ1UHVC3hT4rhjMx2clU_QbaPFORdTSM=.8ac9810a-48e8-4639-8616-b5ce7f340f17@github.com> Message-ID: On Thu, 19 Jan 2023 19:00:30 GMT, Afshin Zafari wrote: >> ### Description >> os::allocation_granularity/page_size and friends return signed values >> >> ### Patch >> - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t` >> - Initial value of them changed from -1 to 0. >> - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). Also, checking the argument be positive is removed. >> - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed. >> - All `(size_t)` casting of getters removed. >> - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain. >> - Explicitly casted to `(int)` where `jint` needed. >> - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately. >> - `"%d"` format-flags replaced with `SIZE_FORMAT`. >> - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`. >> >> ### Test >> tier1-5: all green, except an unrelated fail for whom a bug is already created. 
>> job-id: afshin-8151413-20230117-1255-40910454 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8151413: os::allocation_granularity/page_size and friends return signed values LGTM! ------------- Marked as reviewed by ysr (Reviewer). PR: https://git.openjdk.org/jdk/pull/12091 From duke at openjdk.org Fri Jan 20 09:28:58 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 20 Jan 2023 09:28:58 GMT Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v4] In-Reply-To: References: Message-ID: > ### Description > os::allocation_granularity/page_size and friends return signed values > > ### Patch > - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t` > - Initial value of them changed from -1 to 0. > - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). Also, checking the argument be positive is removed. > - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed. > - All `(size_t)` casting of getters removed. > - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain. > - Explicitly casted to `(int)` where `jint` needed. > - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately. > - `"%d"` format-flags replaced with `SIZE_FORMAT`. > - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`. > > ### Test > tier1-5: all green, except an unrelated fail for whom a bug is already created. 
> job-id: afshin-8151413-20230117-1255-40910454 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8151413: os::allocation_granularity/page_size and friends return signed values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12091/files - new: https://git.openjdk.org/jdk/pull/12091/files/7b9c0361..ec3e4129 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=02-03 Stats: 55 lines in 54 files changed: 0 ins; 0 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/12091.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12091/head:pull/12091 PR: https://git.openjdk.org/jdk/pull/12091 From duke at openjdk.org Fri Jan 20 09:34:17 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 20 Jan 2023 09:34:17 GMT Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v5] In-Reply-To: References: Message-ID: > ### Description > os::allocation_granularity/page_size and friends return signed values > > ### Patch > - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t` > - Initial value of them changed from -1 to 0. > - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). Also, checking the argument be positive is removed. > - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed. > - All `(size_t)` casting of getters removed. > - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain. > - Explicitly casted to `(int)` where `jint` needed. > - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately. 
> - `"%d"` format-flags replaced with `SIZE_FORMAT`. > - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`. > > ### Test > tier1-5: all green, except an unrelated fail for whom a bug is already created. > job-id: afshin-8151413-20230117-1255-40910454 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8151413: os::allocation_granularity/page_size and friends return signed values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12091/files - new: https://git.openjdk.org/jdk/pull/12091/files/ec3e4129..2a2a22b3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12091.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12091/head:pull/12091 PR: https://git.openjdk.org/jdk/pull/12091 From duke at openjdk.org Fri Jan 20 09:41:28 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 20 Jan 2023 09:41:28 GMT Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v6] In-Reply-To: References: Message-ID: > ### Description > os::allocation_granularity/page_size and friends return signed values > > ### Patch > - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t` > - Initial value of them changed from -1 to 0. > - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). Also, checking the argument be positive is removed. > - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed. > - All `(size_t)` casting of getters removed. > - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain. > - Explicitly casted to `(int)` where `jint` needed. 
> - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately. > - `"%d"` format-flags replaced with `SIZE_FORMAT`. > - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`. > > ### Test > tier1-5: all green, except an unrelated fail for whom a bug is already created. > job-id: afshin-8151413-20230117-1255-40910454 Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge 'master' - 8151413: os::allocation_granularity/page_size and friends return signed values - 8151413: os::allocation_granularity/page_size and friends return signed values - 8151413: os::allocation_granularity/page_size and friends return signed values - 8151413: os::allocation_granularity/page_size and friends return signed values - 8151413: os::allocation_granularity/page_size and friends return signed values ------------- Changes: https://git.openjdk.org/jdk/pull/12091/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=05 Stats: 188 lines in 66 files changed: 7 ins; 5 del; 176 mod Patch: https://git.openjdk.org/jdk/pull/12091.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12091/head:pull/12091 PR: https://git.openjdk.org/jdk/pull/12091 From ccheung at openjdk.org Fri Jan 20 17:33:36 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 20 Jan 2023 17:33:36 GMT Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v6] In-Reply-To: References: Message-ID: On Fri, 20 Jan 2023 09:41:28 GMT, Afshin Zafari wrote: >> ### Description >> os::allocation_granularity/page_size and friends return signed values >> >> ### Patch >> - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t` >> - Initial value of them changed from -1 to 0. 
>> - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). Also, checking the argument be positive is removed. >> - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed. >> - All `(size_t)` casting of getters removed. >> - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain. >> - Explicitly casted to `(int)` where `jint` needed. >> - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately. >> - `"%d"` format-flags replaced with `SIZE_FORMAT`. >> - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`. >> >> ### Test >> tier1-5: all green, except an unrelated fail for whom a bug is already created. >> job-id: afshin-8151413-20230117-1255-40910454 > > Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge 'master' > - 8151413: os::allocation_granularity/page_size and friends return signed values > - 8151413: os::allocation_granularity/page_size and friends return signed values > - 8151413: os::allocation_granularity/page_size and friends return signed values > - 8151413: os::allocation_granularity/page_size and friends return signed values > - 8151413: os::allocation_granularity/page_size and friends return signed values Looks good except for a few copyright issues. src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 2: > 1: /* > 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. I'm not sure if we need to add this line for such a small change in the file. src/hotspot/share/runtime/osInfo.hpp line 2: > 1: /* > 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. You need to preserve the original year. 
It should be `2022, 2023,`

test/hotspot/gtest/utilities/test_globalDefinitions.cpp line 2:

> 1: /*
> 2:  * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.

Same as above, need to preserve the original year.

-------------

Marked as reviewed by ccheung (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12091

From duke at openjdk.org Sat Jan 21 15:47:06 2023
From: duke at openjdk.org (Afshin Zafari)
Date: Sat, 21 Jan 2023 15:47:06 GMT
Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 20 Jan 2023 17:27:17 GMT, Calvin Cheung wrote:

>> Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>>
>>  - Merge 'master'
>>  - 8151413: os::allocation_granularity/page_size and friends return signed values
>>  - 8151413: os::allocation_granularity/page_size and friends return signed values
>>  - 8151413: os::allocation_granularity/page_size and friends return signed values
>>  - 8151413: os::allocation_granularity/page_size and friends return signed values
>>  - 8151413: os::allocation_granularity/page_size and friends return signed values
>
> src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 2:
>
>> 1: /*
>> 2:  * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.
>
> I'm not sure if we need to add this line for such a small change in the file.

This line is the result of the merge with master (and it was a conflict).
------------- PR: https://git.openjdk.org/jdk/pull/12091 From duke at openjdk.org Sat Jan 21 16:05:25 2023 From: duke at openjdk.org (Afshin Zafari) Date: Sat, 21 Jan 2023 16:05:25 GMT Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v7] In-Reply-To: References: Message-ID: > ### Description > os::allocation_granularity/page_size and friends return signed values > > ### Patch > - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t` > - Initial value of them changed from -1 to 0. > - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). Also, checking the argument be positive is removed. > - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed. > - All `(size_t)` casting of getters removed. > - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain. > - Explicitly casted to `(int)` where `jint` needed. > - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately. > - `"%d"` format-flags replaced with `SIZE_FORMAT`. > - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`. > > ### Test > tier1-5: all green, except an unrelated fail for whom a bug is already created. 
> job-id: afshin-8151413-20230117-1255-40910454

Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision:

  8151413: os::allocation_granularity/page_size and friends return signed values

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12091/files
  - new: https://git.openjdk.org/jdk/pull/12091/files/2c707069..b91290bc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=05-06

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/12091.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12091/head:pull/12091

PR: https://git.openjdk.org/jdk/pull/12091

From ysr at openjdk.org Sat Jan 21 22:18:29 2023
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Sat, 21 Jan 2023 22:18:29 GMT
Subject: RFR: JDK-8299703: Improvements in card scanning
Message-ID:

**Main changes:**
1. `process_clusters()` now finds and processes contiguous ranges of dirty cards, skipping over contiguous ranges of clean cards.
2. the ShenandoahCardCluster class has been extended by a `block_start()` method which returns the first object in a card (which could be co-initial with the card); this method is used by the refactored process_clusters() above.
3. ShenandoahCardCluster class's `has_object()` method has been renamed `starts_object()` which more closely reflects the API.
4. ShenandoahCardStats class has been modified to better suit the way statistics are gathered in the rewritten `process_clusters()`. The larger-grain API should also result in less overhead for gathering the statistics and might (subject to measurement) allow it to be available in product/release builds (if so, that will be done in a separate follow-up ticket).

**Testing & Implementation Notes:**
5. Tested with Extremem and SpecJBB, fastdebug, release, and product builds, with and without verification enabled.
6. Preliminary performance data with an Extremem workload show roughly 17-18% reduction in wall-clock durations of concurrent remembered set scanning across the distribution (p0, p25, p50, p75), p100 (max) was marginally down at 2%. The trend of the change was as expected since the gains are lost when we have a higher frequency of dirty/clean alternations with short dirty/clean runs.
7. More performance data with SPECjbb and a different Extremem workload are being gathered, and will be included. We will also include the impact on the concurrent update refs phase, as well as the overall impact on latency scores.

**Acknowledgments**:
8. Many thanks to @kdnilsen for feedback on an earlier version of the draft PR, which helped catch a crucial misunderstanding on the role of TAMS and object start marks, and helped fix the error that had been dogging me.

**Epilogue**:
9. Further performance improvements are possible, but are deferred for follow-up.

-------------

Commit messages:
 - More const safety, some asserts, some comments.
 - Change type of loop variable to signed to allow correct termination for the case when start_card_index is 0. Nominal check for overflow when using signed type for card index.
 - Fix the direction of an address comparison, add a couple of assertions,
 - Fixes related to tams logic and iteration, block_start backwards walk
 - TODO: marks the places identified in code walkthrough / review with
 - ...
 - Clean up & refine comments; rename a variable or two; add a couple of
 - Correct tracking of oldest address scanned; relax an overly strong
 - const some methods.
 - Merge branch 'master' into rs_scan
 - ... and 99 more: https://git.openjdk.org/shenandoah/compare/2cab4e76...eaadac7c

Changes: https://git.openjdk.org/shenandoah/pull/193/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=193&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8299703
  Stats: 836 lines in 11 files changed: 325 ins; 274 del; 237 mod
  Patch: https://git.openjdk.org/shenandoah/pull/193.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/193/head:pull/193

PR: https://git.openjdk.org/shenandoah/pull/193

From ccheung at openjdk.org Mon Jan 23 05:08:06 2023
From: ccheung at openjdk.org (Calvin Cheung)
Date: Mon, 23 Jan 2023 05:08:06 GMT
Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v7]
In-Reply-To:
References:
Message-ID:

On Sat, 21 Jan 2023 16:05:25 GMT, Afshin Zafari wrote:

>> ### Description
>> os::allocation_granularity/page_size and friends return signed values
>>
>> ### Patch
>> - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t`
>> - Initial value of them changed from -1 to 0.
>> - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). Also, checking the argument be positive is removed.
>> - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed.
>> - All `(size_t)` casting of getters removed.
>> - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain.
>> - Explicitly casted to `(int)` where `jint` needed.
>> - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately.
>> - `"%d"` format-flags replaced with `SIZE_FORMAT`.
>> - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`.
>> >> ### Test >> tier1-5: all green, except an unrelated fail for whom a bug is already created. >> job-id: afshin-8151413-20230117-1255-40910454 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8151413: os::allocation_granularity/page_size and friends return signed values src/hotspot/share/runtime/osInfo.cpp line 2: > 1: /* > 2: * Copyright (c) 1997, 2022, 2023, Oracle and/or its affiliates. All rights reserved. You can drop the `2022,` in the above. ------------- PR: https://git.openjdk.org/jdk/pull/12091 From duke at openjdk.org Mon Jan 23 09:22:11 2023 From: duke at openjdk.org (Afshin Zafari) Date: Mon, 23 Jan 2023 09:22:11 GMT Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v8] In-Reply-To: References: Message-ID: > ### Description > os::allocation_granularity/page_size and friends return signed values > > ### Patch > - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t` > - Initial value of them changed from -1 to 0. > - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). Also, checking the argument be positive is removed. > - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed. > - All `(size_t)` casting of getters removed. > - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain. > - Explicitly casted to `(int)` where `jint` needed. > - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately. > - `"%d"` format-flags replaced with `SIZE_FORMAT`. > - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`. 
> > ### Test > tier1-5: all green, except an unrelated fail for whom a bug is already created. > job-id: afshin-8151413-20230117-1255-40910454 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8151413: os::allocation_granularity/page_size and friends return signed values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12091/files - new: https://git.openjdk.org/jdk/pull/12091/files/b91290bc..2f58d7e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12091&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12091.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12091/head:pull/12091 PR: https://git.openjdk.org/jdk/pull/12091 From smonteith at openjdk.org Mon Jan 23 11:24:13 2023 From: smonteith at openjdk.org (Stuart Monteith) Date: Mon, 23 Jan 2023 11:24:13 GMT Subject: RFR: 8298647: GenShen require heap size 2MB granularity [v3] In-Reply-To: References: Message-ID: > Generational Shenandoah requires 2MB granularity in order for card tables to cover the allocated heap. Each byte in a page of card table represents 512 heap bytes. As card tables are allocated 4KB at a time, 4KB * 512 = 2MB. > > There is a circular dependency between the region calculations and the heap size calculations. This unconditionally rounds up the heap size to 2MB. It might be preferable to do this only when generational mode is enabled. > > Running with: > java -Xlog:gc*=trace -XX:+UseShenandoahGC -mx495m \ > -XX:ShenandoahGCMode=generational -version > > on a debug build is sufficient to reproduce this problem. Stuart Monteith has updated the pull request incrementally with one additional commit since the last revision: Check shenandoah generational is enabled. Only round up for the card table alignment when generational mode is enabled. 
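A minimal sketch of the granularity arithmetic described above, assuming only the figures given in the description (512 heap bytes per card table byte, card table allocated 4KB at a time). The names `kHeapBytesPerCard`, `kCardTablePageBytes`, `round_up` and `adjust_heap_size` are hypothetical and are not the identifiers used in the patch:

```cpp
#include <cassert>
#include <cstddef>

// Figures taken from the description; the constant names are illustrative only.
constexpr std::size_t kHeapBytesPerCard   = 512;       // heap bytes covered by one card table byte
constexpr std::size_t kCardTablePageBytes = 4 * 1024;  // card table is allocated 4KB at a time
constexpr std::size_t kHeapGranularity    = kHeapBytesPerCard * kCardTablePageBytes;

static_assert(kHeapGranularity == 2 * 1024 * 1024, "4KB * 512 = 2MB");

constexpr bool is_power_of_two(std::size_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Round size up to the next multiple of a power-of-two alignment.
inline std::size_t round_up(std::size_t size, std::size_t alignment) {
  assert(is_power_of_two(alignment));
  return (size + alignment - 1) & ~(alignment - 1);
}

// Only the generational mode needs the card table alignment (assumption:
// this mirrors the intent of the commit message, not its actual code).
inline std::size_t adjust_heap_size(std::size_t requested, bool generational) {
  return generational ? round_up(requested, kHeapGranularity) : requested;
}
```

With `-mx495m`, as in the reproducer, this sketch would round the heap up to 496MB in generational mode and leave it at 495MB otherwise.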
------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/202/files - new: https://git.openjdk.org/shenandoah/pull/202/files/96d4347d..cd1fea84 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=202&range=02 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=202&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/202.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/202/head:pull/202 PR: https://git.openjdk.org/shenandoah/pull/202 From smonteith at openjdk.org Mon Jan 23 15:09:55 2023 From: smonteith at openjdk.org (Stuart Monteith) Date: Mon, 23 Jan 2023 15:09:55 GMT Subject: RFR: 8298647: GenShen require heap size 2MB granularity [v3] In-Reply-To: References: Message-ID: <3zVu6ILyT79nzk76w1_m-iTnTz6xwWSssGRTiK1nR9w=.60da80e7-22c2-4506-93d8-9521391dd9d7@github.com> On Mon, 23 Jan 2023 11:24:13 GMT, Stuart Monteith wrote: >> Generational Shenandoah requires 2MB granularity in order for card tables to cover the allocated heap. Each byte in a page of card table represents 512 heap bytes. As card tables are allocated 4KB at a time, 4KB * 512 = 2MB. >> >> There is a circular dependency between the region calculations and the heap size calculations. This unconditionally rounds up the heap size to 2MB. It might be preferable to do this only when generational mode is enabled. >> >> Running with: >> java -Xlog:gc*=trace -XX:+UseShenandoahGC -mx495m \ >> -XX:ShenandoahGCMode=generational -version >> >> on a debug build is sufficient to reproduce this problem. > > Stuart Monteith has updated the pull request incrementally with one additional commit since the last revision: > > Check shenandoah generational is enabled. > > Only round up for the card table alignment when generational mode is > enabled. I included with the patch the statistics for the new patterns, here they are: Adds new patterns to match a memory address and a constant, which are emitted as two loads. 
C2 expects that constant INTs will always be manifested by immediate loads. However, as these intrinsics are avoiding loading into GPRs and moving them into the vector registers, there is an addition to allow integers to be loaded as a constant. There isn't a pattern to match two LoadI nodes - C2 doesn't handle nodes with two memory nodes for calculating anti-dependencies.

Benchmark                   Result  Units   % against non-SVE2
Integers.compress           2.009   µs/op
Integers.compress-SVE       1.435   µs/op   71.43%
Integers.compress-SVE+mem   1.263   µs/op   62.87%
Integers.expand             2.129   µs/op
Integers.expand-SVE         1.433   µs/op   67.31%
Integers.expand-SVE+mem     1.32    µs/op   62.00%
Longs.compress              2.504   µs/op
Longs.compress-SVE          1.445   µs/op   57.71%
Longs.compress-SVE+mem      1.269   µs/op   50.68%
Longs.expand                2.614   µs/op
Longs.expand-SVE            1.489   µs/op   56.96%
Longs.expand-SVE+mem        1.272   µs/op   48.66%

-------------

PR: https://git.openjdk.org/shenandoah/pull/202

From ccheung at openjdk.org  Mon Jan 23 17:14:46 2023
From: ccheung at openjdk.org (Calvin Cheung)
Date: Mon, 23 Jan 2023 17:14:46 GMT
Subject: RFR: 8151413: os::allocation_granularity/page_size and friends return signed values [v8]
In-Reply-To:
References:
Message-ID:

On Mon, 23 Jan 2023 09:22:11 GMT, Afshin Zafari wrote:

>> ### Description
>> os::allocation_granularity/page_size and friends return signed values
>>
>> ### Patch
>> - Type of `vm_page_size` and `vm_allocation_granularity` members of `OSInfo` class and their wrappers in `os` class changed to `size_t`
>> - Initial value of them changed from -1 to 0.
>> - In setters, checking for *set only once* condition is updated accordingly (comparing with 0 instead of -1). Also, checking the argument be positive is removed.
>> - Equal to 0 (instead of `<= 0` ) is used to check if calling setters failed.
>> - All `(size_t)` casting of getters removed.
>> - In arithmetic and negation operations, the operand related to the getters casted to `(int)`. Otherwise, the Windows builds complain.
>> - Explicitly casted to `(int)` where `jint` needed. >> - In ` align_up(T size, A alignment)`, assignment of variables of type `A` to type `T` (i.e., `T t = (A) a;`) should be safe. `T : size_t` and `A : int` won't compile. Fixed appropriately. >> - `"%d"` format-flags replaced with `SIZE_FORMAT`. >> - Type of `CompilerToVM::Data::vm_page_size` changed to `size_t`. >> >> ### Test >> tier1-5: all green, except an unrelated fail for whom a bug is already created. >> job-id: afshin-8151413-20230117-1255-40910454 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8151413: os::allocation_granularity/page_size and friends return signed values Marked as reviewed by ccheung (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12091 From wkemper at openjdk.org Mon Jan 23 19:01:19 2023 From: wkemper at openjdk.org (William Kemper) Date: Mon, 23 Jan 2023 19:01:19 GMT Subject: RFR: 8298647: GenShen require heap size 2MB granularity [v3] In-Reply-To: References: Message-ID: On Mon, 23 Jan 2023 11:24:13 GMT, Stuart Monteith wrote: >> Generational Shenandoah requires 2MB granularity in order for card tables to cover the allocated heap. Each byte in a page of card table represents 512 heap bytes. As card tables are allocated 4KB at a time, 4KB * 512 = 2MB. >> >> There is a circular dependency between the region calculations and the heap size calculations. This unconditionally rounds up the heap size to 2MB. It might be preferable to do this only when generational mode is enabled. >> >> Running with: >> java -Xlog:gc*=trace -XX:+UseShenandoahGC -mx495m \ >> -XX:ShenandoahGCMode=generational -version >> >> on a debug build is sufficient to reproduce this problem. > > Stuart Monteith has updated the pull request incrementally with one additional commit since the last revision: > > Check shenandoah generational is enabled. > > Only round up for the card table alignment when generational mode is > enabled. 
Thank you for this contribution! ------------- Marked as reviewed by wkemper (Committer). PR: https://git.openjdk.org/shenandoah/pull/202 From wkemper at openjdk.org Mon Jan 23 19:54:41 2023 From: wkemper at openjdk.org (William Kemper) Date: Mon, 23 Jan 2023 19:54:41 GMT Subject: RFR: 8300644: Remove gc/shenandoah/jni/TestStringCriticalWithDedup.java In-Reply-To: References: Message-ID: On Thu, 19 Jan 2023 10:26:38 GMT, Erik ?sterlund wrote: > The gc/shenandoah/jni/TestStringCriticalWithDedup.java test was designed to catch failure to pin strings being passed out to JNI critical users, because that used to be dangerous. After [JDK-8299673](https://bugs.openjdk.org/browse/JDK-8299673) that is not dangerous any longer. Conversely, now we kind of want deduplication to proceed regardless of JNI critical, which defeats the purpose of this test. It should be removed. Looks like it will also be safe to delete `libTestStringCriticalWithDedup.c`. ------------- Changes requested by wkemper (no project role). PR: https://git.openjdk.org/jdk/pull/12089 From smonteith at openjdk.org Tue Jan 24 00:42:41 2023 From: smonteith at openjdk.org (Stuart Monteith) Date: Tue, 24 Jan 2023 00:42:41 GMT Subject: Integrated: 8298647: GenShen require heap size 2MB granularity In-Reply-To: References: Message-ID: On Mon, 16 Jan 2023 21:41:35 GMT, Stuart Monteith wrote: > Generational Shenandoah requires 2MB granularity in order for card tables to cover the allocated heap. Each byte in a page of card table represents 512 heap bytes. As card tables are allocated 4KB at a time, 4KB * 512 = 2MB. > > There is a circular dependency between the region calculations and the heap size calculations. This unconditionally rounds up the heap size to 2MB. It might be preferable to do this only when generational mode is enabled. > > Running with: > java -Xlog:gc*=trace -XX:+UseShenandoahGC -mx495m \ > -XX:ShenandoahGCMode=generational -version > > on a debug build is sufficient to reproduce this problem. 
This pull request has now been integrated. Changeset: 800c0c88 Author: Stuart Monteith Committer: William Kemper URL: https://git.openjdk.org/shenandoah/commit/800c0c884631ae5b9c265c43dad8a49a17a8ec09 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod 8298647: GenShen require heap size 2MB granularity Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/shenandoah/pull/202 From mbaesken at openjdk.org Tue Jan 24 09:06:03 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 24 Jan 2023 09:06:03 GMT Subject: RFR: 8300644: Remove gc/shenandoah/jni/TestStringCriticalWithDedup.java In-Reply-To: References: Message-ID: <2Y1hSZGuJgRt3or6OHu3yADEG0HcJq4N61EFqXgzwdE=.e75d796f-9d41-4083-a89a-c43b4bb17308@github.com> On Thu, 19 Jan 2023 10:26:38 GMT, Erik ?sterlund wrote: > The gc/shenandoah/jni/TestStringCriticalWithDedup.java test was designed to catch failure to pin strings being passed out to JNI critical users, because that used to be dangerous. After [JDK-8299673](https://bugs.openjdk.org/browse/JDK-8299673) that is not dangerous any longer. Conversely, now we kind of want deduplication to proceed regardless of JNI critical, which defeats the purpose of this test. It should be removed. LGTM ------------- Marked as reviewed by mbaesken (Reviewer). PR: https://git.openjdk.org/jdk/pull/12089 From eosterlund at openjdk.org Tue Jan 24 09:35:12 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 24 Jan 2023 09:35:12 GMT Subject: RFR: 8300644: Remove gc/shenandoah/jni/TestStringCriticalWithDedup.java [v2] In-Reply-To: References: Message-ID: > The gc/shenandoah/jni/TestStringCriticalWithDedup.java test was designed to catch failure to pin strings being passed out to JNI critical users, because that used to be dangerous. After [JDK-8299673](https://bugs.openjdk.org/browse/JDK-8299673) that is not dangerous any longer. 
Conversely, now we kind of want deduplication to proceed regardless of JNI critical, which defeats the purpose of this test. It should be removed.

Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:

  Remove unused file

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12089/files
  - new: https://git.openjdk.org/jdk/pull/12089/files/ca12e310..ef92536b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12089&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12089&range=00-01

  Stats: 39 lines in 1 file changed: 0 ins; 39 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/12089.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12089/head:pull/12089

PR: https://git.openjdk.org/jdk/pull/12089

From eosterlund at openjdk.org  Tue Jan 24 09:35:12 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Tue, 24 Jan 2023 09:35:12 GMT
Subject: RFR: 8300644: Remove gc/shenandoah/jni/TestStringCriticalWithDedup.java [v2]
In-Reply-To:
References:
Message-ID: <2j-WkV5oyg6e3xiryrfAHRhhTPJw89fye4CE8yLnN54=.46bbb581-2f10-486f-b2de-7c7744125cbe@github.com>

On Mon, 23 Jan 2023 19:52:18 GMT, William Kemper wrote:

> Looks like it will also be safe to delete `libTestStringCriticalWithDedup.c`.

Ah yeah, look at that. Fixed.
-------------

PR: https://git.openjdk.org/jdk/pull/12089

From eosterlund at openjdk.org  Tue Jan 24 09:35:14 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Tue, 24 Jan 2023 09:35:14 GMT
Subject: RFR: 8300644: Remove gc/shenandoah/jni/TestStringCriticalWithDedup.java [v2]
In-Reply-To: <2Y1hSZGuJgRt3or6OHu3yADEG0HcJq4N61EFqXgzwdE=.e75d796f-9d41-4083-a89a-c43b4bb17308@github.com>
References: <2Y1hSZGuJgRt3or6OHu3yADEG0HcJq4N61EFqXgzwdE=.e75d796f-9d41-4083-a89a-c43b4bb17308@github.com>
Message-ID:

On Tue, 24 Jan 2023 09:03:35 GMT, Matthias Baesken wrote:

>> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Remove unused file
>
> LGTM

Thanks for the reviews, @MBaesken and @earthling-amzn!

-------------

PR: https://git.openjdk.org/jdk/pull/12089

From wkemper at openjdk.org  Tue Jan 24 16:59:08 2023
From: wkemper at openjdk.org (William Kemper)
Date: Tue, 24 Jan 2023 16:59:08 GMT
Subject: RFR: 8300644: Remove gc/shenandoah/jni/TestStringCriticalWithDedup.java [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 24 Jan 2023 09:35:12 GMT, Erik Österlund wrote:

>> The gc/shenandoah/jni/TestStringCriticalWithDedup.java test was designed to catch failure to pin strings being passed out to JNI critical users, because that used to be dangerous. After [JDK-8299673](https://bugs.openjdk.org/browse/JDK-8299673) that is not dangerous any longer. Conversely, now we kind of want deduplication to proceed regardless of JNI critical, which defeats the purpose of this test. It should be removed.
>
> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove unused file

Marked as reviewed by wkemper (no project role).
-------------

PR: https://git.openjdk.org/jdk/pull/12089

From eosterlund at openjdk.org  Wed Jan 25 08:19:12 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Wed, 25 Jan 2023 08:19:12 GMT
Subject: Integrated: 8300644: Remove gc/shenandoah/jni/TestStringCriticalWithDedup.java
In-Reply-To:
References:
Message-ID:

On Thu, 19 Jan 2023 10:26:38 GMT, Erik Österlund wrote:

> The gc/shenandoah/jni/TestStringCriticalWithDedup.java test was designed to catch failure to pin strings being passed out to JNI critical users, because that used to be dangerous. After [JDK-8299673](https://bugs.openjdk.org/browse/JDK-8299673) that is not dangerous any longer. Conversely, now we kind of want deduplication to proceed regardless of JNI critical, which defeats the purpose of this test. It should be removed.

This pull request has now been integrated.

Changeset: 95fafd09
Author:    Erik Österlund
URL:       https://git.openjdk.org/jdk/commit/95fafd094f93eaf3ff15c76ca25345123d1586fe
Stats:     206 lines in 2 files changed: 0 ins; 206 del; 0 mod

8300644: Remove gc/shenandoah/jni/TestStringCriticalWithDedup.java

Reviewed-by: wkemper, mbaesken

-------------

PR: https://git.openjdk.org/jdk/pull/12089

From tschatzl at openjdk.org  Wed Jan 25 15:34:39 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Wed, 25 Jan 2023 15:34:39 GMT
Subject: RFR: 8221785: Let possibly_parallel_threads_do cover the same threads as threads_do
Message-ID:

Hi all,

can I have reviews for this change that makes `Threads::possibly_parallel_threads_do` iterate over the same set of threads as `threads_do` to have parity? I.e. over all java and non-java threads.

Originally this CR has been created to make a new method that keeps iterating only over java threads and the VM thread, but it's a bit weird to have both variants as the overhead of the extra threads is negligible and otherwise just confusing.
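For readers unfamiliar with "possibly parallel" thread iteration, the following is a rough sketch of how token-based claiming can let several workers walk the same thread list while each thread is processed once per round. It is an illustration under stated assumptions, not HotSpot code: `ThreadStub`, `claim` and `possibly_parallel_do` are hypothetical names, and the real `Threads::possibly_parallel_threads_do` may differ in detail.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a VM thread object; not HotSpot's Thread class.
struct ThreadStub {
  std::atomic<std::uintptr_t> claimed_token{0};  // 0 means "never claimed"
  int visits = 0;                                // closure applications, for illustration

  // Returns true if the caller should process this thread for the given token.
  bool claim(bool is_par, std::uintptr_t token) {
    assert(token != 0);                          // 0 is reserved for "never claimed"
    if (!is_par) return true;                    // single worker: nothing to arbitrate
    std::uintptr_t seen = claimed_token.load();
    if (seen == token) return false;             // already claimed in this round
    // Exactly one worker wins the race to install the current token.
    return claimed_token.compare_exchange_strong(seen, token);
  }
};

// Every worker walks the full list; only the claiming worker runs the closure.
template <typename Closure>
void possibly_parallel_do(std::vector<ThreadStub>& threads, bool is_par,
                          std::uintptr_t token, Closure&& closure) {
  for (ThreadStub& t : threads) {
    if (t.claim(is_par, token)) closure(t);
  }
}
```

In this sketch every worker passes the same token during one round, and the token must change before the next round so that threads become claimable again.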

So I made `Threads::possibly_parallel_threads_do` iterate over all threads; as far as I can tell all uses support that, and all uses correctly change the claim token (mostly in the enclosing `StrongRootsScope`).

This allows some minimally better hiding of the token mechanism.

One other reason for not doing this in the first place (as in [JDK-8221102](https://bugs.openjdk.org/browse/JDK-8221102), or as discussed [here](https://mail.openjdk.org/pipermail/hotspot-dev/2019-April/037541.html)) has been the fear that there would be a problem with threads being created during iteration and the (common) call to `Threads::assert_all_threads_claimed`. However, all calls so far are during a safepoint, and none seem to create new threads. I suggest deferring this problem until it becomes important.

Moreover, this functionality is required for [JDK-8211104](https://bugs.openjdk.org/browse/JDK-8211104). :)

Testing: tier1-4, gha

Thanks,
  Thomas

-------------

Commit messages:
 - Replace existing manual claiming with possibly_parallel_threads_do calls
 - initial version, just making possibly_parallel_threads_do cover the same threads as threads_do

Changes: https://git.openjdk.org/jdk/pull/12201/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12201&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8221785
  Stats: 55 lines in 7 files changed: 16 ins; 8 del; 31 mod
  Patch: https://git.openjdk.org/jdk/pull/12201.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12201/head:pull/12201

PR: https://git.openjdk.org/jdk/pull/12201

From wkemper at openjdk.org  Wed Jan 25 19:24:08 2023
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 25 Jan 2023 19:24:08 GMT
Subject: RFR: Merge openjdk/jdk:master
Message-ID:

Merges upstream tag jdk-21+6.
------------- Commit messages: - Merge openjdk/jdk:master - 8292635: Run ArchivedEnumTest.java in jdk tier testing - Merge - 8300275: SegmentScope.isAccessibleBy returning incorrect values - 8300195: Fall-through issue occurs when using record pattern in switch statements - 8295723: security/infra/wycheproof/RunWycheproof.java fails with Assertion Error - 8295687: Better BMP bounds - 8293742: Better Banking of Sounds - 8293554: Enhanced DH Key Exchanges - 8287411: Enhance DTLS Performance - ... and 117 more: https://git.openjdk.org/shenandoah/compare/800c0c88...5857315e The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=shenandoah&pr=205&range=00.0 - openjdk/jdk:master: https://webrevs.openjdk.org/?repo=shenandoah&pr=205&range=00.1 Changes: https://git.openjdk.org/shenandoah/pull/205/files Stats: 10920 lines in 794 files changed: 5817 ins; 1573 del; 3530 mod Patch: https://git.openjdk.org/shenandoah/pull/205.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/205/head:pull/205 PR: https://git.openjdk.org/shenandoah/pull/205 From wkemper at openjdk.org Thu Jan 26 00:12:09 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 26 Jan 2023 00:12:09 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: <2D6cB3JJcTGWh7TsKcDqjUwX-bZ1GpYEo9iHAoD9PyM=.dc8c0c8d-b3c1-4bfc-9ff9-158ae4a8f43b@github.com> On Wed, 25 Jan 2023 19:17:00 GMT, William Kemper wrote: > Merges upstream tag jdk-21+6. This pull request has now been integrated. Changeset: 1974f58a Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/1974f58af6f9c1704fa9059acdc568792ef767d6 Stats: 10920 lines in 794 files changed: 5817 ins; 1573 del; 3530 mod Merge openjdk/jdk:master ------------- PR: https://git.openjdk.org/shenandoah/pull/205 From ysr at openjdk.org Thu Jan 26 02:04:06 2023 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 26 Jan 2023 02:04:06 GMT Subject: RFR: 8299703: GenShen: improvements in card scanning In-Reply-To: References: Message-ID: On Thu, 5 Jan 2023 20:45:00 GMT, Y. Srinivas Ramakrishna wrote: > **Main changes:** > 1. `process_clusters()` now finds and processes contiguous ranges of dirty cards, skipping over contiguous ranges of clean cards. For reading the diffs, it might be easiest to look at the new code, rather than view the side-by-side diffs. > 2. the ShenandoahCardCluster class has been extended by a `block_start()` method which returns the first object in a card (which could be co-initial with the card); this method is used by the refactored `process_clusters()` above. > 3. ShenandoahCardCluster class's `has_object()` method has been renamed `starts_object()` which more closely reflects the API. > 4. ShenandoahCardStats class has been modified to better suit the way statistics are gathered in the rewritten `process_clusters()`. The larger-grain API should also result in less overhead for gathering the statistics and might (subject to measurement) allow it to be available in product/release builds (if so, that will be done in a separate follow-up ticket). > 5. Added some const annotations. > > **Testing & Implementation Notes:** > 6. Tested with Extremem and SpecJBB, fastdebug, release, and product builds, with and without verification enabled. > 7. Preliminary performance data with an Extremem workload show roughly 17-18% reduction in wall-clock durations of concurrent remembered set scanning across the distribution (p0, p25, p50, p75), p100 (max) was marginally down at 2%. The trend of the change was as expected since the gains are lost when we have a higher frequency of dirty/clean alternations with short dirty/clean runs. > 8. More performance data with SPECjbb and a different Extremem workload are being gathered, and will be included. 
We will also include the impact on the concurrent update refs phase, as well as the overall impact on latency scores. > > **Acknowledgments**: > 9. Many thanks to @kdnilsen for feedback on an earlier version of the draft PR, which helped catch a crucial misunderstanding on the role of TAMS and marked objects, and helped fix the error that had been dogging me. > > **Epilogue**: > 10. Further performance improvements are possible, but are deferred for follow-up. 1/25 Update: During testing (with hyperalloc) an intermittent crash was observed which indicated a cross-generational reference pointing into the young collection set that was not evacuated (because it was not found by the marking). This pointer was found when updating cross generational references post-young-evacuation. This could occur for a variety of reasons, including an error in the card-scanning code modified in this PR. I am still in the midst of chasing down this crash (it's difficult to reproduce). However, please continue with your code reviews. I'll attach the promised summaries of performance deltas w/SPECjbb & three different Extremem workloads in my next comment. ------------- PR: https://git.openjdk.org/shenandoah/pull/193 From iwalulya at openjdk.org Thu Jan 26 11:07:36 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 26 Jan 2023 11:07:36 GMT Subject: RFR: 8221785: Let possibly_parallel_threads_do cover the same threads as threads_do In-Reply-To: References: Message-ID: On Wed, 25 Jan 2023 15:26:50 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that makes `Threads::possibly_parallel_threads_do` iterate over the same set of threads as `threads_do` to have parity? I.e. over all java and non-java threads. > > Originally this CR has been created to make a new method that keeps iterating only over java threads and the VM thread, but it's a bit weird to have both variants as the overhead of the extra threads is negligible and otherwise just confusing. 
> > So I made `Threads::possibly_parallel_threads_do` iterate over all threads; all uses support that afaict, also all uses correctly change the claim token (mostly in the enclosing `StrongRootsScope`). > > This allows some minimally better hiding of the token mechanism. > > One other reason for not doing this in the first place (as in [JDK-8221102](https://bugs.openjdk.org/browse/JDK-8221102), or as discussed [here](https://mail.openjdk.org/pipermail/hotspot-dev/2019-April/037541.html)) has been the fear that there would be a problem with threads being created during iteration and the (common) call to 'Threads::assert_all_threads_claimed`. However all calls so far are during a safepoint, and none seem to create new threads. I suggest to defer looking at this problem when it is important. > > Moreover I need that functionality is required for (JDK-8211104)[https://bugs.openjdk.org/browse/JDK-8211104]. :) > > Testing: tier1-4, gha > > Thanks, > Thomas Lgtm! ------------- Marked as reviewed by iwalulya (Reviewer). PR: https://git.openjdk.org/jdk/pull/12201 From coleenp at openjdk.org Thu Jan 26 14:05:19 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 26 Jan 2023 14:05:19 GMT Subject: RFR: 8221785: Let possibly_parallel_threads_do cover the same threads as threads_do In-Reply-To: References: Message-ID: On Wed, 25 Jan 2023 15:26:50 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that makes `Threads::possibly_parallel_threads_do` iterate over the same set of threads as `threads_do` to have parity? I.e. over all java and non-java threads. > > Originally this CR has been created to make a new method that keeps iterating only over java threads and the VM thread, but it's a bit weird to have both variants as the overhead of the extra threads is negligible and otherwise just confusing. 
> > So I made `Threads::possibly_parallel_threads_do` iterate over all threads; all uses support that afaict, also all uses correctly change the claim token (mostly in the enclosing `StrongRootsScope`). > > This allows some minimally better hiding of the token mechanism. > > One other reason for not doing this in the first place (as in [JDK-8221102](https://bugs.openjdk.org/browse/JDK-8221102), or as discussed [here](https://mail.openjdk.org/pipermail/hotspot-dev/2019-April/037541.html)) has been the fear that there would be a problem with threads being created during iteration and the (common) call to 'Threads::assert_all_threads_claimed`. However all calls so far are during a safepoint, and none seem to create new threads. I suggest to defer looking at this problem when it is important. > > Moreover I need that functionality is required for (JDK-8211104)[https://bugs.openjdk.org/browse/JDK-8211104]. :) > > Testing: tier1-4, gha > > Thanks, > Thomas Thank you for picking this up. src/hotspot/share/runtime/threads.cpp line 267: > 265: if (current->claim_threads_do(is_par, claim_token)) { > 266: tc->do_thread(current); > 267: } If I understand correctly, if this is only OK in a safepoint, can you add the safepoint assert here? ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/12201 From ysr at openjdk.org Thu Jan 26 15:37:12 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 26 Jan 2023 15:37:12 GMT Subject: RFR: 8299703: GenShen: improvements in card scanning In-Reply-To: References: Message-ID: On Thu, 5 Jan 2023 20:45:00 GMT, Y. Srinivas Ramakrishna wrote: > **Main changes:** > 1. `process_clusters()` now finds and processes contiguous ranges of dirty cards, skipping over contiguous ranges of clean cards. For reading the diffs, it might be easiest to look at the new code, rather than view the side-by-side diffs. > 2. 
the ShenandoahCardCluster class has been extended by a `block_start()` method which returns the first object in a card (which could be co-initial with the card); this method is used by the refactored `process_clusters()` above.
> 3. ShenandoahCardCluster class's `has_object()` method has been renamed `starts_object()` which more closely reflects the API.
> 4. ShenandoahCardStats class has been modified to better suit the way statistics are gathered in the rewritten `process_clusters()`. The larger-grain API should also result in less overhead for gathering the statistics and might (subject to measurement) allow it to be available in product/release builds (if so, that will be done in a separate follow-up ticket).
> 5. Added some const annotations.
> 
> **Testing & Implementation Notes:**
> 6. Tested with Extremem and SpecJBB, fastdebug, release, and product builds, with and without verification enabled.
> 7. Preliminary performance data with an Extremem workload show roughly 17-18% reduction in wall-clock durations of concurrent remembered set scanning across the distribution (p0, p25, p50, p75), p100 (max) was marginally down at 2%. The trend of the change was as expected since the gains are lost when we have a higher frequency of dirty/clean alternations with short dirty/clean runs.
> 8. More performance data with SPECjbb and a different Extremem workload are being gathered, and will be included. We will also include the impact on the concurrent update refs phase, as well as the overall impact on latency scores.
> 
> **Acknowledgments**:
> 9. Many thanks to @kdnilsen for feedback on an earlier version of the draft PR, which helped catch a crucial misunderstanding on the role of TAMS and marked objects, and helped fix the error that had been dogging me.
> 
> **Epilogue**:
> 10. Further performance improvements are possible, but are deferred for follow-up.

**SPECjbb**

Conc Scan Rem:
-----------------
Before: av=72.3  lvls=(0.83, 57.03, 68.24, 83.38, 848.8) ms
After:  av=53.9  lvls=(0.66, 42.62, 53.48, 66.09, 253.8) ms
-----------------
Delta:  av=-25   lvls=(-20, -25, -21, -21, -70) %

Conc Upd Ref:
-----------------
Before: av=278.7 lvls=(4.91, 113.63, 197.46, 457.42, 892.35) ms
After:  av=145.3 lvls=(2.21, 66.52, 111.62, 185.35, 881.24) ms
-----------------
Delta:  av=-48   lvls=(-55, -41, -43, -59, -1.2) %

**Extremem Config1**

Conc Scan Rem:
-----------------
Before: av=8.7   lvls=(0.94, 8.18, 8.63, 9.05, 22.99) ms
After:  av=7.2   lvls=(0.78, 6.71, 7.10, 7.49, 22.61) ms
-----------------
Delta:  av=-16   lvls=(-17, -18, -18, -17, -1.7) %

Conc Upd Ref:
-----------------
Before: av=12.49 lvls=(4.95, 12.11, 12.59, 13.22, 18.28) ms
After:  av=11.00 lvls=(3.13, 10.53, 11.02, 11.50, 18.24) ms
-----------------
Delta:  av=-12   lvls=(-37, -13, -13, -13, -0.2) %

**Extremem Config2**

Conc Scan Rem:
-----------------
Before: av=123   lvls=(3.53, 22.45, 49.79, 172.95, 779.97) ms
After:  av=129   lvls=(3.59, 17.68, 41.60, 218.83, 774.75) ms
-----------------
Delta:  av=+5    lvls=(0, -27, -20, +21, -7) %

Conc Upd Ref:
-----------------
Before: av=257   lvls=(21.6, 144.1, 241.9, 318.9, 762.8) ms
After:  av=257   lvls=(26.8, 105.5, 244.7, 330.5, 751.3) ms
-----------------
Delta:  av=0     lvls=(+24, -27, +1, +4, -2) %

The following Extremem configurations are not useful because the generation sizes were subject to rapid changes, and many degenerate collections occurred at random times:

**Extremem Config3**

Conc Scan Rem:
-----------------
Before: av=9.45  lvls=(0.003, 0.369, 0.742, 19.79, 24.59) s
After:  av=0.37  lvls=(0.004, 0.151, 0.272, 0.417, 6.07) s
-----------------
Delta:  av=-96   lvls=(+10, -59, -63, -98, -75) %

Conc Upd Ref:
-----------------
Before: av=235.7 lvls=(45.0, 62.5, 265.0, 301.4, 679.3) ms
After:  av=274.5 lvls=(19.2, 153.9, 242.2, 370.9, 1213.4) ms
-----------------
Delta:  av=+17   lvls=(-57, +147, -9, +23, +79) %

-------------

PR: https://git.openjdk.org/shenandoah/pull/193

From tschatzl at openjdk.org Thu Jan 26 15:43:51 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Thu, 26 Jan 2023 15:43:51 GMT
Subject: RFR: 8221785: Let possibly_parallel_threads_do cover the same threads as threads_do [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 26 Jan 2023 11:05:07 GMT, Ivan Walulya wrote:

>> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   coleenp review
> 
> Lgtm!

Thanks @walulyai @coleenp for your reviews

-------------

PR: https://git.openjdk.org/jdk/pull/12201

From tschatzl at openjdk.org Thu Jan 26 15:43:53 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Thu, 26 Jan 2023 15:43:53 GMT
Subject: RFR: 8221785: Let possibly_parallel_threads_do cover the same threads as threads_do [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 26 Jan 2023 14:02:20 GMT, Coleen Phillimore wrote:

>> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   coleenp review
> 
> src/hotspot/share/runtime/threads.cpp line 267:
> 
>> 265:     if (current->claim_threads_do(is_par, claim_token)) {
>> 266:       tc->do_thread(current);
>> 267:     }
> 
> If I understand correctly, if this is only OK in a safepoint, can you add the safepoint assert here?

Done. Another tier1-3 is almost done, also did local testing of Shenandoah tests to see whether there is any issue, none found.
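Editorial sketch, not part of the thread: the claim check quoted in the review comment above (`claim_threads_do` guarding `do_thread`) can be modeled roughly as below. `ModelThread`, `claim` and `possibly_parallel_do` are hypothetical names made up for illustration, and the compare-and-swap loop is an assumption about how such a per-thread token could work; it is not HotSpot's actual implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for a VM thread; NOT HotSpot's Thread class.
struct ModelThread {
  std::atomic<std::uintptr_t> claim_token{0};

  // In parallel mode, returns true for exactly one caller per token value.
  // In serial mode there is no competition, so the claim always succeeds.
  bool claim(bool is_par, std::uintptr_t token) {
    if (!is_par) {
      claim_token.store(token, std::memory_order_relaxed);
      return true;
    }
    std::uintptr_t seen = claim_token.load(std::memory_order_relaxed);
    while (seen != token) {
      // On failure, 'seen' is refreshed with the current value and we retry.
      if (claim_token.compare_exchange_weak(seen, token)) return true;
    }
    return false;  // someone else already claimed this thread for this token
  }
};

// Visits every thread in the list whose claim succeeds for 'token' and
// returns how many threads this caller actually processed.
template <typename Closure>
int possibly_parallel_do(std::vector<ModelThread>& threads, bool is_par,
                         std::uintptr_t token, Closure&& do_thread) {
  int visited = 0;
  for (ModelThread& t : threads) {
    if (t.claim(is_par, token)) {
      do_thread(t);
      ++visited;
    }
  }
  return visited;
}
```

In this model a second pass with the same token visits nothing, which is why each iteration needs a fresh token; the thread refers to that step as changing the claim token, mostly in the enclosing `StrongRootsScope`.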
------------- PR: https://git.openjdk.org/jdk/pull/12201 From tschatzl at openjdk.org Thu Jan 26 15:43:54 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 26 Jan 2023 15:43:54 GMT Subject: Integrated: 8221785: Let possibly_parallel_threads_do cover the same threads as threads_do In-Reply-To: References: Message-ID: On Wed, 25 Jan 2023 15:26:50 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that makes `Threads::possibly_parallel_threads_do` iterate over the same set of threads as `threads_do` to have parity? I.e. over all java and non-java threads. > > Originally this CR has been created to make a new method that keeps iterating only over java threads and the VM thread, but it's a bit weird to have both variants as the overhead of the extra threads is negligible and otherwise just confusing. > > So I made `Threads::possibly_parallel_threads_do` iterate over all threads; all uses support that afaict, also all uses correctly change the claim token (mostly in the enclosing `StrongRootsScope`). > > This allows some minimally better hiding of the token mechanism. > > One other reason for not doing this in the first place (as in [JDK-8221102](https://bugs.openjdk.org/browse/JDK-8221102), or as discussed [here](https://mail.openjdk.org/pipermail/hotspot-dev/2019-April/037541.html)) has been the fear that there would be a problem with threads being created during iteration and the (common) call to 'Threads::assert_all_threads_claimed`. However all calls so far are during a safepoint, and none seem to create new threads. I suggest to defer looking at this problem when it is important. > > Moreover I need that functionality is required for (JDK-8211104)[https://bugs.openjdk.org/browse/JDK-8211104]. :) > > Testing: tier1-4, gha > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: 315398c2 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/315398c2450e47d9cdb7fac944e35ba6a6aef221 Stats: 57 lines in 7 files changed: 18 ins; 8 del; 31 mod 8221785: Let possibly_parallel_threads_do cover the same threads as threads_do Reviewed-by: iwalulya, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/12201 From tschatzl at openjdk.org Thu Jan 26 15:43:49 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 26 Jan 2023 15:43:49 GMT Subject: RFR: 8221785: Let possibly_parallel_threads_do cover the same threads as threads_do [v2] In-Reply-To: References: Message-ID: <2s6ZNHboTwZGUYl11kjjq14SZb0VIIRpiVXK17g_7tY=.c12f9676-60e5-4379-87f4-2de17d09f4ad@github.com> > Hi all, > > can I have reviews for this change that makes `Threads::possibly_parallel_threads_do` iterate over the same set of threads as `threads_do` to have parity? I.e. over all java and non-java threads. > > Originally this CR has been created to make a new method that keeps iterating only over java threads and the VM thread, but it's a bit weird to have both variants as the overhead of the extra threads is negligible and otherwise just confusing. > > So I made `Threads::possibly_parallel_threads_do` iterate over all threads; all uses support that afaict, also all uses correctly change the claim token (mostly in the enclosing `StrongRootsScope`). > > This allows some minimally better hiding of the token mechanism. > > One other reason for not doing this in the first place (as in [JDK-8221102](https://bugs.openjdk.org/browse/JDK-8221102), or as discussed [here](https://mail.openjdk.org/pipermail/hotspot-dev/2019-April/037541.html)) has been the fear that there would be a problem with threads being created during iteration and the (common) call to 'Threads::assert_all_threads_claimed`. However all calls so far are during a safepoint, and none seem to create new threads. I suggest to defer looking at this problem when it is important. 
> > Moreover I need that functionality is required for (JDK-8211104)[https://bugs.openjdk.org/browse/JDK-8211104]. :) > > Testing: tier1-4, gha > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: coleenp review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12201/files - new: https://git.openjdk.org/jdk/pull/12201/files/34e67f56..1d4458ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12201&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12201&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12201.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12201/head:pull/12201 PR: https://git.openjdk.org/jdk/pull/12201 From ysr at openjdk.org Thu Jan 26 22:39:49 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 26 Jan 2023 22:39:49 GMT Subject: RFR: 8299703: GenShen: improvements in card scanning [v2] In-Reply-To: References: Message-ID: > **Main changes:** > 1. `process_clusters()` now finds and processes contiguous ranges of dirty cards, skipping over contiguous ranges of clean cards. For reading the diffs, it might be easiest to look at the new code, rather than view the side-by-side diffs. > 2. the ShenandoahCardCluster class has been extended by a `block_start()` method which returns the first object in a card (which could be co-initial with the card); this method is used by the refactored `process_clusters()` above. > 3. ShenandoahCardCluster class's `has_object()` method has been renamed `starts_object()` which more closely reflects the API. > 4. ShenandoahCardStats class has been modified to better suit the way statistics are gathered in the rewritten `process_clusters()`. 
The larger-grain API should also result in less overhead for gathering the statistics and might (subject to measurement) allow it to be available in product/release builds (if so, that will be done in a separate follow-up ticket). > 5. Added some const annotations. > > **Testing & Implementation Notes:** > 6. Tested with Extremem and SpecJBB, fastdebug, release, and product builds, with and without verification enabled. > 7. Preliminary performance data with an Extremem workload show roughly 17-18% reduction in wall-clock durations of concurrent remembered set scanning across the distribution (p0, p25, p50, p75), p100 (max) was marginally down at 2%. The trend of the change was as expected since the gains are lost when we have a higher frequency of dirty/clean alternations with short dirty/clean runs. > 8. More performance data with SPECjbb and a different Extremem workload are being gathered, and will be included. We will also include the impact on the concurrent update refs phase, as well as the overall impact on latency scores. > > **Acknowledgments**: > 9. Many thanks to @kdnilsen for feedback on an earlier version of the draft PR, which helped catch a crucial misunderstanding on the role of TAMS and marked objects, and helped fix the error that had been dogging me. > > **Epilogue**: > 10. Further performance improvements are possible, but are deferred for follow-up. Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 110 commits: - Merge branch 'master' into rs_scan - More const safety, some asserts, some comments. - Change type of loop variable to signed to allow correct termination for the case when start_card_index is 0. Nominal check for overflow when using signed type for card index. - Fix the direction of an address comparison, add a couple of assertions, and elaborate some comments. Passes heap verification handily now. 
- Fixes related to tams logic and iteration, block_start backwards walk loop, etc. from review feedback from @kdnilsen. More const safety, & elaboration of some comments. - TODO: marks the places identified in code walkthrough / review with @kdnilsen that need fixing up. These will be addressed in the next commit. - ... - Clean up & refine comments; rename a variable or two; add a couple of assertion checks. - Correct tracking of oldest address scanned; relax an overly strong assertion. - const some methods. - ... and 100 more: https://git.openjdk.org/shenandoah/compare/1974f58a...79b5f733 ------------- Changes: https://git.openjdk.org/shenandoah/pull/193/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=193&range=01 Stats: 836 lines in 11 files changed: 325 ins; 274 del; 237 mod Patch: https://git.openjdk.org/shenandoah/pull/193.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/193/head:pull/193 PR: https://git.openjdk.org/shenandoah/pull/193 From ysr at openjdk.org Fri Jan 27 01:51:24 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 27 Jan 2023 01:51:24 GMT Subject: RFR: 8299703: GenShen: improvements in card scanning [v3] In-Reply-To: References: Message-ID: <0G0hE0eXCrNk47-BfgMM1quIUaa6tqI1gSzg9BFiKU0=.830956f8-8d09-47cc-854f-972d01d0f30c@github.com> > **Main changes:** > 1. `process_clusters()` now finds and processes contiguous ranges of dirty cards, skipping over contiguous ranges of clean cards. For reading the diffs, it might be easiest to look at the new code, rather than view the side-by-side diffs. > 2. the ShenandoahCardCluster class has been extended by a `block_start()` method which returns the first object in a card (which could be co-initial with the card); this method is used by the refactored `process_clusters()` above. > 3. ShenandoahCardCluster class's `has_object()` method has been renamed `starts_object()` which more closely reflects the API. > 4. 
ShenandoahCardStats class has been modified to better suit the way statistics are gathered in the rewritten `process_clusters()`. The larger-grain API should also result in less overhead for gathering the statistics and might (subject to measurement) allow it to be available in product/release builds (if so, that will be done in a separate follow-up ticket). > 5. Added some const annotations. > > **Testing & Implementation Notes:** > 6. Tested with Extremem and SpecJBB, fastdebug, release, and product builds, with and without verification enabled. > 7. Preliminary performance data with an Extremem workload show roughly 17-18% reduction in wall-clock durations of concurrent remembered set scanning across the distribution (p0, p25, p50, p75), p100 (max) was marginally down at 2%. The trend of the change was as expected since the gains are lost when we have a higher frequency of dirty/clean alternations with short dirty/clean runs. > 8. More performance data with SPECjbb and a different Extremem workload are being gathered, and will be included. We will also include the impact on the concurrent update refs phase, as well as the overall impact on latency scores. > > **Acknowledgments**: > 9. Many thanks to @kdnilsen for feedback on an earlier version of the draft PR, which helped catch a crucial misunderstanding on the role of TAMS and marked objects, and helped fix the error that had been dogging me. > > **Epilogue**: > 10. Further performance improvements are possible, but are deferred for follow-up. Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: A couple of guarantees to catch a pesky assert that's occasionally triggering. 
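Editorial sketch, not part of the thread: item 1 of the description (process contiguous ranges of dirty cards, skip contiguous ranges of clean cards) can be illustrated with a plain forward scan over a card array. `Card`, `CardRange` and `find_dirty_runs` are hypothetical names, and the two-state encoding is an assumption; the real `process_clusters()` also deals with object starts, TAMS and statistics, none of which is modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical card states; the real card table uses its own encoding.
enum class Card : std::uint8_t { Clean, Dirty };

// Half-open range [first, last) of card indices.
using CardRange = std::pair<std::size_t, std::size_t>;

// Returns the maximal runs of contiguous dirty cards, in index order.
// Runs of clean cards are skipped without per-card work beyond the test.
std::vector<CardRange> find_dirty_runs(const std::vector<Card>& cards) {
  std::vector<CardRange> runs;
  const std::size_t n = cards.size();
  std::size_t i = 0;
  while (i < n) {
    // Skip a contiguous run of clean cards.
    while (i < n && cards[i] == Card::Clean) ++i;
    if (i == n) break;
    const std::size_t start = i;
    // Extend over the contiguous run of dirty cards.
    while (i < n && cards[i] == Card::Dirty) ++i;
    runs.emplace_back(start, i);
  }
  return runs;
}
```

Each returned range would then be handed to the object scanner as one unit. That matches the observation in item 7 that the gains shrink when dirty and clean cards alternate in short runs, since more ranges means more per-range setup.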
-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/193/files
  - new: https://git.openjdk.org/shenandoah/pull/193/files/79b5f733..6c02e135

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=193&range=02
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=193&range=01-02

  Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/shenandoah/pull/193.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/193/head:pull/193

PR: https://git.openjdk.org/shenandoah/pull/193

From jsjolen at openjdk.org Fri Jan 27 10:31:03 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Fri, 27 Jan 2023 10:31:03 GMT
Subject: RFR: JDK-8301225: Replace NULL with nullptr in share/gc/shenandoah/
Message-ID:

Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/shenandoah/. Unfortunately the script that does the change isn't perfect, and so we
need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.

Here are some typical things to look out for:

1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.

An example of this:

```c++
// This function returns null
void* ret_null();
// This function returns true if *x == nullptr
bool is_nullptr(void** x);
```

Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.

Thanks!
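Editorial sketch, not part of the thread: a small illustration of why the keyword is preferred in code, plus the comment convention from item 3. The `which` overloads are hypothetical and exist only to show overload resolution; `ret_null` and `is_nullptr` reuse the declarations from the example above with assumed bodies.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical overloads, only to show how a null argument resolves.
inline int which(int)   { return 0; }  // integer overload
inline int which(void*) { return 1; }  // pointer overload

// This function returns null (prose in a comment, so lower-case "null").
inline void* ret_null() { return nullptr; }

// This function returns true if *x == nullptr (a code expression in a
// comment, so the keyword is spelled out).
inline bool is_nullptr(void** x) { return *x == nullptr; }

// nullptr has type std::nullptr_t, which converts to any pointer type but
// not to int, so the pointer overload is the only viable candidate.
inline int resolve_with_keyword() { return which(nullptr); }
```

By contrast, a call such as `which(NULL)` may pick the integer overload or fail to compile as ambiguous, depending on how the platform defines the macro, which is one common argument for this kind of cleanup.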
-------------

Commit messages:
 - Manual fixes
 - Merge remote-tracking branch 'origin/master' into JDK-8301225
 - Replace NULL with nullptr in share/gc/shenandoah/

Changes: https://git.openjdk.org/jdk/pull/12251/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12251&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8301225
  Stats: 536 lines in 60 files changed: 0 ins; 0 del; 536 mod
  Patch: https://git.openjdk.org/jdk/pull/12251.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12251/head:pull/12251

PR: https://git.openjdk.org/jdk/pull/12251

From jsjolen at openjdk.org Fri Jan 27 10:31:09 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Fri, 27 Jan 2023 10:31:09 GMT
Subject: RFR: JDK-8301225: Replace NULL with nullptr in share/gc/shenandoah/
In-Reply-To:
References:
Message-ID: <9TLSuX9rc-Qhxmu6YJqUudeKiEr-_tdTV67GiTmGvbs=.190cfb7d-b12c-4c5d-a38a-a94125a76f36@github.com>

On Fri, 27 Jan 2023 10:19:33 GMT, Johan Sjölen wrote:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/shenandoah/. Unfortunately the script that does the change isn't perfect, and so we
> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
> 
> Here are some typical things to look out for:
> 
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
> 
> An example of this:
> 
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
> 
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
> 
> Thanks!

Manual stuff to fix

src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp line 259:

> 257:   }
> 258: 
> 259:   // if (pre_val != null)

nullptr

src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp line 281:

> 279:         __ make_leaf_call(tf, CAST_FROM_FN_PTR(address, ShenandoahRuntime::write_ref_field_pre_entry), "shenandoah_wb_pre", pre_val, tls);
> 280:       } __ end_if();  // (!index)
> 281:     } __ end_if();  // (pre_val != null)

nullptr

src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp line 428:

> 426: 
> 427:   Node* one = __ ConI(1);
> 428:   // is_instof == 0 if base_oop == null

nullptr

src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp line 61:

> 59:     _worker_data[i] = nullptr;
> 60:     SHENANDOAH_PAR_PHASE_DO(,, SHENANDOAH_WORKER_DATA_nullptr)
> 61: #undef SHENANDOAH_WORKER_DATA_nullptr

Fix these macros

-------------

PR: https://git.openjdk.org/jdk/pull/12251

From jsjolen at openjdk.org Fri Jan 27 13:36:16 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Fri, 27 Jan 2023 13:36:16 GMT
Subject: RFR: JDK-8301225: Replace NULL with nullptr in share/gc/shenandoah/
In-Reply-To:
References:
Message-ID:

On Fri, 27 Jan 2023 10:19:33 GMT, Johan Sjölen wrote:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/shenandoah/. Unfortunately the script that does the change isn't perfect, and so we
> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
> 
> Here are some typical things to look out for:
> 
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2.
Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Passes tier1. ------------- PR: https://git.openjdk.org/jdk/pull/12251 From wkemper at openjdk.org Fri Jan 27 17:02:18 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 27 Jan 2023 17:02:18 GMT Subject: RFR: JDK-8301225: Replace NULL with nullptr in share/gc/shenandoah/ In-Reply-To: References: Message-ID: On Fri, 27 Jan 2023 10:19:33 GMT, Johan Sj?len wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/shenandoah/. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. 
> > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! LGTM ------------- Marked as reviewed by wkemper (no project role). PR: https://git.openjdk.org/jdk/pull/12251 From kdnilsen at openjdk.org Fri Jan 27 17:30:33 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 27 Jan 2023 17:30:33 GMT Subject: RFR: JDK-8301225: Replace NULL with nullptr in share/gc/shenandoah/ In-Reply-To: References: Message-ID: On Fri, 27 Jan 2023 10:19:33 GMT, Johan Sj?len wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/shenandoah/. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Looks good to me. Thanks for doing this. A few comments about copyright notices. Marked as reviewed by kdnilsen (no project role). 
src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2: > 1: /* > 2: * Copyright (c) 2023, 2023, Oracle and/or its affiliates. All rights reserved. minor glitch here src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 2: > 1: /* > 2: * Copyright (c) 2023, 2023, Oracle and/or its affiliates. All rights reserved. same glitch here src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp line 2: > 1: /* > 2: * Copyright (c) 2018, 2023, Oracle and/or its affiliates. All rights reserved. Probably should be 2018, 2019, 2023 src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.cpp line 2: > 1: /* > 2: * Copyright (c) 2015, 2023, Oracle and/or its affiliates. All rights reserved. same here src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.hpp line 2: > 1: /* > 2: * Copyright (c) 2015, 2023, Oracle and/or its affiliates. All rights reserved. check this ------------- PR: https://git.openjdk.org/jdk/pull/12251 From ysr at openjdk.org Fri Jan 27 19:41:19 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 27 Jan 2023 19:41:19 GMT Subject: RFR: JDK-8301225: Replace NULL with nullptr in share/gc/shenandoah/ In-Reply-To: <9TLSuX9rc-Qhxmu6YJqUudeKiEr-_tdTV67GiTmGvbs=.190cfb7d-b12c-4c5d-a38a-a94125a76f36@github.com> References: <9TLSuX9rc-Qhxmu6YJqUudeKiEr-_tdTV67GiTmGvbs=.190cfb7d-b12c-4c5d-a38a-a94125a76f36@github.com> Message-ID: On Fri, 27 Jan 2023 10:23:27 GMT, Johan Sj?len wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/shenandoah/. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. 
Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp line 61: > >> 59: _worker_data[i] = nullptr; >> 60: SHENANDOAH_PAR_PHASE_DO(,, SHENANDOAH_WORKER_DATA_nullptr) >> 61: #undef SHENANDOAH_WORKER_DATA_nullptr > > Fix these macros My suggestion would be to leave the macro name unchanged, just change the definition. Or at least use uppercasing in the macro name. (Perhaps the latter's what your comment intended.) More generally, is there a `jcheck` rule that will prevent re-introduction of NULL usage into the mix? ------------- PR: https://git.openjdk.org/jdk/pull/12251 From wkemper at openjdk.org Fri Jan 27 21:21:16 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 27 Jan 2023 21:21:16 GMT Subject: RFR: JDK-8301225: Replace NULL with nullptr in share/gc/shenandoah/ In-Reply-To: References: <9TLSuX9rc-Qhxmu6YJqUudeKiEr-_tdTV67GiTmGvbs=.190cfb7d-b12c-4c5d-a38a-a94125a76f36@github.com> Message-ID: On Fri, 27 Jan 2023 19:38:45 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp line 61: >> >>> 59: _worker_data[i] = nullptr; >>> 60: SHENANDOAH_PAR_PHASE_DO(,, SHENANDOAH_WORKER_DATA_nullptr) >>> 61: #undef SHENANDOAH_WORKER_DATA_nullptr >> >> Fix these macros > > My suggestion would be to leave the macro name unchanged, just change the definition. Or at least use uppercasing in the macro name. 
(Perhaps the latter's what your comment intended.) > > More generally, is there a `jcheck` rule that will prevent re-introduction of NULL usage into the mix? This code diff is out of date, the macro name was reverted to the original in a subsequent commit. Good question about adding this to `jcheck`. ------------- PR: https://git.openjdk.org/jdk/pull/12251 From wkemper at openjdk.org Fri Jan 27 22:06:29 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 27 Jan 2023 22:06:29 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: This merges upstream tag jdk-21+7 ------------- Commit messages: - Merge tag 'jdk-21+7' into merge-jdk-21-7 - 8300806: Update googletest to v1.13.0 - 8300592: ASan build does not correctly propagate options to some test launchers - 8299635: Hotspot update for deprecated sprintf in Xcode 14 - 8300805: Update autoconf build-aux files with latest from 2022-09-17 - 8301086: jdk/internal/util/ByteArray/ReadWriteValues.java fails with CompilationError - 8300997: Add curl support to createJMHBundle.sh - 8295944: Move the Http2TestServer and related classes into a package of its own - 8301004: httpclient: Add more debug to HttpResponseInputStream - 8300236: Use VarHandle access in Data(Input | Output)Stream classes - ... 
and 102 more: https://git.openjdk.org/shenandoah/compare/1974f58a...988c99c0 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=shenandoah&pr=206&range=00.0 - openjdk/jdk:master: https://webrevs.openjdk.org/?repo=shenandoah&pr=206&range=00.1 Changes: https://git.openjdk.org/shenandoah/pull/206/files Stats: 21466 lines in 876 files changed: 9414 ins; 3760 del; 8292 mod Patch: https://git.openjdk.org/shenandoah/pull/206.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/206/head:pull/206 PR: https://git.openjdk.org/shenandoah/pull/206 From wkemper at openjdk.org Fri Jan 27 23:12:39 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 27 Jan 2023 23:12:39 GMT Subject: RFR: Revert unnecessary changes to unified logging Message-ID: These are vestigial changes from the initial approach to support logging for the region sampling. This initial approach was abandoned and these changes should have been reverted at that time. ------------- Commit messages: - Revert unnecessary changes to unified logging Changes: https://git.openjdk.org/shenandoah/pull/207/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=207&range=00 Stats: 28 lines in 2 files changed: 14 ins; 14 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/207.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/207/head:pull/207 PR: https://git.openjdk.org/shenandoah/pull/207 From wkemper at openjdk.org Fri Jan 27 23:16:52 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 27 Jan 2023 23:16:52 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 27 Jan 2023 21:59:32 GMT, William Kemper wrote: > This merges upstream tag jdk-21+7 This pull request has now been integrated. 
Changeset: 13b83414 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/13b83414e3b08d4fec8103f65fe68f47a5a1dc4d Stats: 21466 lines in 876 files changed: 9414 ins; 3760 del; 8292 mod Merge openjdk/jdk:master ------------- PR: https://git.openjdk.org/shenandoah/pull/206 From kdnilsen at openjdk.org Fri Jan 27 23:19:56 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 27 Jan 2023 23:19:56 GMT Subject: RFR: Revert unnecessary changes to unified logging In-Reply-To: References: Message-ID: On Fri, 27 Jan 2023 23:06:26 GMT, William Kemper wrote: > These are vestigial changes from the initial approach to support logging for the region sampling. This initial approach was abandoned and these changes should have been reverted at that time. Marked as reviewed by kdnilsen (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/207 From ysr at openjdk.org Fri Jan 27 23:24:49 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 27 Jan 2023 23:24:49 GMT Subject: RFR: Revert unnecessary changes to unified logging In-Reply-To: References: Message-ID: On Fri, 27 Jan 2023 23:06:26 GMT, William Kemper wrote: > These are vestigial changes from the initial approach to support logging for the region sampling. This initial approach was abandoned and these changes should have been reverted at that time. Marked as reviewed by ysr (Author). ------------- PR: https://git.openjdk.org/shenandoah/pull/207 From wkemper at openjdk.org Fri Jan 27 23:39:57 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 27 Jan 2023 23:39:57 GMT Subject: Integrated: Revert unnecessary changes to unified logging In-Reply-To: References: Message-ID: On Fri, 27 Jan 2023 23:06:26 GMT, William Kemper wrote: > These are vestigial changes from the initial approach to support logging for the region sampling. This initial approach was abandoned and these changes should have been reverted at that time. This pull request has now been integrated. 
Changeset: 60861ba4 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/60861ba46a259f58c457c672038d59fef63499cc Stats: 28 lines in 2 files changed: 14 ins; 14 del; 0 mod Revert unnecessary changes to unified logging Reviewed-by: kdnilsen, ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/207 From ysr at openjdk.org Sun Jan 29 21:28:37 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sun, 29 Jan 2023 21:28:37 GMT Subject: RFR: 8299703: GenShen: improvements in card scanning [v4] In-Reply-To: References: Message-ID: > **Main changes:** > 1. `process_clusters()` now finds and processes contiguous ranges of dirty cards, skipping over contiguous ranges of clean cards. For reading the diffs, it might be easiest to look at the new code, rather than view the side-by-side diffs. > 2. the ShenandoahCardCluster class has been extended by a `block_start()` method which returns the first object in a card (which could be co-initial with the card); this method is used by the refactored `process_clusters()` above. > 3. ShenandoahCardCluster class's `has_object()` method has been renamed `starts_object()` which more closely reflects the API. > 4. ShenandoahCardStats class has been modified to better suit the way statistics are gathered in the rewritten `process_clusters()`. The larger-grain API should also result in less overhead for gathering the statistics and might (subject to measurement) allow it to be available in product/release builds (if so, that will be done in a separate follow-up ticket). > 5. Added some const annotations. > > **Testing & Implementation Notes:** > 6. Tested with Extremem and SpecJBB, fastdebug, release, and product builds, with and without verification enabled. > 7. Preliminary performance data with an Extremem workload show roughly 17-18% reduction in wall-clock durations of concurrent remembered set scanning across the distribution (p0, p25, p50, p75), p100 (max) was marginally down at 2%. 
The trend of the change was as expected since the gains are lost when we have a higher frequency of dirty/clean alternations with short dirty/clean runs. > 8. More performance data with SPECjbb and a different Extremem workload are being gathered, and will be included. We will also include the impact on the concurrent update refs phase, as well as the overall impact on latency scores. > > **Acknowledgments**: > 9. Many thanks to @kdnilsen for feedback on an earlier version of the draft PR, which helped catch a crucial misunderstanding on the role of TAMS and marked objects, and helped fix the error that had been dogging me. > > **Epilogue**: > 10. Further performance improvements are possible, but are deferred for follow-up. Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 113 commits: - Merge branch 'master' into rs_scan - a const, some assertions, and avoid redundant scans for on-objArrays that straddle across card clusters (sic). - A couple of guarantees to catch a pesky assert that's occasionally triggering. - Merge branch 'master' into rs_scan - More const safety, some asserts, some comments. - Change type of loop variable to signed to allow correct termination for the case when start_card_index is 0. Nominal check for overflow when using signed type for card index. - Fix the direction of an address comparison, add a couple of assertions, and elaborate some comments. Passes heap verification handily now. - Fixes related to tams logic and iteration, block_start backwards walk loop, etc. from review feedback from @kdnilsen. More const safety, & elaboration of some comments. - TODO: marks the places identified in code walkthrough / review with @kdnilsen that need fixing up. These will be addressed in the next commit. - ... - ... 
and 103 more: https://git.openjdk.org/shenandoah/compare/60861ba4...22bbfe26 ------------- Changes: https://git.openjdk.org/shenandoah/pull/193/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=193&range=03 Stats: 842 lines in 12 files changed: 335 ins; 268 del; 239 mod Patch: https://git.openjdk.org/shenandoah/pull/193.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/193/head:pull/193 PR: https://git.openjdk.org/shenandoah/pull/193 From ysr at openjdk.org Sun Jan 29 21:56:43 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sun, 29 Jan 2023 21:56:43 GMT Subject: RFR: 8299703: GenShen: improvements in card scanning [v5] In-Reply-To: References: Message-ID: > **Main changes:** > 1. `process_clusters()` now finds and processes contiguous ranges of dirty cards, skipping over contiguous ranges of clean cards. For reading the diffs, it might be easiest to look at the new code, rather than view the side-by-side diffs. > 2. the ShenandoahCardCluster class has been extended by a `block_start()` method which returns the first object in a card (which could be co-initial with the card); this method is used by the refactored `process_clusters()` above. > 3. ShenandoahCardCluster class's `has_object()` method has been renamed `starts_object()` which more closely reflects the API. > 4. ShenandoahCardStats class has been modified to better suit the way statistics are gathered in the rewritten `process_clusters()`. The larger-grain API should also result in less overhead for gathering the statistics and might (subject to measurement) allow it to be available in product/release builds (if so, that will be done in a separate follow-up ticket). > 5. Added some const annotations. > > **Testing & Implementation Notes:** > 6. Tested with Extremem and SpecJBB, fastdebug, release, and product builds, with and without verification enabled. > 7. 
Preliminary performance data with an Extremem workload show roughly 17-18% reduction in wall-clock durations of concurrent remembered set scanning across the distribution (p0, p25, p50, p75), p100 (max) was marginally down at 2%. The trend of the change was as expected since the gains are lost when we have a higher frequency of dirty/clean alternations with short dirty/clean runs. > 8. More performance data with SPECjbb and a different Extremem workload are being gathered, and will be included. We will also include the impact on the concurrent update refs phase, as well as the overall impact on latency scores. > > **Acknowledgments**: > 9. Many thanks to @kdnilsen for feedback on an earlier version of the draft PR, which helped catch a crucial misunderstanding on the role of TAMS and marked objects, and helped fix the error that had been dogging me. > > **Epilogue**: > 10. Further performance improvements are possible, but are deferred for follow-up. Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: jcheck: tab ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/193/files - new: https://git.openjdk.org/shenandoah/pull/193/files/22bbfe26..f804468e Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=193&range=04 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=193&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/193.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/193/head:pull/193 PR: https://git.openjdk.org/shenandoah/pull/193 From wkemper at openjdk.org Mon Jan 30 22:45:39 2023 From: wkemper at openjdk.org (William Kemper) Date: Mon, 30 Jan 2023 22:45:39 GMT Subject: RFR: 8299324: inline_native_setCurrentThread lacks GC barrier for Shenandoah Message-ID: Allow Shenandoah barrier to emit the store barrier for native memory. 
I believe it is safe to delete the assert on L202 because `obj` is not used here. Tested with `hotspot:hotspot_gc` and `hotspot:loom` with `JAVA_OPTIONS=-XX:+UseShenandoahGC` (and again with -XX:TieredStopAtLevel=1). ------------- Commit messages: - Merge jdk:master into native-store-barrier - 8299324: inline_native_setCurrentThread lacks GC barrier for Shenandoah Changes: https://git.openjdk.org/jdk/pull/12300/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12300&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8299324 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12300.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12300/head:pull/12300 PR: https://git.openjdk.org/jdk/pull/12300 From jsjolen at openjdk.org Tue Jan 31 09:36:59 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 31 Jan 2023 09:36:59 GMT Subject: RFR: JDK-8301225: Replace NULL with nullptr in share/gc/shenandoah/ In-Reply-To: References: Message-ID: On Fri, 27 Jan 2023 17:11:40 GMT, Kelvin Nilsen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/shenandoah/. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. 
>> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2023, 2023, Oracle and/or its affiliates. All rights reserved. > > minor glitch here Weird, first time I'm seeing this glitch. Thanks! > src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp line 2: > >> 1: /* >> 2: * Copyright (c) 2018, 2023, Oracle and/or its affiliates. All rights reserved. > > Probably should be 2018, 2019, 2023 This and the other ones are correct, it's `$FirstChange, $LastChange, ` ------------- PR: https://git.openjdk.org/jdk/pull/12251 From kdnilsen at openjdk.org Tue Jan 31 15:06:55 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 31 Jan 2023 15:06:55 GMT Subject: RFR: 8299703: GenShen: improvements in card scanning [v5] In-Reply-To: References: Message-ID: On Sun, 29 Jan 2023 21:56:43 GMT, Y. Srinivas Ramakrishna wrote: >> **Main changes:** >> 1. `process_clusters()` now finds and processes contiguous ranges of dirty cards, skipping over contiguous ranges of clean cards. For reading the diffs, it might be easiest to look at the new code, rather than view the side-by-side diffs. >> 2. the ShenandoahCardCluster class has been extended by a `block_start()` method which returns the first object in a card (which could be co-initial with the card); this method is used by the refactored `process_clusters()` above. >> 3. ShenandoahCardCluster class's `has_object()` method has been renamed `starts_object()` which more closely reflects the API. >> 4. ShenandoahCardStats class has been modified to better suit the way statistics are gathered in the rewritten `process_clusters()`. 
The larger-grain API should also result in less overhead for gathering the statistics and might (subject to measurement) allow it to be available in product/release builds (if so, that will be done in a separate follow-up ticket). >> 5. Added some const annotations. >> >> **Testing & Implementation Notes:** >> 6. Tested with Extremem and SpecJBB, fastdebug, release, and product builds, with and without verification enabled. >> 7. Preliminary performance data with an Extremem workload showed roughly 17-18% reduction in wall-clock durations of concurrent remembered set scanning across the distribution (p0, p25, p50, p75), p100 (max) was marginally down at 2%. The trend of the change was as expected since the gains are lost when we have a higher frequency of dirty/clean alternations with short dirty/clean runs. >> 8. More performance data with SPECjbb and several different Extremem workloads were gathered, and can be found below, including both phases that use the process_clusters code. See https://github.com/openjdk/shenandoah/pull/193#issuecomment-1405191124 below. >> >> **Acknowledgments**: >> 9. Many thanks to @kdnilsen for feedback on an earlier version of the draft PR, which helped catch a crucial misunderstanding on the role of TAMS and marked objects, and helped fix the error that had been dogging me. >> >> **Epilogue**: >> 10. Further performance improvements are possible, but are deferred for follow-up. > > Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: > > jcheck: tab Marked as reviewed by kdnilsen (Committer). src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 975: > 973: > 974: size_t ShenandoahGeneration::adjust_available(intptr_t adjustment) { > 975: // TODO: ysr: revert to an assert Do you want to make these refinements before we integrate?
src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 951: > 949: > 950: // ShenandoahRegionChunkIterator divides the total remembered set scanning effort into ShenandoahRegionChunks > 951: // that are assigned one at a time to worker threads. (Here, we use the terms`assignments` and `chunks` Typo: need a space before assignments ------------- PR: https://git.openjdk.org/shenandoah/pull/193 From kdnilsen at openjdk.org Tue Jan 31 15:06:58 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 31 Jan 2023 15:06:58 GMT Subject: RFR: 8299703: GenShen: improvements in card scanning [v5] In-Reply-To: References: Message-ID: On Tue, 31 Jan 2023 14:53:25 GMT, Kelvin Nilsen wrote: >> Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: >> >> jcheck: tab > > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 975: > >> 973: >> 974: size_t ShenandoahGeneration::adjust_available(intptr_t adjustment) { >> 975: // TODO: ysr: revert to an assert > > Do you want to make these refinements before we integrate? Or maybe just remove them? ------------- PR: https://git.openjdk.org/shenandoah/pull/193 From wkemper at openjdk.org Tue Jan 31 17:39:57 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 31 Jan 2023 17:39:57 GMT Subject: RFR: Combine bitmap clearing with region resetting closure Message-ID: <6o4akBXd8cgdS7RZT9tCyfTgmz-FL3YvkPwTo40TkfM=.873acca8-7fc4-4f22-9439-42329342b7ad@github.com> This change combines the two closures used to `prepare_gc`. This removes a second iteration over the regions.
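For readers who want a picture of the idea, here is a minimal sketch, assuming a much simplified heap. It is not the code from this PR: `Region`, `Heap`, `heap_region_iterate`, the three closure types, and both `prepare_gc_*` functions are hypothetical stand-ins, not the HotSpot classes. It only shows why folding the bitmap clearing into the closure that already resets each region saves one walk over the region list.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap region (not ShenandoahHeapRegion).
struct Region {
  bool bitmap_dirty = true;  // this region's slice of the mark bitmap needs clearing
  bool needs_reset  = true;  // this region's per-cycle state needs resetting
};

// Hypothetical heap that counts how often its region list is walked.
struct Heap {
  std::vector<Region> regions;
  std::size_t iterations = 0;

  template <typename Closure>
  void heap_region_iterate(Closure& cl) {
    ++iterations;                       // one full pass over all regions
    for (Region& r : regions) {
      cl.do_region(r);
    }
  }
};

// Two single-purpose closures: each one costs a full pass.
struct ClearBitmapClosure {
  void do_region(Region& r) { r.bitmap_dirty = false; }
};
struct ResetRegionClosure {
  void do_region(Region& r) { r.needs_reset = false; }
};

// Combined closure: both steps are applied while the region is visited once.
struct ClearAndResetClosure {
  void do_region(Region& r) {
    r.bitmap_dirty = false;
    r.needs_reset  = false;
  }
};

// Before: two iterations over the regions.
void prepare_gc_two_passes(Heap& heap) {
  ClearBitmapClosure clear;
  heap.heap_region_iterate(clear);
  ResetRegionClosure reset;
  heap.heap_region_iterate(reset);
}

// After: a single iteration does the same work.
void prepare_gc_one_pass(Heap& heap) {
  ClearAndResetClosure cl;
  heap.heap_region_iterate(cl);
}
```

In this sketch both variants leave every region in the same state; the only observable difference is the `iterations` counter, which is 2 for the first variant and 1 for the second.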
------------- Commit messages: - Combine bitmap clearing with region resetting closure Changes: https://git.openjdk.org/shenandoah/pull/208/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=208&range=00 Stats: 14 lines in 1 file changed: 8 ins; 1 del; 5 mod Patch: https://git.openjdk.org/shenandoah/pull/208.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/208/head:pull/208 PR: https://git.openjdk.org/shenandoah/pull/208 From ysr at openjdk.org Tue Jan 31 19:22:41 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 31 Jan 2023 19:22:41 GMT Subject: RFR: 8299703: GenShen: improvements in card scanning [v5] In-Reply-To: References: Message-ID: <9PzFIkRzUD2XozIfVni9oY5guaj771FVQhjjAsDmk3o=.8abeafd9-6020-4fcc-8c88-3f6e39a31ce9@github.com> On Tue, 31 Jan 2023 14:53:58 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 975: >> >>> 973: >>> 974: size_t ShenandoahGeneration::adjust_available(intptr_t adjustment) { >>> 975: // TODO: ysr: revert to an assert >> >> Do you want to make these refinements before we integrate? > > Or maybe just remove them? I'll change them to warnings. ------------- PR: https://git.openjdk.org/shenandoah/pull/193 From kdnilsen at openjdk.org Tue Jan 31 19:37:37 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 31 Jan 2023 19:37:37 GMT Subject: RFR: Combine bitmap clearing with region resetting closure In-Reply-To: <6o4akBXd8cgdS7RZT9tCyfTgmz-FL3YvkPwTo40TkfM=.873acca8-7fc4-4f22-9439-42329342b7ad@github.com> References: <6o4akBXd8cgdS7RZT9tCyfTgmz-FL3YvkPwTo40TkfM=.873acca8-7fc4-4f22-9439-42329342b7ad@github.com> Message-ID: On Tue, 31 Jan 2023 17:23:48 GMT, William Kemper wrote: > This change combines the two closures used to `prepare_gc`. This removes a second iteration over the regions. Marked as reviewed by kdnilsen (Committer).
------------- PR: https://git.openjdk.org/shenandoah/pull/208 From ysr at openjdk.org Tue Jan 31 20:29:41 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 31 Jan 2023 20:29:41 GMT Subject: RFR: Combine bitmap clearing with region resetting closure In-Reply-To: <6o4akBXd8cgdS7RZT9tCyfTgmz-FL3YvkPwTo40TkfM=.873acca8-7fc4-4f22-9439-42329342b7ad@github.com> References: <6o4akBXd8cgdS7RZT9tCyfTgmz-FL3YvkPwTo40TkfM=.873acca8-7fc4-4f22-9439-42329342b7ad@github.com> Message-ID: On Tue, 31 Jan 2023 17:23:48 GMT, William Kemper wrote: > This change combines the two closures used to `prepare_gc`. This removes a second iteration over the regions. Looks good, modulo a comment re `reset_mark_bitmap`. Reviewed! src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 213: > 211: void ShenandoahGeneration::prepare_gc() { > 212: // Reset mark bitmap for this generation (typically young) > 213: reset_mark_bitmap(); Is `reset_mark_bitmap()` dead code and therefore removable now? Or, can all other usages also be removed in similar manner? ------------- Marked as reviewed by ysr (Author). PR: https://git.openjdk.org/shenandoah/pull/208 From wkemper at openjdk.org Tue Jan 31 20:42:22 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 31 Jan 2023 20:42:22 GMT Subject: RFR: Tune heuristic defaults and behavior for improved stability Message-ID: <4G8AQT9r0oKv4t98V5RTTS2KkRyRnlK1t7KH2nRO0FY=.cbb3343d-9452-4735-bb43-9e9ea7cf7946@github.com> Also, some minor changes to logging. 
------------- Commit messages: - Fix whitespace - Replace magic numbers with symbolic constants - Tweaks to heuristics Changes: https://git.openjdk.org/shenandoah/pull/209/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=209&range=00 Stats: 32 lines in 6 files changed: 14 ins; 5 del; 13 mod Patch: https://git.openjdk.org/shenandoah/pull/209.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/209/head:pull/209 PR: https://git.openjdk.org/shenandoah/pull/209 From kdnilsen at openjdk.org Tue Jan 31 20:42:22 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 31 Jan 2023 20:42:22 GMT Subject: RFR: Tune heuristic defaults and behavior for improved stability In-Reply-To: <4G8AQT9r0oKv4t98V5RTTS2KkRyRnlK1t7KH2nRO0FY=.cbb3343d-9452-4735-bb43-9e9ea7cf7946@github.com> References: <4G8AQT9r0oKv4t98V5RTTS2KkRyRnlK1t7KH2nRO0FY=.cbb3343d-9452-4735-bb43-9e9ea7cf7946@github.com> Message-ID: On Tue, 31 Jan 2023 18:05:16 GMT, William Kemper wrote: > Also, some minor changes to logging. Thanks. Ok after whitespace fixes. ------------- Marked as reviewed by kdnilsen (Committer). PR: https://git.openjdk.org/shenandoah/pull/209 From wkemper at openjdk.org Tue Jan 31 20:59:29 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 31 Jan 2023 20:59:29 GMT Subject: Integrated: Tune heuristic defaults and behavior for improved stability In-Reply-To: <4G8AQT9r0oKv4t98V5RTTS2KkRyRnlK1t7KH2nRO0FY=.cbb3343d-9452-4735-bb43-9e9ea7cf7946@github.com> References: <4G8AQT9r0oKv4t98V5RTTS2KkRyRnlK1t7KH2nRO0FY=.cbb3343d-9452-4735-bb43-9e9ea7cf7946@github.com> Message-ID: <5gVXLxdjgcLI9ktJFVpNj6GsxEVVpB80c-JWUv-2mGI=.32908d8c-98ae-41b6-9b44-40fdd7e9fc2b@github.com> On Tue, 31 Jan 2023 18:05:16 GMT, William Kemper wrote: > Also, some minor changes to logging. This pull request has now been integrated. 
Changeset: 18132665 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/181326657c059ca497ea7c8610e0590dfb95e136 Stats: 32 lines in 6 files changed: 14 ins; 5 del; 13 mod Tune heuristic defaults and behavior for improved stability Reviewed-by: kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/209 From wkemper at openjdk.org Tue Jan 31 21:01:28 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 31 Jan 2023 21:01:28 GMT Subject: RFR: Combine bitmap clearing with region resetting closure In-Reply-To: References: <6o4akBXd8cgdS7RZT9tCyfTgmz-FL3YvkPwTo40TkfM=.873acca8-7fc4-4f22-9439-42329342b7ad@github.com> Message-ID: On Tue, 31 Jan 2023 20:26:01 GMT, Y. Srinivas Ramakrishna wrote: >> This change combines the two closures used to `prepare_gc`. This removes a second iteration over the regions. > > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 213: > >> 211: void ShenandoahGeneration::prepare_gc() { >> 212: // Reset mark bitmap for this generation (typically young) >> 213: reset_mark_bitmap(); > > Is `reset_mark_bitmap()` dead code and therefore removable now? Or, can all other usages also be removed in similar manner? No, it's not dead. It's used by the full GC still. ------------- PR: https://git.openjdk.org/shenandoah/pull/208 From wkemper at openjdk.org Tue Jan 31 21:01:29 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 31 Jan 2023 21:01:29 GMT Subject: Integrated: Combine bitmap clearing with region resetting closure In-Reply-To: <6o4akBXd8cgdS7RZT9tCyfTgmz-FL3YvkPwTo40TkfM=.873acca8-7fc4-4f22-9439-42329342b7ad@github.com> References: <6o4akBXd8cgdS7RZT9tCyfTgmz-FL3YvkPwTo40TkfM=.873acca8-7fc4-4f22-9439-42329342b7ad@github.com> Message-ID: On Tue, 31 Jan 2023 17:23:48 GMT, William Kemper wrote: > This change combines the two closures used to `prepare_gc`. This removes a second iteration over the regions. This pull request has now been integrated. 
Changeset: 8a67ed0d Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/8a67ed0d7c79eefc748f18361bd80911dfbac85f Stats: 14 lines in 1 file changed: 8 ins; 1 del; 5 mod Combine bitmap clearing with region resetting closure Reviewed-by: kdnilsen, ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/208 From ysr at openjdk.org Tue Jan 31 21:37:28 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 31 Jan 2023 21:37:28 GMT Subject: RFR: Tune heuristic defaults and behavior for improved stability In-Reply-To: <4G8AQT9r0oKv4t98V5RTTS2KkRyRnlK1t7KH2nRO0FY=.cbb3343d-9452-4735-bb43-9e9ea7cf7946@github.com> References: <4G8AQT9r0oKv4t98V5RTTS2KkRyRnlK1t7KH2nRO0FY=.cbb3343d-9452-4735-bb43-9e9ea7cf7946@github.com> Message-ID: <9Ycj251p0-0WQPdeq4O3Rw7BujITdQkKXNa93XZ4wdc=.51e3692e-5a2e-45c1-beea-626444d9b04d@github.com> On Tue, 31 Jan 2023 18:05:16 GMT, William Kemper wrote: > Also, some minor changes to logging. LGTM, modulo question re (high level) performance data to share, if any. src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 150: > 148: range(0,100) \ > 149: \ > 150: product(uintx, ShenandoahLearningSteps, 10, EXPERIMENTAL, \ Relatedly, I'd suggest just removing the "(5)" in this comment in `shenandoahGeneration.cpp`: // Changing the size of the generation will reset the times learned for the heuristic. The heuristic will need to // relearn collection performance metrics. This also has the effect of preventing further capacity changes from the // heuristics until at least ShenandoahLearningSteps(5) number of cycles has completed. void increase_capacity(size_t increment); void decrease_capacity(size_t decrement); src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 184: > 182: "increases the sensitivity. ") \ > 183: \ > 184: product(double, ShenandoahAdaptiveDecayFactor, 0.1, EXPERIMENTAL, \ Is there performance data to share to inform this change? ------------- Marked as reviewed by ysr (Author). 
PR: https://git.openjdk.org/shenandoah/pull/209 From kdnilsen at openjdk.org Tue Jan 31 22:44:21 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 31 Jan 2023 22:44:21 GMT Subject: RFR: Age objects during degeneration Message-ID: <-qxgj9ESvphLh-GIzdlqvfjEP_78dNseWYE1AUfZjTc=.786aaabc-3f59-4af5-92b4-20f0926836c5@github.com> We found that not aging during degenerated cycles can cause large amounts of memory (e.g. 15 GB) to accumulate in young-gen when it ought to be promoted to old gen. This creates serious performance degradation. This patch allows degenerated cycles to age objects. ------------- Commit messages: - Merge remote-tracking branch 'GitFarmBranch/age-objects-during-degeneration' into age-objects-during-degeneration - Allow aging of objects during degenerated cycle Changes: https://git.openjdk.org/shenandoah/pull/210/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=210&range=00 Stats: 8 lines in 2 files changed: 2 ins; 5 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/210.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/210/head:pull/210 PR: https://git.openjdk.org/shenandoah/pull/210 From wkemper at openjdk.org Tue Jan 31 22:47:39 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 31 Jan 2023 22:47:39 GMT Subject: RFR: Age objects during degeneration In-Reply-To: <-qxgj9ESvphLh-GIzdlqvfjEP_78dNseWYE1AUfZjTc=.786aaabc-3f59-4af5-92b4-20f0926836c5@github.com> References: <-qxgj9ESvphLh-GIzdlqvfjEP_78dNseWYE1AUfZjTc=.786aaabc-3f59-4af5-92b4-20f0926836c5@github.com> Message-ID: On Tue, 31 Jan 2023 22:38:11 GMT, Kelvin Nilsen wrote: > We found that not aging during degenerated cycles can cause large amounts of memory (e.g. 15 GB) to accumulate in young-gen when it ought to be promoted to old gen. This creates serious performance degradation. This patch allows degenerated cycles to age objects. Looks good! ------------- Marked as reviewed by wkemper (Committer). 
PR: https://git.openjdk.org/shenandoah/pull/210 From kdnilsen at openjdk.org Tue Jan 31 22:50:34 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 31 Jan 2023 22:50:34 GMT Subject: Integrated: Age objects during degeneration In-Reply-To: <-qxgj9ESvphLh-GIzdlqvfjEP_78dNseWYE1AUfZjTc=.786aaabc-3f59-4af5-92b4-20f0926836c5@github.com> References: <-qxgj9ESvphLh-GIzdlqvfjEP_78dNseWYE1AUfZjTc=.786aaabc-3f59-4af5-92b4-20f0926836c5@github.com> Message-ID: <7evCOWQAHk8cat2dPKxlMx0cMdR4MCb0YVcj5ZLEIbI=.4174e491-17ff-4c13-9c9e-c441ab3ac945@github.com> On Tue, 31 Jan 2023 22:38:11 GMT, Kelvin Nilsen wrote: > We found that not aging during degenerated cycles can cause large amounts of memory (e.g. 15 GB) to accumulate in young-gen when it ought to be promoted to old gen. This creates serious performance degradation. This patch allows degenerated cycles to age objects. This pull request has now been integrated. Changeset: a2963b17 Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/a2963b17370cdd2d45b8b19d49e23e3791424499 Stats: 8 lines in 2 files changed: 2 ins; 5 del; 1 mod Age objects during degeneration Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/shenandoah/pull/210 From ysr at openjdk.org Tue Jan 31 23:01:26 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 31 Jan 2023 23:01:26 GMT Subject: RFR: Age objects during degeneration In-Reply-To: <-qxgj9ESvphLh-GIzdlqvfjEP_78dNseWYE1AUfZjTc=.786aaabc-3f59-4af5-92b4-20f0926836c5@github.com> References: <-qxgj9ESvphLh-GIzdlqvfjEP_78dNseWYE1AUfZjTc=.786aaabc-3f59-4af5-92b4-20f0926836c5@github.com> Message-ID: On Tue, 31 Jan 2023 22:38:11 GMT, Kelvin Nilsen wrote: > We found that not aging during degenerated cycles can cause large amounts of memory (e.g. 15 GB) to accumulate in young-gen when it ought to be promoted to old gen. This creates serious performance degradation. This patch allows degenerated cycles to age objects. LGTM. Looks like the right thing to do. 
Is there any comparative performance data to share? ------------- PR: https://git.openjdk.org/shenandoah/pull/210 From wkemper at openjdk.org Tue Jan 31 23:59:23 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 31 Jan 2023 23:59:23 GMT Subject: RFR: Tune heuristic defaults and behavior for improved stability In-Reply-To: <9Ycj251p0-0WQPdeq4O3Rw7BujITdQkKXNa93XZ4wdc=.51e3692e-5a2e-45c1-beea-626444d9b04d@github.com> References: <4G8AQT9r0oKv4t98V5RTTS2KkRyRnlK1t7KH2nRO0FY=.cbb3343d-9452-4735-bb43-9e9ea7cf7946@github.com> <9Ycj251p0-0WQPdeq4O3Rw7BujITdQkKXNa93XZ4wdc=.51e3692e-5a2e-45c1-beea-626444d9b04d@github.com> Message-ID: On Tue, 31 Jan 2023 21:32:31 GMT, Y. Srinivas Ramakrishna wrote: >> Also, some minor changes to logging. > > src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 150: > >> 148: range(0,100) \ >> 149: \ >> 150: product(uintx, ShenandoahLearningSteps, 10, EXPERIMENTAL, \ > > Relatedly, I'd suggest just removing the "(5)" in this comment in `shenandoahGeneration.cpp`: > > > // Changing the size of the generation will reset the times learned for the heuristic. The heuristic will need to > // relearn collection performance metrics. This also has the effect of preventing further capacity changes from the > // heuristics until at least ShenandoahLearningSteps(5) number of cycles has completed. > void increase_capacity(size_t increment); > void decrease_capacity(size_t decrement); That whole comment is out of date, I'll remove it. ------------- PR: https://git.openjdk.org/shenandoah/pull/209