From shade at openjdk.org Tue Jan 3 10:13:49 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jan 2023 10:13:49 GMT Subject: RFR: 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic In-Reply-To: References: Message-ID: <6S2QZ9-NMeIQe0FUCdZlCKuWG8h_dwdFs9_sqMbJ6Ng=.4fe5ec3c-d4fc-4efe-ae19-5b9caf64a316@github.com> On Tue, 20 Dec 2022 07:05:34 GMT, Erik Österlund wrote: > The current java_lang_ref_Reference::clear_referent implementation performs a raw reference clear. That doesn't work well with upcoming GC algorithms. It should be made GC agnostic by going through the normal access API. Looks fine. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.org/jdk/pull/11736 From eosterlund at openjdk.org Tue Jan 3 15:40:48 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 3 Jan 2023 15:40:48 GMT Subject: RFR: 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic In-Reply-To: <6S2QZ9-NMeIQe0FUCdZlCKuWG8h_dwdFs9_sqMbJ6Ng=.4fe5ec3c-d4fc-4efe-ae19-5b9caf64a316@github.com> References: <6S2QZ9-NMeIQe0FUCdZlCKuWG8h_dwdFs9_sqMbJ6Ng=.4fe5ec3c-d4fc-4efe-ae19-5b9caf64a316@github.com> Message-ID: On Tue, 3 Jan 2023 10:11:15 GMT, Aleksey Shipilev wrote: >> The current java_lang_ref_Reference::clear_referent implementation performs a raw reference clear. That doesn't work well with upcoming GC algorithms. It should be made GC agnostic by going through the normal access API. > > Looks fine. Thanks for the review @shipilev!
------------- PR: https://git.openjdk.org/jdk/pull/11736 From mdoerr at openjdk.org Tue Jan 3 15:58:49 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Jan 2023 15:58:49 GMT Subject: RFR: 8299312: Clean up BarrierSetNMethod In-Reply-To: References: Message-ID: <1km8O--i4urmIjKeFK2AT3mO0d4DoTX5RcPYC1XdD-k=.d82590fe-35d2-406f-a502-fd5bb2c145f5@github.com> On Fri, 23 Dec 2022 12:00:46 GMT, Erik Österlund wrote: > The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it in to the shared code level. > We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter. With https://github.com/openjdk/jdk/commit/245f0cf4ac9dc655bfe2abb1c88c6ed1ddffd291, nmethod entry barriers are implemented on all platforms, now. The ARM32 parts should be added. (Also see failing pre-submit test.) ------------- PR: https://git.openjdk.org/jdk/pull/11774 From kdnilsen at openjdk.org Tue Jan 3 22:25:26 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 3 Jan 2023 22:25:26 GMT Subject: RFR: Allow heuristic trigger to increase capacity instead of running a collection In-Reply-To: References: Message-ID: On Fri, 30 Dec 2022 00:07:29 GMT, William Kemper wrote: > Before the adaptive heuristic starts a collection, it will attempt to increase the capacity of its generation. If the capacity is increased, the heuristic will re-evaluate the trigger criteria. > > There is also a change here to attempt to increase the size of the old generation in response to a promotion failure.
Marked as reviewed by kdnilsen (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/190 From eosterlund at openjdk.org Wed Jan 4 14:50:20 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 4 Jan 2023 14:50:20 GMT Subject: RFR: 8299312: Clean up BarrierSetNMethod [v2] In-Reply-To: References: Message-ID: > The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it in to the shared code level. > We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter. Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - ARM support - Merge branch 'master' into 8299312_barrier_set_nmethod_cleanup - Fix Shenandoah build - 8299312: Clean up BarrierSetNMethod ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11774/files - new: https://git.openjdk.org/jdk/pull/11774/files/78afd161..e0b32db3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11774&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11774&range=00-01 Stats: 9893 lines in 672 files changed: 5058 ins; 2615 del; 2220 mod Patch: https://git.openjdk.org/jdk/pull/11774.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11774/head:pull/11774 PR: https://git.openjdk.org/jdk/pull/11774 From eosterlund at openjdk.org Wed Jan 4 14:53:02 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 4 Jan 2023 14:53:02 GMT Subject: Integrated: 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic In-Reply-To: References: Message-ID: On Tue, 20 Dec 2022 07:05:34 GMT, Erik Österlund wrote: > The current java_lang_ref_Reference::clear_referent implementation performs a raw reference clear. That doesn't work well with upcoming GC algorithms. It should be made GC agnostic by going through the normal access API. This pull request has now been integrated. Changeset: c32a34c2 Author: Erik Österlund URL: https://git.openjdk.org/jdk/commit/c32a34c2e534147bccf8320b095edda9e1088f5a Stats: 8 lines in 5 files changed: 5 ins; 0 del; 3 mod 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic Co-authored-by: Axel Boldt-Christmas Reviewed-by: dholmes, shade, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/11736 From mdoerr at openjdk.org Wed Jan 4 15:43:57 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 4 Jan 2023 15:43:57 GMT Subject: RFR: 8299312: Clean up BarrierSetNMethod [v2] In-Reply-To: References:

Message-ID: <6kIYLSUVYIVvoKhLGGhYowSFyY09rWE07Tw4le5q2Bw=.90fed758-0136-4b5c-bd9f-73821c010930@github.com> On Wed, 4 Jan 2023 14:50:20 GMT, Erik Österlund wrote: >> The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it in to the shared code level. >> We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter. > > Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - ARM support > - Merge branch 'master' into 8299312_barrier_set_nmethod_cleanup > - Fix Shenandoah build > - 8299312: Clean up BarrierSetNMethod LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.org/jdk/pull/11774 From kdnilsen at openjdk.org Wed Jan 4 16:20:21 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 4 Jan 2023 16:20:21 GMT Subject: RFR: Enforce that generation sizes align with region sizes Message-ID: For correctness, the size of each generation should be a multiple of the region size. A recent change violated this requirement.
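The requirement stated above can be illustrated with a small sketch. This is not the code from the pull request: the helper names `align_down_to_region` and `is_region_aligned` are hypothetical, and rounding down (rather than up) is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the PR): round a generation size in bytes
// down to the nearest multiple of the region size.
// Assumes region_size_bytes > 0.
constexpr size_t align_down_to_region(size_t bytes, size_t region_size_bytes) {
  return bytes - (bytes % region_size_bytes);
}

// Hypothetical predicate mirroring the kind of assertion described in the
// thread: a generation size is valid only if it is a whole number of regions.
constexpr bool is_region_aligned(size_t bytes, size_t region_size_bytes) {
  return bytes % region_size_bytes == 0;
}
```

A caller that computes a new generation capacity could pass it through `align_down_to_region` before storing it, and assert `is_region_aligned` wherever the capacity is read.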
------------- Commit messages: - Enforce that generation sizes align with region sizes Changes: https://git.openjdk.org/shenandoah/pull/191/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=191&range=00 Stats: 5 lines in 2 files changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/191.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/191/head:pull/191 PR: https://git.openjdk.org/shenandoah/pull/191 From ysr at openjdk.org Wed Jan 4 16:41:29 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 4 Jan 2023 16:41:29 GMT Subject: RFR: Enforce that generation sizes align with region sizes In-Reply-To: References: Message-ID: On Wed, 4 Jan 2023 16:12:35 GMT, Kelvin Nilsen wrote: > For correctness, the size of each generation should be a multiple of the region size. > > A recent change violated this requirement. LGTM. Please feel free to include testing notes, if any. Thanks! ------------- Marked as reviewed by ysr (Author). PR: https://git.openjdk.org/shenandoah/pull/191 From wkemper at openjdk.org Wed Jan 4 16:54:24 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 4 Jan 2023 16:54:24 GMT Subject: Integrated: Allow heuristic trigger to increase capacity instead of running a collection In-Reply-To: References: Message-ID: On Fri, 30 Dec 2022 00:07:29 GMT, William Kemper wrote: > Before the adaptive heuristic starts a collection, it will attempt to increase the capacity of its generation. If the capacity is increased, the heuristic will re-evaluate the trigger criteria. > > There is also a change here to attempt to increase the size of the old generation in response to a promotion failure. This pull request has now been integrated. 
Changeset: ba808494 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/ba808494675509a7ed2d97d08a9fbc971dbc0900 Stats: 97 lines in 10 files changed: 73 ins; 9 del; 15 mod Allow heuristic trigger to increase capacity instead of running a collection Reviewed-by: kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/190 From ysr at openjdk.org Wed Jan 4 16:59:20 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 4 Jan 2023 16:59:20 GMT Subject: RFR: Allow heuristic trigger to increase capacity instead of running a collection In-Reply-To: References: Message-ID: On Fri, 30 Dec 2022 00:07:29 GMT, William Kemper wrote: > Before the adaptive heuristic starts a collection, it will attempt to increase the capacity of its generation. If the capacity is increased, the heuristic will re-evaluate the trigger criteria. > > There is also a change here to attempt to increase the size of the old generation in response to a promotion failure. The changes look ok. I did wonder though if one might (for odd situations) reduce the number of recursions through `should_start_gc()` by having some notion of error that we are trying to correct when we call `resize_and_evaluate(/* pass in error or size differential here */)` from `should_start_gc()`. Anyway, just a thought for you to think about. Reviewed! ------------- Marked as reviewed by ysr (Author). PR: https://git.openjdk.org/shenandoah/pull/190 From kdnilsen at openjdk.org Wed Jan 4 17:22:29 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 4 Jan 2023 17:22:29 GMT Subject: RFR: Enforce that generation sizes align with region sizes In-Reply-To: References: Message-ID: <7IIQ4-uabP7uLKg8mql3JyrUeBjOstgpsix37p3MI0o=.a67b3d86-0129-4b3d-a616-79157cb3858c@github.com> On Wed, 4 Jan 2023 16:12:35 GMT, Kelvin Nilsen wrote: > For correctness, the size of each generation should be a multiple of the region size. > > A recent change violated this requirement.
I discovered this problem because of an assertion failure in a separate branch that was based on mainline. I added the same assertion into this branch and verified through our internal pipeline regression testing that the two corrections to existing implementation resolve the assertion failure and do not introduce any other regressions. ------------- PR: https://git.openjdk.org/shenandoah/pull/191 From wkemper at openjdk.org Wed Jan 4 21:56:26 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 4 Jan 2023 21:56:26 GMT Subject: RFR: Enforce that generation sizes align with region sizes In-Reply-To: References: Message-ID: On Wed, 4 Jan 2023 16:12:35 GMT, Kelvin Nilsen wrote: > For correctness, the size of each generation should be a multiple of the region size. > > A recent change violated this requirement. Marked as reviewed by wkemper (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/191 From wkemper at openjdk.org Wed Jan 4 22:29:27 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 4 Jan 2023 22:29:27 GMT Subject: RFR: Allow heuristic trigger to increase capacity instead of running a collection In-Reply-To: References:

Message-ID: On Wed, 4 Jan 2023 16:56:54 GMT, Y. Srinivas Ramakrishna wrote: >> Before the adaptive heuristic starts a collection, it will attempt to increase the capacity of its generation. If the capacity is increased, the heuristic will re-evaluate the trigger criteria. >> >> There is also a change here to attempt to increase the size of the old generation in response to a promotion failure. > > The changes look ok. > > I did wonder though if one might (for odd situations) reduce the number of recursions through `should_start_gc()` by having some notion of error that we are trying to correct when we call `resize_and_evaluate(/* pass in error or size differential here */)` from `should_start_gc()`. Anyway, just a thought for you to think about. > > Reviewed! @ysramakrishna - changing the capacity of a generation will reset the `_gc_times_learned` field of the heuristics to zero. `resize_and_evaluate` will only resize (and recurse) if `_gc_times_learned` is not less than `ShenandoahLearningSteps`, so it will only really attempt to resize the generation once every `ShenandoahLearningSteps` number of cycles. ------------- PR: https://git.openjdk.org/shenandoah/pull/190 From kdnilsen at openjdk.org Wed Jan 4 23:26:17 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 4 Jan 2023 23:26:17 GMT Subject: Integrated: Enforce that generation sizes align with region sizes In-Reply-To: References: Message-ID: <_TCj5-a9ZmoQwaZDAVrcVZzyYcuvDUisnA5Ksrfejn0=.bc0ac1eb-6ca5-4e9d-857a-9311e65ea550@github.com> On Wed, 4 Jan 2023 16:12:35 GMT, Kelvin Nilsen wrote: > For correctness, the size of each generation should be a multiple of the region size. > > A recent change violated this requirement. This pull request has now been integrated.
Changeset: 6daaa75a Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/6daaa75a34857998be5ac4dd53bdf0db289fd3a1 Stats: 5 lines in 2 files changed: 5 ins; 0 del; 0 mod Enforce that generation sizes align with region sizes Reviewed-by: ysr, wkemper ------------- PR: https://git.openjdk.org/shenandoah/pull/191 From kdnilsen at openjdk.org Thu Jan 5 01:01:50 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 01:01:50 GMT Subject: RFR: Fix allocate aligned Message-ID: An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. 
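The failure mode described above can be sketched in isolation. The following is a simplified model, not the Shenandoah implementation: `allocate_aligned_sketch`, `AlignedAlloc`, and all parameter names are hypothetical, addresses are modeled as word offsets, and the caller is assumed to pass `top <= end` and `card_words > 0`. The point it illustrates is that the padding must be finalized (including any extra card added because the gap is too small for a filler object) before checking the allocation against the end of the region.

```cpp
#include <cassert>
#include <cstddef>

// Result of the sketch: whether the allocation fits, how many words of
// filler precede it, and the card-aligned start offset.
struct AlignedAlloc {
  bool   ok;
  size_t pad_words;
  size_t start;
};

// Hypothetical model of a card-aligned allocation inside one region.
// All quantities are in heap words; top and end are offsets with top <= end.
inline AlignedAlloc allocate_aligned_sketch(size_t top, size_t end,
                                            size_t size_words,
                                            size_t card_words,
                                            size_t min_fill_words) {
  // Words needed to reach the next card boundary (0 if already aligned).
  size_t pad = (card_words - (top % card_words)) % card_words;

  // A nonzero gap that cannot hold the smallest filler object must grow
  // by whole cards until a filler object fits.
  while (pad != 0 && pad < min_fill_words) {
    pad += card_words;
  }

  // Only now compare against the region end, using the final padding.
  if (pad > end - top || size_words > end - top - pad) {
    return {false, 0, top};
  }
  return {true, pad, top + pad};
}
```

With a 64-word card and a 2-word minimum filler, a request for 64 words at `top == 63` in a region ending at 128 needs 65 words of padding, so the sketch rejects it instead of letting the allocation run past the region end.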
------------- Commit messages: - Merge remote-tracking branch 'GitFarmBranch/fix-allocate-aligned-rebase' into fix-allocate-aligned - Remove instrumentation - Force min and max generation sizes to align with region boundaries - Debug verification error in old-gen used - Fix computation of padding requirement - Fix allocate_aligned padding Changes: https://git.openjdk.org/shenandoah/pull/192/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=192&range=00 Stats: 64 lines in 4 files changed: 44 ins; 7 del; 13 mod Patch: https://git.openjdk.org/shenandoah/pull/192.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/192/head:pull/192 PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 01:16:42 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 01:16:42 GMT Subject: RFR: Fix allocate aligned [v2] In-Reply-To: References: Message-ID: > An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. > > In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Refinements during code review ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/192/files - new: https://git.openjdk.org/shenandoah/pull/192/files/e9c981a9..059f3ef5 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=192&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=192&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/192.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/192/head:pull/192 PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 01:16:43 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 01:16:43 GMT Subject: RFR: Fix allocate aligned [v2] In-Reply-To: References:

Message-ID: On Thu, 5 Jan 2023 01:12:48 GMT, Kelvin Nilsen wrote: >> An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. >> >> In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Refinements during code review src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 66: > 64: // We don't need to register the PLAB. Its content will be registered as objects are allocated within it and/or > 65: // when the PLAB is retired. > 66: ShenandoahHeap::heap()->card_scan()->register_object(obj); In reviewing my own code, it looks like my implementation contradicts the comment. I'm going to retest without line 66. ------------- PR: https://git.openjdk.org/shenandoah/pull/192 From wkemper at openjdk.org Thu Jan 5 01:24:19 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 5 Jan 2023 01:24:19 GMT Subject: RFR: Fix allocate aligned [v2] In-Reply-To: References:

Message-ID: On Thu, 5 Jan 2023 01:16:42 GMT, Kelvin Nilsen wrote: >> An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. >> >> In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Refinements during code review Changes requested by wkemper (Committer). src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 319: > 317: assert(req.actual_size() % CardTable::card_size_in_words() == 0, "PLAB start must align with card boundary"); > 318: assert(((uintptr_t) result) % CardTable::card_size_in_words() == 0, "PLAB start must align with card boundary"); > 319: if (result != nullptr && free > usable_free) { Line 315 asserts that `result` cannot be `nullptr`, do we need to check for non-null again here? 
------------- PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 01:41:45 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 01:41:45 GMT Subject: RFR: Fix allocate aligned [v3] In-Reply-To: References: Message-ID: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> > An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. > > In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Remove redundant test for result != nullptr ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/192/files - new: https://git.openjdk.org/shenandoah/pull/192/files/059f3ef5..2b43663d Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=192&range=02 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=192&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/192.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/192/head:pull/192 PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 01:41:48 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 01:41:48 GMT Subject: RFR: Fix allocate aligned [v2] In-Reply-To: References:

Message-ID: On Thu, 5 Jan 2023 01:20:59 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Refinements during code review > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 319: > >> 317: assert(req.actual_size() % CardTable::card_size_in_words() == 0, "PLAB start must align with card boundary"); >> 318: assert(((uintptr_t) result) % CardTable::card_size_in_words() == 0, "PLAB start must align with card boundary"); >> 319: if (result != nullptr && free > usable_free) { > > Line 315 asserts that `result` cannot be `nullptr`, do we need to check for non-null again here? Thanks for this catch. Making this change and testing on pipeline before integration. ------------- PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 01:41:49 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 01:41:49 GMT Subject: RFR: Fix allocate aligned [v3] In-Reply-To: References:

Message-ID: On Thu, 5 Jan 2023 01:10:08 GMT, Kelvin Nilsen wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove redundant test for result != nullptr > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp line 66: > >> 64: // We don't need to register the PLAB. Its content will be registered as objects are allocated within it and/or >> 65: // when the PLAB is retired. >> 66: ShenandoahHeap::heap()->card_scan()->register_object(obj); > > In reviewing my own code, it looks like my implementation contradicts the comment. I'm going to retest without line 66. Making this change and testing on regression suite pipeline before integration. ------------- PR: https://git.openjdk.org/shenandoah/pull/192 From ysr at openjdk.org Thu Jan 5 08:14:17 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 5 Jan 2023 08:14:17 GMT Subject: RFR: Fix allocate aligned [v3] In-Reply-To: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> References: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> Message-ID: On Thu, 5 Jan 2023 01:41:45 GMT, Kelvin Nilsen wrote: >> An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. >> >> In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. 
This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Remove redundant test for result != nullptr Changes look fine. I'd like to understand the original rationale for making PLAB boundaries exactly card-aligned. Perhaps it's described/documented somewhere in the code? (Something to do with simplifying card-scanning concurrently with allocating out of PLABs?) src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 317: > 315: assert(result != nullptr, "Allocation cannot fail"); > 316: assert(r->top() <= r->end(), "Allocation cannot span end of region"); > 317: assert(req.actual_size() % CardTable::card_size_in_words() == 0, "PLAB start must align with card boundary"); "PLAB should be card size multiple" (the next assert checks alignment) ------------- Marked as reviewed by ysr (Author). PR: https://git.openjdk.org/shenandoah/pull/192 From eosterlund at openjdk.org Thu Jan 5 13:08:48 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 5 Jan 2023 13:08:48 GMT Subject: RFR: 8299312: Clean up BarrierSetNMethod In-Reply-To: <1km8O--i4urmIjKeFK2AT3mO0d4DoTX5RcPYC1XdD-k=.d82590fe-35d2-406f-a502-fd5bb2c145f5@github.com> References: <1km8O--i4urmIjKeFK2AT3mO0d4DoTX5RcPYC1XdD-k=.d82590fe-35d2-406f-a502-fd5bb2c145f5@github.com> Message-ID: <_ONOHzwnJtl_l9se_WVzP2nP6dE0EXl61lTOCOt9qFA=.88dcba7c-2ca2-4746-9d55-863cf0635717@github.com> On Tue, 3 Jan 2023 15:55:43 GMT, Martin Doerr wrote: >> The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed.
We should embrace the "guard value" terminology and lift it in to the shared code level. >> We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter. > > With https://github.com/openjdk/jdk/commit/245f0cf4ac9dc655bfe2abb1c88c6ed1ddffd291, nmethod entry barriers are implemented on all platforms, now. The ARM32 parts should be added. (Also see failing pre-submit test.) Thanks for the review @TheRealMDoerr! ------------- PR: https://git.openjdk.org/jdk/pull/11774 From kdnilsen at openjdk.org Thu Jan 5 14:19:24 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 14:19:24 GMT Subject: RFR: Fix allocate aligned [v3] In-Reply-To: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> References: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> Message-ID: On Thu, 5 Jan 2023 01:41:45 GMT, Kelvin Nilsen wrote: >> An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. >> >> In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. 
> > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Remove redundant test for result != nullptr After confirming that the two fixes motivated by review do not introduce regressions on our CI pipelines, I will close this with integration. ------------- PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 14:19:25 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 14:19:25 GMT Subject: RFR: Fix allocate aligned [v3] In-Reply-To: References: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> Message-ID: On Thu, 5 Jan 2023 07:59:06 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove redundant test for result != nullptr > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 317: > >> 315: assert(result != nullptr, "Allocation cannot fail"); >> 316: assert(r->top() <= r->end(), "Allocation cannot span end of region"); >> 317: assert(req.actual_size() % CardTable::card_size_in_words() == 0, "PLAB start must align with card boundary"); > > "PLAB should be card size multiple" > > (the next assert checks alignment) This allows us to register objects in PLABs without acquiring a lock. Otherwise, we need a lock because two threads might be registering objects within the same card in parallel. 
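The rationale above (no lock is needed when no two PLABs ever share a card) can be checked with a small model. This is a sketch under stated assumptions, not Shenandoah code: `CardRange`, `cards_covered`, and `share_a_card` are hypothetical names, PLABs are modeled as word-offset ranges, and `size_words` is assumed to be greater than zero.

```cpp
#include <cassert>
#include <cstddef>

// Inclusive range of card indices touched by a block of memory.
struct CardRange {
  size_t first;
  size_t last;
};

// Hypothetical helper: which cards does [start_word, start_word + size_words)
// overlap? Assumes size_words > 0 and card_words > 0.
inline CardRange cards_covered(size_t start_word, size_t size_words,
                               size_t card_words) {
  return {start_word / card_words,
          (start_word + size_words - 1) / card_words};
}

// Two blocks can contend on card metadata only if their card ranges overlap.
inline bool share_a_card(CardRange a, CardRange b) {
  return a.first <= b.last && b.first <= a.last;
}
```

When every PLAB starts on a card boundary and its size is a multiple of the card size, adjacent PLABs cover disjoint card ranges, so threads registering objects in different PLABs never update the same card entry. Without that alignment, two neighboring PLABs can straddle one card and registration would need synchronization.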
------------- PR: https://git.openjdk.org/shenandoah/pull/192 From wkemper at openjdk.org Thu Jan 5 16:45:25 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 5 Jan 2023 16:45:25 GMT Subject: RFR: Fix allocate aligned [v3] In-Reply-To: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> References: <8E4rSc7rGXBB29CxwRTSxPURFmEd0X3sfo-4eRhtFwg=.5dd4a519-da3c-4b54-8ff9-3ca8a750ee99@github.com> Message-ID: On Thu, 5 Jan 2023 01:41:45 GMT, Kelvin Nilsen wrote: >> An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. >> >> In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Remove redundant test for result != nullptr Marked as reviewed by wkemper (Committer). 
------------- PR: https://git.openjdk.org/shenandoah/pull/192 From kdnilsen at openjdk.org Thu Jan 5 17:04:28 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 17:04:28 GMT Subject: Integrated: Fix allocate aligned In-Reply-To: References: Message-ID: <7-Ephr0bVwLOAptVv7m_Pzv6KZZUcCO4hbsJeuIyhds=.dad9d2c8-9ea3-4edf-bc08-97aafa28c32e@github.com> On Thu, 5 Jan 2023 00:55:14 GMT, Kelvin Nilsen wrote: > An error was discovered in the implementation and use of allocate_aligned(), which is used to allocate PLABs that align with remembered set card boundaries. In the previous implementation, if the required alignment padding was smaller than the minimum filler object, the additional card's memory worth of padding might cause the PLAB to span beyond the end of the selected heap region. This PR addresses the error. This code successfully runs our internal pipeline of tests without any regressions. > > In the current context, this code is not fully exercised due to very limited allocation of PLABs within heap regions that already hold previously allocated PLABs. This code has also been exercised in the context of code that more aggressively packs multiple PLABs into heap regions. That additional code will be integrated with a future PR. This pull request has now been integrated. 
Changeset: 7e9a1d49 Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/7e9a1d49eae7bff80e2b678f2402a4ecdf6c748f Stats: 63 lines in 4 files changed: 43 ins; 7 del; 13 mod Fix allocate aligned Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/192 From wkemper at openjdk.org Thu Jan 5 22:51:22 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 5 Jan 2023 22:51:22 GMT Subject: RFR: Fix use of uninitialized double Message-ID: Member field was not initialized in constructor ------------- Commit messages: - Fix use of uninitialized double Changes: https://git.openjdk.org/shenandoah/pull/194/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=194&range=00 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/194.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/194/head:pull/194 PR: https://git.openjdk.org/shenandoah/pull/194 From kdnilsen at openjdk.org Thu Jan 5 22:58:31 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 5 Jan 2023 22:58:31 GMT Subject: RFR: Fix use of uninitialized double In-Reply-To: References: Message-ID: <73DwV_svxjlyxYhcpnIwpeIWU5ibik3CSmLPbaPabnY=.56e9fe69-cc57-4372-9c97-ffdc7f7763cb@github.com> On Thu, 5 Jan 2023 22:43:38 GMT, William Kemper wrote: > Member field was not initialized in constructor Marked as reviewed by kdnilsen (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/194 From ysr at openjdk.org Thu Jan 5 23:01:33 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 5 Jan 2023 23:01:33 GMT Subject: RFR: Fix use of uninitialized double In-Reply-To: References: Message-ID: <2bCCxKH5djhzBOQjM5X2htQptGPKbfa0G6ek4yW2wLc=.fbd71380-0869-4ed4-9dcd-a860cb10070f@github.com> On Thu, 5 Jan 2023 22:43:38 GMT, William Kemper wrote: > Member field was not initialized in constructor Marked as reviewed by ysr (Author). 
------------- PR: https://git.openjdk.org/shenandoah/pull/194 From wkemper at openjdk.org Thu Jan 5 23:05:22 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 5 Jan 2023 23:05:22 GMT Subject: Integrated: Fix use of uninitialized double In-Reply-To: References: Message-ID: On Thu, 5 Jan 2023 22:43:38 GMT, William Kemper wrote: > Member field was not initialized in constructor This pull request has now been integrated. Changeset: cb70d299 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/cb70d299998937138c03a2a7558fe1f6f3cdba0e Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Fix use of uninitialized double Reviewed-by: kdnilsen, ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/194 From sviswanathan at openjdk.org Fri Jan 6 19:48:56 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Jan 2023 19:48:56 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: References:

<6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com>

Message-ID: On Thu, 22 Dec 2022 13:10:02 GMT, Claes Redestad wrote:

>> @cl4es Thanks for passing the constant node through, the code looks much cleaner now. The attached patch should handle the signed bytes/shorts as well. Please take a look.
>> [signed.patch](https://github.com/openjdk/jdk/files/10273480/signed.patch)
>
> I ran tests and some quick microbenchmarking to validate @sviswa7's patch to activate vectorization for `short` and `byte` arrays and it looks good:
>
> Before:
>
> Benchmark                   (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes         10000  avgt    5  7845.586 ± 23.440  ns/op
> ArraysHashCode.chars         10000  avgt    5  1203.163 ± 11.995  ns/op
> ArraysHashCode.ints          10000  avgt    5  1131.915 ±  7.843  ns/op
> ArraysHashCode.multibytes    10000  avgt    5  4136.487 ±  5.790  ns/op
> ArraysHashCode.multichars    10000  avgt    5   671.328 ± 17.629  ns/op
> ArraysHashCode.multiints     10000  avgt    5   699.051 ±  8.135  ns/op
> ArraysHashCode.multishorts   10000  avgt    5  4139.300 ± 10.633  ns/op
> ArraysHashCode.shorts        10000  avgt    5  7844.019 ± 26.071  ns/op
>
>
> After:
>
> Benchmark                   (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes         10000  avgt    5  1193.208 ±  1.965  ns/op
> ArraysHashCode.chars         10000  avgt    5  1193.311 ±  5.941  ns/op
> ArraysHashCode.ints          10000  avgt    5  1132.592 ± 10.410  ns/op
> ArraysHashCode.multibytes    10000  avgt    5   657.343 ± 25.343  ns/op
> ArraysHashCode.multichars    10000  avgt    5   672.668 ±  5.229  ns/op
> ArraysHashCode.multiints     10000  avgt    5   697.143 ±  3.929  ns/op
> ArraysHashCode.multishorts   10000  avgt    5   666.738 ± 12.236  ns/op
> ArraysHashCode.shorts        10000  avgt    5  1193.563 ±  5.449  ns/op

@cl4es There seems to be a failure in the windows-x64 pre-submit tests. Could you please take a look?
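
For readers unfamiliar with the transformation being benchmarked, here is a minimal plain-Java sketch of unrolling the polynomial hash by hand. It is not the intrinsic or the stub code from the PR; the class and method names are made up. It assumes the usual `h = 31 * h + e` recurrence with a start value of one, as `Arrays.hashCode` uses, and relies on `int` arithmetic wrapping on overflow. Signed elements such as `byte` are sign-extended to `int` before being multiplied, which is the case the patch discussed above addresses for the vectorized path.

```java
// Illustrative sketch only; hypothetical names, not code from the pull request.
final class UnrolledHashSketch {
    // 31^2, 31^3 and 31^4 as int constants.
    static final int P2 = 31 * 31;
    static final int P3 = P2 * 31;
    static final int P4 = P3 * 31;

    // Computes the same value as java.util.Arrays.hashCode(byte[]) for a non-null array.
    static int hash(byte[] a) {
        int h = 1;
        int i = 0;
        // Four elements per iteration: one step here equals four steps of h = 31 * h + e.
        for (; i + 3 < a.length; i += 4) {
            h = P4 * h + P3 * a[i] + P2 * a[i + 1] + 31 * a[i + 2] + a[i + 3];
        }
        // Scalar tail for the remaining zero to three elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

The unrolled step removes the serial dependency between consecutive multiplications, which is what makes a vectorized implementation with a table of powers of 31 possible.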
------------- PR: https://git.openjdk.org/jdk/pull/10847 From kdnilsen at openjdk.org Fri Jan 6 19:56:38 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 6 Jan 2023 19:56:38 GMT Subject: RFR: Plab fallback to minsize Message-ID: If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions. This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions. Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads. Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated. 
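
A minimal sketch of the fallback described above, assuming a region is characterized only by its free words and ignoring card alignment and filler objects. `MIN_PLAB_WORDS`, `chooseSize`, and the numbers are hypothetical and are not the names or values used in allocate_aligned().

```java
// Illustrative sketch only; hypothetical names and sizes.
final class PlabFallbackSketch {
    // Hypothetical minimum PLAB size in words.
    static final long MIN_PLAB_WORDS = 256;

    // Returns the number of words to grant, or 0 if the region cannot
    // hold even a minimum-sized PLAB.
    static long chooseSize(long desiredWords, long freeWordsInRegion) {
        if (freeWordsInRegion >= desiredWords) {
            return desiredWords;          // the full request fits
        }
        // Assumption: "at least the minimum" is good enough for the fallback.
        if (freeWordsInRegion >= MIN_PLAB_WORDS) {
            return freeWordsInRegion;     // downsized PLAB fills the rest of the region
        }
        return 0;                         // caller has to try another region
    }
}
```

For example, a request for 1024 words against a region with 512 free words yields a 512-word PLAB under these assumptions instead of leaving those words unused.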
x86 results:

+49.22% extremem-phased/jhiccup_max_pause
  Control: 0.560ms (+/-467.41ms) 104600
  Test:    0.835ms (+/-466.68ms) 12291
+28.58% hyperalloc_a2048_o1536/context_switch_count
  Control: 3945.242 (+/-1228.11) 82
  Test:    5072.625 (+/-427.15) 10
+20.21% extremem-phased/cpu_user
  Control: 2323.752s (+/-592.18s) 85
  Test:    2793.378s (+/-74.48s) 10
+19.23% hyperalloc_a2048_o1536/jhiccup_max_pause
  Control: 0.613ms (+/-1.32ms) 9840
  Test:    0.731ms (+/-1.40ms) 1200
+16.37% hyperalloc_a2048_o1536/cpu_user
  Control: 372.456s (+/-67.33s) 82
  Test:    433.434s (+/-10.54s) 10
+13.72% extremem/worker_objects
  Control: 228174.968 (+/-274656.55) 47085
  Test:    259476.534 (+/-300553.34) 3012
+12.87% xalan/concurrent_marking
  Control: 9.055ms (+/-6.82ms) 7640
  Test:    10.221ms (+/-6.55ms) 561
+11.57% specjbb2015/pause_degenerated_gc_n
  Control: 921.116ms (+/-627.42ms) 6423
  Test:    1.028s (+/-654.58ms) 782
+11.56% specjbb2015/pause_degenerated_gc_g
  Control: 923.790ms (+/-629.01ms) 6423
  Test:    1.031s (+/-656.16ms) 782
-206.91% extremem/mutator_evacuated
  Control: 128640.328 (+/-1870500.79) 47085
  Test:    41914.998 (+/-2374222.03) 3012
-148.23% hyperalloc_a2048_o1536/mutator_evacuated
  Control: 1755.404 (+/-3710823.53) 18356
  Test:    707.164 (+/-3055167.84) 3374
-120.26% hyperalloc_a2048_o1536/mutator_objects
  Control: 6.337 (+/-14949.35) 18356
  Test:    2.877 (+/-12293.11) 3374
-83.34% extremem/mutator_objects
  Control: 1834.425 (+/-7838.57) 47085
  Test:    1000.564 (+/-9461.24) 3012
-62.39% extremem/concurrent_thread_roots
  Control: 3.298ms (+/-5.49ms) 3408
  Test:    2.031ms (+/-4.06ms) 650
-60.85% hyperalloc_a2048_o1536/concurrent_evacuation
  Control: 11.389ms (+/-33.55ms) 5943
  Test:    7.081ms (+/-30.34ms) 986
-59.21% hyperalloc_a3072_o1536/concurrent_evacuation
  Control: 6.744ms (+/-29.91ms) 9195
  Test:    4.236ms (+/-27.88ms) 1279
-55.74% hyperalloc_a3072_o1536/mutator_evacuated
  Control: 995.822 (+/-3008122.97) 31761
  Test:    639.419 (+/-2714463.89) 4564
-48.55% hyperalloc_a3072_o1536/mutator_objects
  Control: 4.018 (+/-12124.35) 31761
  Test:    2.705 (+/-10918.88) 4564
-47.30% extremem/concurrent_mark_roots
  Control: 3.114ms (+/-5.37ms) 4065
  Test:    2.114ms (+/-4.21ms) 674

aarch64 results:

+20.28% extremem-phased/jhiccup_max_pause
  Control: 0.391ms (+/-778.03ms) 101414
  Test:    0.471ms (+/-506.86ms) 12270
+16.56% xalan/jhiccup_max_pause
  Control: 2.806ms (+/-3.57ms) 4943
  Test:    3.271ms (+/-3.60ms) 586
+12.43% hyperalloc_a2048_o1536/cpu_user
  Control: 387.646s (+/-62.67s) 81
  Test:    435.835s (+/-10.43s) 10
+11.45% specjbb2015/pause_degenerated_gc_n
  Control: 1.305s (+/-855.27ms) 6504
  Test:    1.455s (+/-874.27ms) 840
+11.43% specjbb2015/pause_degenerated_gc_g
  Control: 1.309s (+/-857.67ms) 6504
  Test:    1.459s (+/-876.46ms) 840
+10.42% extremem-phased/cpu_user
  Control: 3285.986s (+/-822.03s) 85
  Test:    3628.380s (+/-99.52s) 10
-206.42% extremem/mutator_evacuated
  Control: 215404.352 (+/-1220778.32) 24514
  Test:    70298.169 (+/-909463.44) 3011
-156.88% extremem/mutator_objects
  Control: 4341.422 (+/-15179.61) 24514
  Test:    1690.048 (+/-8677.87) 3011
-117.99% hyperalloc_a2048_o1536/mutator_evacuated
  Control: 1086.398 (+/-2790294.87) 18643
  Test:    498.373 (+/-2253567.56) 3318
-101.55% hyperalloc_a2048_o1536/mutator_objects
  Control: 4.146 (+/-11259.63) 18643
  Test:    2.057 (+/-9112.94) 3318
-70.56% hyperalloc_a3072_o1536/concurrent_evacuation
  Control: 6.078ms (+/-24.65ms) 8977
  Test:    3.563ms (+/-22.93ms) 1258
-65.62% extremem-phased/reconstruct_remembered_set
  Control: 190.987ms (+/-136.41ms) 1305
  Test:    115.314ms (+/-154.77ms) 96
-56.62% hyperalloc_a3072_o1536/mutator_evacuated
  Control: 628.225 (+/-2364040.76) 31963
  Test:    401.114 (+/-2138629.77) 4629
-51.32% hyperalloc_a3072_o1536/mutator_objects
  Control: 2.598 (+/-9519.49) 31963
  Test:    1.717 (+/-8639.65) 4629
-48.41% extremem/concurrent_update_thread_roots
  Control: 4.723ms (+/-10.75ms) 5357
  Test:    3.182ms (+/-8.55ms) 658
-48.04% hyperalloc_a2048_o1536/concurrent_evacuation
  Control: 9.702ms (+/-27.76ms) 6218
  Test:    6.553ms (+/-24.74ms) 969

-------------

Commit messages:
 - Remove instrumentation and fix miscalculations in allocate_aligned
 - Fix bugs when downsizing PLAB allocation request
 - allocate_aligned tries smaller size if insufficient memory for full size

Changes: https://git.openjdk.org/shenandoah/pull/195/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=195&range=00
  Stats: 66 lines in 3 files changed: 38 ins; 9 del; 19 mod
  Patch: https://git.openjdk.org/shenandoah/pull/195.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/195/head:pull/195

PR: https://git.openjdk.org/shenandoah/pull/195

From wkemper at openjdk.org Fri Jan 6 21:13:31 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 6 Jan 2023 21:13:31 GMT Subject: RFR: Plab fallback to minsize In-Reply-To: References: Message-ID: <5Sgt8OMFtOTyAmy6pmSkm-wBSFwJE3spNvnVRM-Qnr4=.06834440-7772-4ccc-9ddb-e020de0c295b@github.com>

On Fri, 6 Jan 2023 19:45:14 GMT, Kelvin Nilsen wrote:

> If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions.
>
> This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions.
>
> Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads.
>
> Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated.

The evacuation metrics between mutators and gc workers are fairly unstable - likely because it depends so much on when and which threads get scheduled.
I've been thinking of masking them in the reports for this reason. ------------- PR: https://git.openjdk.org/shenandoah/pull/195 From wkemper at openjdk.org Fri Jan 6 21:29:25 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 6 Jan 2023 21:29:25 GMT Subject: RFR: Plab fallback to minsize In-Reply-To: References: Message-ID: On Fri, 6 Jan 2023 19:45:14 GMT, Kelvin Nilsen wrote: > If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions. > > This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions. > > Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads. > > Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated. Marked as reviewed by wkemper (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/195 From ysr at openjdk.org Fri Jan 6 21:29:25 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 6 Jan 2023 21:29:25 GMT Subject: RFR: Plab fallback to minsize In-Reply-To: References: Message-ID: <4yM1KvsDrl2tZq_vsc4Yu64lidm2Tuxf1wWOBrkYfhY=.2657559b-6fa7-4ba4-abd9-eddfff4ec546@github.com> On Fri, 6 Jan 2023 19:45:14 GMT, Kelvin Nilsen wrote: > If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions. 
>
> This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions.
>
> Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads.
>
> Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated.

Changes look good, modulo general comments for longer-term consideration.

Generally looks good, but I wonder if one should more carefully define the pre- and post-conditions of the two allocate methods to avoid duplicated computation between them (especially wrt minimum size etc.). One way to achieve that would be to have more specialized allocate methods that are called by subsets of clients. Having a leaf method called by several could lead to such duplication of checks. E.g., I see a bunch of "result != null" for values returned from a method that does checks and trimming of its own. If so, the checks in the leaf method for that caller may be wasteful. This is a general comment, but I'll look more carefully at the code to understand this better.

One question: do PLAB requests that give you smaller PLABs slow down all subsequent PLAB requests in that region? Does this then result in a downsizing of PLAB requests in the same cycle or subsequent ones? (I guess I am asking how often / what cycle PLAB resizing happens.)

-------------

Marked as reviewed by ysr (Author).
PR: https://git.openjdk.org/shenandoah/pull/195 From kdnilsen at openjdk.org Fri Jan 6 21:40:18 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 6 Jan 2023 21:40:18 GMT Subject: RFR: Plab fallback to minsize In-Reply-To: References: Message-ID: On Fri, 6 Jan 2023 19:45:14 GMT, Kelvin Nilsen wrote: > If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions. > > This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions. > > Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads. > > Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated. The code for sizing PLAB requests is in allocate_from_plab_slow(). In general, each thread starts out with PLABs of size PLAB::min_size(). Each time the thread exhausts its existing PLAB, it tries to allocate a new PLAB that's twice as large as its previously preferred PLAB size, even if its previous PLAB is smaller than its previously preferred PLAB size. The consequence of downsizing a particular PLAB is that the thread will end up depleting the downsized PLAB more quickly than normal, which will result in this thread subsequently receiving an even larger PLAB sooner. 
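
The sizing policy described above can be sketched roughly as follows. This is not the code in allocate_from_plab_slow(); the class, the 256-word minimum standing in for PLAB::min_size(), and the 8192-word cap are assumptions made for illustration (the message does not mention an upper bound). The point of the sketch is that the size actually granted for the exhausted PLAB does not feed back into the next request.

```java
// Illustrative sketch only; hypothetical names and sizes.
final class PlabSizingSketch {
    static final long MIN_PLAB_WORDS = 256;   // hypothetical stand-in for PLAB::min_size()
    static final long MAX_PLAB_WORDS = 8192;  // hypothetical cap, assumed for this sketch

    // Per-thread preferred size; every thread starts at the minimum.
    private long preferredWords = MIN_PLAB_WORDS;

    // Called when the thread has exhausted its current PLAB. The size that was
    // actually granted for that PLAB is deliberately ignored: the request is
    // twice the previously preferred size, even if the last PLAB was downsized.
    long nextRequest(long grantedWordsOfExhaustedPlab) {
        preferredWords = Math.min(preferredWords * 2, MAX_PLAB_WORDS);
        return preferredWords;
    }
}
```

Under these assumptions a thread whose 512-word request was downsized to 300 words still asks for 1024 words next, and it gets there sooner because the smaller buffer fills up faster.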
------------- PR: https://git.openjdk.org/shenandoah/pull/195 From kdnilsen at openjdk.org Fri Jan 6 21:46:32 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 6 Jan 2023 21:46:32 GMT Subject: RFR: Plab fallback to minsize In-Reply-To: References: Message-ID: On Fri, 6 Jan 2023 19:45:14 GMT, Kelvin Nilsen wrote: > If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions. > > This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions. > > Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads. > > Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated. You make a good observation about the redundant checking between the caller and callee functions here. I agree that it would be good to eventually tighten up the API specs so that we don't need this redundancy. In the meanwhile, I note that allocate_aligned() is generally in-lined into the caller's context. This allows the compiler to optimize away at least some of the redundant checks. 
------------- PR: https://git.openjdk.org/shenandoah/pull/195 From kdnilsen at openjdk.org Fri Jan 6 21:46:32 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 6 Jan 2023 21:46:32 GMT Subject: Integrated: Plab fallback to minsize In-Reply-To: References: Message-ID: <7OKSCFQrZqCDERPfO7t6Y85oCVmEpUC_8a9ZSrikcjg=.8b141e13-7815-4cf8-9d1d-e1d4035d341b@github.com> On Fri, 6 Jan 2023 19:45:14 GMT, Kelvin Nilsen wrote: > If there is not sufficient memory within a region to allocate the desired size of a PLAB, try to allocate a smaller PLAB that is still larger than the minimum PLAB size. This allows more efficient packing of PLABs into available old-gen heap regions. > > This patch has been exercised by an internal CI regression testing pipeline and has not introduced any regressions. > > Modest reductions in evacuation efforts are seen on benchmarks that do heavy promotion. This is presumably because there are fewer promotion failures, thus less need to repeatedly copy "old" objects within young-gen spaces. Of note, but not entirely explained, is the observation that some evacuation effort has shifted from mutator threads to GC worker threads. > > Some regression in jhiccup pause times were observed. This will be explored further after additional patches are integrated. This pull request has now been integrated. 
Changeset: c5774a09 Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/c5774a09b35a01c5ab52831a63072d0b753afd64 Stats: 66 lines in 3 files changed: 38 ins; 9 del; 19 mod Plab fallback to minsize Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/195 From wkemper at openjdk.org Sat Jan 7 00:22:41 2023 From: wkemper at openjdk.org (William Kemper) Date: Sat, 7 Jan 2023 00:22:41 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: <4-PjqKekTb5CuO6T_TRabJOAnG172XlsQqjsRSL16Rw=.811f7125-57ce-4551-94cf-945e2e3d939a@github.com> Merges tag jdk-21+4 ------------- Commit messages: - Merge tag 'jdk-21+4' into merge-jdk-21-4 - 8299439: java/text/Format/NumberFormat/CurrencyFormat.java fails for hr_HR - 8299563: Fix typos - 8219810: javac throws NullPointerException - 8200610: Compiling fails with java.nio.file.ReadOnlyFileSystemException - Merge - 8299476: PPC64 Zero build fails after JDK-8286302 - 8293824: gc/whitebox/TestConcMarkCycleWB.java failed "RuntimeException: assertTrue: expected true, was false" - 8299483: ProblemList java/text/Format/NumberFormat/CurrencyFormat.java - 8298324: Unable to run shell test with make - ... 
and 72 more: https://git.openjdk.org/shenandoah/compare/c5774a09...55fe3430 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=shenandoah&pr=196&range=00.0 - openjdk/jdk:master: https://webrevs.openjdk.org/?repo=shenandoah&pr=196&range=00.1 Changes: https://git.openjdk.org/shenandoah/pull/196/files Stats: 5313 lines in 488 files changed: 2252 ins; 1955 del; 1106 mod Patch: https://git.openjdk.org/shenandoah/pull/196.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/196/head:pull/196 PR: https://git.openjdk.org/shenandoah/pull/196 From fyang at openjdk.org Sat Jan 7 10:11:53 2023 From: fyang at openjdk.org (Fei Yang) Date: Sat, 7 Jan 2023 10:11:53 GMT Subject: RFR: 8299312: Clean up BarrierSetNMethod [v2] In-Reply-To: References:

Message-ID: On Wed, 4 Jan 2023 14:50:20 GMT, Erik Österlund wrote:

>> The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it in to the shared code level.
>> We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter.
>
> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
> - ARM support
> - Merge branch 'master' into 8299312_barrier_set_nmethod_cleanup
> - Fix Shenandoah build
> - 8299312: Clean up BarrierSetNMethod

Looks good to me.

src/hotspot/share/runtime/thread.hpp line 118:

> 116: // On AArch64, the high order 32 bits are used by a "patching epoch" number
> 117: // which reflects if this thread has executed the required fences, after
> 118: // an nmethod gets disarmed. The low order 32 bit denote the disarmed value.

Nit:
I think this should be:
"The low order 32 bits denote the disarmed value."
instead of:
"The low order 32 bit denote the disarmed value."

-------------

Marked as reviewed by fyang (Reviewer).

PR: https://git.openjdk.org/jdk/pull/11774

From eosterlund at openjdk.org Mon Jan 9 09:49:55 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 9 Jan 2023 09:49:55 GMT Subject: RFR: 8299312: Clean up BarrierSetNMethod [v2] In-Reply-To: References:

Message-ID: On Sat, 7 Jan 2023 10:08:36 GMT, Fei Yang wrote:

>> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>>
>> - ARM support
>> - Merge branch 'master' into 8299312_barrier_set_nmethod_cleanup
>> - Fix Shenandoah build
>> - 8299312: Clean up BarrierSetNMethod
>
> Looks good to me.

Thanks for the review @RealFYang!

> src/hotspot/share/runtime/thread.hpp line 118:
>
>> 116: // On AArch64, the high order 32 bits are used by a "patching epoch" number
>> 117: // which reflects if this thread has executed the required fences, after
>> 118: // an nmethod gets disarmed. The low order 32 bit denote the disarmed value.
>
> Nit:
> I think this should be:
> "The low order 32 bits denote the disarmed value."
> instead of:
> "The low order 32 bit denote the disarmed value."

Yes, you are right, thanks.

-------------

PR: https://git.openjdk.org/jdk/pull/11774

From eosterlund at openjdk.org Mon Jan 9 09:54:12 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 9 Jan 2023 09:54:12 GMT Subject: RFR: 8299312: Clean up BarrierSetNMethod [v3] In-Reply-To: References: Message-ID:

> The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it in to the shared code level.
> We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter.
Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:

  Nit

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/11774/files
  - new: https://git.openjdk.org/jdk/pull/11774/files/e0b32db3..08a1fb25

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=11774&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11774&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/11774.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11774/head:pull/11774

PR: https://git.openjdk.org/jdk/pull/11774

From eosterlund at openjdk.org Mon Jan 9 13:38:00 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 9 Jan 2023 13:38:00 GMT Subject: Integrated: 8299312: Clean up BarrierSetNMethod In-Reply-To: References: Message-ID:

On Fri, 23 Dec 2022 12:00:46 GMT, Erik Österlund wrote:

> The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it in to the shared code level.
> We also have more functionality than we need on platform level. The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter.

This pull request has now been integrated.
Changeset: 4ba81221
Author: Erik Österlund
URL: https://git.openjdk.org/jdk/commit/4ba8122197e912db4894ed7fe395a8842268fbef
Stats: 175 lines in 29 files changed: 10 ins; 82 del; 83 mod

8299312: Clean up BarrierSetNMethod

Reviewed-by: mdoerr, fyang

-------------

PR: https://git.openjdk.org/jdk/pull/11774

From redestad at openjdk.org Mon Jan 9 15:23:58 2023 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 9 Jan 2023 15:23:58 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: References:

<6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com>

Message-ID: On Thu, 22 Dec 2022 13:10:02 GMT, Claes Redestad wrote:

>> @cl4es Thanks for passing the constant node through, the code looks much cleaner now. The attached patch should handle the signed bytes/shorts as well. Please take a look.
>> [signed.patch](https://github.com/openjdk/jdk/files/10273480/signed.patch)
>
> I ran tests and some quick microbenchmarking to validate @sviswa7's patch to activate vectorization for `short` and `byte` arrays and it looks good:
>
> Before:
>
> Benchmark                   (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes         10000  avgt    5  7845.586 ± 23.440  ns/op
> ArraysHashCode.chars         10000  avgt    5  1203.163 ± 11.995  ns/op
> ArraysHashCode.ints          10000  avgt    5  1131.915 ±  7.843  ns/op
> ArraysHashCode.multibytes    10000  avgt    5  4136.487 ±  5.790  ns/op
> ArraysHashCode.multichars    10000  avgt    5   671.328 ± 17.629  ns/op
> ArraysHashCode.multiints     10000  avgt    5   699.051 ±  8.135  ns/op
> ArraysHashCode.multishorts   10000  avgt    5  4139.300 ± 10.633  ns/op
> ArraysHashCode.shorts        10000  avgt    5  7844.019 ± 26.071  ns/op
>
>
> After:
>
> Benchmark                   (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes         10000  avgt    5  1193.208 ±  1.965  ns/op
> ArraysHashCode.chars         10000  avgt    5  1193.311 ±  5.941  ns/op
> ArraysHashCode.ints          10000  avgt    5  1132.592 ± 10.410  ns/op
> ArraysHashCode.multibytes    10000  avgt    5   657.343 ± 25.343  ns/op
> ArraysHashCode.multichars    10000  avgt    5   672.668 ±  5.229  ns/op
> ArraysHashCode.multiints     10000  avgt    5   697.143 ±  3.929  ns/op
> ArraysHashCode.multishorts   10000  avgt    5   666.738 ± 12.236  ns/op
> ArraysHashCode.shorts        10000  avgt    5  1193.563 ±  5.449  ns/op

> @cl4es There seems to be a failure in the windows-x64 pre-submit tests. Could you please take a look?

It looks like the `as_Address(ExternalAddress(StubRoutines::x86::arrays_hashcode_powers_of_31() + ...)` trick is running into some reachability issue on Windows, hitting the `assert(reachable(adr), "must be");` in `macroAssembler_x86.cpp`. Might be related to ASLR or some quirk of the VS compiler.
I'll investigate.

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Mon Jan 9 15:00:48 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 9 Jan 2023 15:00:48 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v17]
In-Reply-To:
References:
Message-ID:

> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>
> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>
> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>
> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>
> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>
> Baseline:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>
> I.e. no measurable overhead compared to baseline even for `size == 1`.
>
> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>
> Benchmark for `Arrays.hashCode`:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>
> Baseline:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>
> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.

Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 68 commits:

 - Merge branch 'master' into 8282664-polyhash
 - Treat Op_VectorizedHashCode as other similar Ops in split_unique_types
 - Handle signed subword arrays, contributed by @sviswa7
 - @sviswa7 comments
 - Pass the constant mode node through, removing need for all but one instruct declarations
 - FLAG_SET_DEFAULT
 - Merge branch 'master' into 8282664-polyhash
 - Merge branch 'master' into 8282664-polyhash
 - Missing & 0xff in StringLatin1::hashCode
 - Qualified guess on shenandoahSupport fix-up
 - ... and 58 more: https://git.openjdk.org/jdk/compare/66db0bb6...71297615

-------------

Changes: https://git.openjdk.org/jdk/pull/10847/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=16
  Stats: 1052 lines in 33 files changed: 992 ins; 8 del; 52 mod
  Patch: https://git.openjdk.org/jdk/pull/10847.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Mon Jan 9 16:49:25 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 9 Jan 2023 16:49:25 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18]
In-Reply-To:
References:
Message-ID:

> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
> [...]
> I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Explicitly lea external address

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10847/files
  - new: https://git.openjdk.org/jdk/pull/10847/files/71297615..c8c58f4a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=17
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=16-17

  Stats: 11 lines in 1 file changed: 6 ins; 3 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/10847.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847

PR: https://git.openjdk.org/jdk/pull/10847

From wkemper at openjdk.org Mon Jan 9 17:41:37 2023
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 9 Jan 2023 17:41:37 GMT
Subject: Integrated: Merge openjdk/jdk:master
In-Reply-To: <4-PjqKekTb5CuO6T_TRabJOAnG172XlsQqjsRSL16Rw=.811f7125-57ce-4551-94cf-945e2e3d939a@github.com>
References: <4-PjqKekTb5CuO6T_TRabJOAnG172XlsQqjsRSL16Rw=.811f7125-57ce-4551-94cf-945e2e3d939a@github.com>
Message-ID:

On Sat, 7 Jan 2023 00:13:06 GMT, William Kemper wrote:

> Merges tag jdk-21+4

This pull request has now been integrated.

Changeset: bbd39940
Author:    William Kemper
URL:       https://git.openjdk.org/shenandoah/commit/bbd3994084271e4c2bca41987f9f6ab644bc754f
Stats:     5313 lines in 488 files changed: 2252 ins; 1955 del; 1106 mod

Merge openjdk/jdk:master

-------------

PR: https://git.openjdk.org/shenandoah/pull/196

From kdnilsen at openjdk.org Mon Jan 9 22:18:26 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Mon, 9 Jan 2023 22:18:26 GMT
Subject: RFR: Fix verification of remembered set at mark start
Message-ID:

All objects residing between TAMS and top() within each old region are examined independent of the marking context.

-------------

Commit messages:
 - Fix verification of remembered set at mark start

Changes: https://git.openjdk.org/shenandoah/pull/197/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=197&range=00
  Stats: 46 lines in 1 file changed: 29 ins; 7 del; 10 mod
  Patch: https://git.openjdk.org/shenandoah/pull/197.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/197/head:pull/197

PR: https://git.openjdk.org/shenandoah/pull/197

From redestad at openjdk.org Mon Jan 9 23:17:00 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 9 Jan 2023 23:17:00 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18]
In-Reply-To:
References:

Message-ID:

On Mon, 9 Jan 2023 23:13:29 GMT, Claes Redestad wrote:

>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Explicitly lea external address
>
> Explicitly loading the address to a register seems to do the trick, avoiding the pitfalls of `as_Address(AddressLiteral)` - which apparently only works (portably) when we know for certain the address is in some allowed range. There's no measurable difference on microbenchmarks (there might be a couple of extra lea instructions on the vectorized paths, but that disappears in the noise). Thanks @fisk for the suggestion!

Thanks @cl4es for fixing this issue. Changes look good to me.

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From andrew at openjdk.org Tue Jan 10 01:50:25 2023
From: andrew at openjdk.org (Andrew John Hughes)
Date: Tue, 10 Jan 2023 01:50:25 GMT
Subject: git: openjdk/shenandoah-jdk8u: Added tag jdk8u332-b03 for changeset 12528bb4
Message-ID: <1273a435-5368-434f-bf7f-bfb8cabf183b@openjdk.org>

Tagged by: Andrew John Hughes
Date:      2022-02-23 01:58:09 +0000

Changeset: 12528bb4
Author:    Sergey Bylokhov
Date:      2022-02-16 21:06:29 +0000
URL:       https://git.openjdk.org/shenandoah-jdk8u/commit/12528bb4d331ed2ec9630db0ee3f2bfeea44b632

From andrew at openjdk.org Tue Jan 10 01:50:29 2023
From: andrew at openjdk.org (Andrew John Hughes)
Date: Tue, 10 Jan 2023 01:50:29 GMT
Subject: git: openjdk/shenandoah-jdk8u: Added tag shenandoah8u332-b03 for changeset 207cbfb2
Message-ID: <9e09187e-c3c9-4ac3-a52a-71bf5226d025@openjdk.org>

Tagged by: Andrew John Hughes
Date:      2023-01-10 01:48:05 +0000

Added tag shenandoah8u332-b03 for changeset 207cbfb2fce

Changeset: 207cbfb2
Author:    Andrew John Hughes
Date:      2022-12-16 00:23:54 +0000
URL:       https://git.openjdk.org/shenandoah-jdk8u/commit/207cbfb2fce33f98095a9144546dfb8e2007483b

From andrew at openjdk.org Tue Jan 10 01:50:51 2023
From: andrew at openjdk.org (Andrew John Hughes)
Date: Tue, 10 Jan 2023 01:50:51 GMT
Subject: git: openjdk/shenandoah-jdk8u: master: 5 new changesets
Message-ID:

Changeset: 26e70339
Author:    Andrew John Hughes
Date:      2022-02-08 16:47:38 +0000
URL:       https://git.openjdk.org/shenandoah-jdk8u/commit/26e70339509fc180a34714329a5e2e7c3750dbb5

Added tag jdk8u332-b02 for changeset 4eff168ecdd9

! .hgtags

Changeset: 054b85b1
Author:    Erik Joelsson
Date:      2018-09-07 14:54:15 +0000
URL:       https://git.openjdk.org/shenandoah-jdk8u/commit/054b85b1f65254b2d3d2a1d343e14d8eabd1af40

8210283: Support git as an SCM alternative in the build

Removes forest handling of SCM ids

Reviewed-by: andrew

+ .gitignore
! common/autoconf/basics.m4
! common/autoconf/generated-configure.sh
! common/autoconf/spec.gmk.in
! make/common/MakeBase.gmk

Changeset: 53bb5f63
Author:    David Li
Committer: David Li
Date:      2014-04-15 10:36:23 +0000
URL:       https://git.openjdk.org/shenandoah-jdk8u/commit/53bb5f635cbf5eb46f687e275a4343862bdfc8db

8037259: xerces update: xpointer update

Reviewed-by: lancea, phh

! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/ElementSchemePointer.java
! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/ShortHandPointer.java
! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/XPointerErrorHandler.java
! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/XPointerHandler.java
! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/XPointerMessageFormatter.java
! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/XPointerPart.java
! jaxp/src/com/sun/org/apache/xerces/internal/xpointer/XPointerProcessor.java

Changeset: 12528bb4
Author:    Sergey Bylokhov
Date:      2022-02-16 21:06:29 +0000
URL:       https://git.openjdk.org/shenandoah-jdk8u/commit/12528bb4d331ed2ec9630db0ee3f2bfeea44b632

8280060: The sun/rmi/server/Activation.java class use Thread.dumpStack()

Reviewed-by: phh

! jdk/src/share/classes/sun/rmi/server/Activation.java

Changeset: 207cbfb2
Author:    Andrew John Hughes
Date:      2022-12-16 00:23:54 +0000
URL:       https://git.openjdk.org/shenandoah-jdk8u/commit/207cbfb2fce33f98095a9144546dfb8e2007483b

Merge jdk8u332-b03

From andrew at openjdk.org Tue Jan 10 01:52:50 2023
From: andrew at openjdk.org (Andrew John Hughes)
Date: Tue, 10 Jan 2023 01:52:50 GMT
Subject: RFR: Merge jdk8u:master [v2]
In-Reply-To: <7AiNiAh1EJtQcsRIhBIiBESeY4i2w22rAotd7dky4-g=.0c73f373-3815-49c6-a2ae-9fda87352e53@github.com>
References: <7AiNiAh1EJtQcsRIhBIiBESeY4i2w22rAotd7dky4-g=.0c73f373-3815-49c6-a2ae-9fda87352e53@github.com>
Message-ID:

> Merge jdk8u332-b03

Andrew John Hughes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.

-------------

Changes:
  - all: https://git.openjdk.org/shenandoah-jdk8u/pull/8/files
  - new: https://git.openjdk.org/shenandoah-jdk8u/pull/8/files/207cbfb2..207cbfb2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk8u&pr=8&range=01
 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk8u&pr=8&range=00-01

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/shenandoah-jdk8u/pull/8.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah-jdk8u pull/8/head:pull/8

PR: https://git.openjdk.org/shenandoah-jdk8u/pull/8

From iris at openjdk.org Tue Jan 10 01:52:51 2023
From: iris at openjdk.org (Iris Clark)
Date: Tue, 10 Jan 2023 01:52:51 GMT
Subject: Withdrawn: Merge jdk8u:master
In-Reply-To: <7AiNiAh1EJtQcsRIhBIiBESeY4i2w22rAotd7dky4-g=.0c73f373-3815-49c6-a2ae-9fda87352e53@github.com>
References: <7AiNiAh1EJtQcsRIhBIiBESeY4i2w22rAotd7dky4-g=.0c73f373-3815-49c6-a2ae-9fda87352e53@github.com>
Message-ID:

On Fri, 16 Dec 2022 00:32:06 GMT, Andrew John Hughes wrote:

> Merge jdk8u332-b03

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/shenandoah-jdk8u/pull/8

From eosterlund at openjdk.org Tue Jan 10 10:12:51 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Tue, 10 Jan 2023 10:12:51 GMT
Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication
Message-ID:

When raw char* String contents are exposed to JNI code, we

1. Load the string.value and pin it
2. Run native code
3. Load the string.value and unpin it

Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced.

The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that. Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns.

It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently.
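
The pin/unpin imbalance described above can be modeled with a small Java toy. This is a sketch under stated assumptions, not HotSpot code: `ToyHeap`, `ToyString`, `reloadOnRelease` and `passSameValue` are hypothetical names, and deduplication is simulated by simply swapping the backing array between the pin and the unpin.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class PinBalanceSketch {
    /** Hypothetical stand-in for a heap that keeps a pin count per object. */
    static final class ToyHeap {
        private final Map<Object, Integer> pins = new IdentityHashMap<>();

        void pin(Object o) {
            pins.merge(o, 1, Integer::sum);
        }

        void unpin(Object o) {
            Integer n = pins.get(o);
            if (n == null) {
                throw new IllegalStateException("unpin of an object that was never pinned");
            }
            if (n == 1) {
                pins.remove(o);
            } else {
                pins.put(o, n - 1);
            }
        }

        boolean isBalanced() {
            return pins.isEmpty();
        }
    }

    /** Toy string whose backing array can be replaced, as deduplication would do. */
    static final class ToyString {
        byte[] value;

        ToyString(byte[] value) {
            this.value = value;
        }
    }

    /** Step 3 reloads s.value, which may no longer be the array pinned in step 1. */
    static boolean reloadOnRelease(ToyHeap heap, ToyString s, byte[] dedupTarget) {
        heap.pin(s.value);       // step 1: load the value and pin it
        s.value = dedupTarget;   // step 2: "native code" runs, dedup swaps the array
        try {
            heap.unpin(s.value); // step 3: reload and unpin, now a different object
        } catch (IllegalStateException e) {
            return false;
        }
        return heap.isBalanced();
    }

    /** Step 3 is handed the same array that step 1 pinned. */
    static boolean passSameValue(ToyHeap heap, ToyString s, byte[] dedupTarget) {
        byte[] pinned = s.value;
        heap.pin(pinned);        // step 1
        s.value = dedupTarget;   // step 2: dedup may still swap the array
        heap.unpin(pinned);      // step 3: unpin exactly what was pinned
        return heap.isBalanced();
    }
}
```

In this toy model `reloadOnRelease` ends up unbalanced whenever the array is swapped in between, while `passSameValue` stays balanced regardless, which is the decoupling the proposal is after.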

-------------

Commit messages:
 - More Kim feedback
 - Feedback from Kim
 - 8299673: Simplify object pinning interactions with string deduplication

Changes: https://git.openjdk.org/jdk/pull/11923/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11923&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8299673
  Stats: 162 lines in 14 files changed: 66 ins; 68 del; 28 mod
  Patch: https://git.openjdk.org/jdk/pull/11923.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11923/head:pull/11923

PR: https://git.openjdk.org/jdk/pull/11923

From kdnilsen at openjdk.org Wed Jan 11 01:39:28 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Wed, 11 Jan 2023 01:39:28 GMT
Subject: RFR: Fix verification of remembered set at mark start [v2]
In-Reply-To:
References:
Message-ID:

> All objects residing between TAMS and top() within each old region are examined independent of the marking context.

Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision:

 - Simplify rem-set verification code at init mark

   The code as originally written was mostly correct. Use that
   implementation with just a few refinements to properly handle promotions
   that occur during concurrent old-gen marking.
 - Simplify the fix to rem-set verifier

   Just remove the offending assert(). The code as originally written
   should work ok.

-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/197/files
  - new: https://git.openjdk.org/shenandoah/pull/197/files/e30a9aac..7a659985

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=197&range=01
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=197&range=00-01

  Stats: 47 lines in 1 file changed: 8 ins; 29 del; 10 mod
  Patch: https://git.openjdk.org/shenandoah/pull/197.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/197/head:pull/197

PR: https://git.openjdk.org/shenandoah/pull/197

From kbarrett at openjdk.org Wed Jan 11 04:46:10 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 11 Jan 2023 04:46:10 GMT
Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication
In-Reply-To:
References:
Message-ID:

On Tue, 10 Jan 2023 10:04:48 GMT, Erik Österlund wrote:

> When raw char* String contents are exposed to JNI code, we
>
> 1. Load the string.value and pin it
> 2. Run native code
> 3. Load the string.value and unpin it
>
> Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced.
>
> The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that. Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns.
>
> It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently.

Looks good.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR: https://git.openjdk.org/jdk/pull/11923

From stefank at openjdk.org Wed Jan 11 09:25:15 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 11 Jan 2023 09:25:15 GMT
Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication
In-Reply-To:
References:
Message-ID:

On Tue, 10 Jan 2023 10:04:48 GMT, Erik Österlund wrote:

> When raw char* String contents are exposed to JNI code, we
>
> 1. Load the string.value and pin it
> 2. Run native code
> 3. Load the string.value and unpin it
>
> Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced.
>
> The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that. Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns.
>
> It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently.

Marked as reviewed by stefank (Reviewer).

src/hotspot/share/gc/z/zCollectedHeap.cpp line 27:

> 25: #include "classfile/classLoaderData.hpp"
> 26: #include "gc/shared/gcLocker.inline.hpp"
> 27: #include "gc/shared/gcHeapSummary.hpp"

Sort order

-------------

PR: https://git.openjdk.org/jdk/pull/11923

From redestad at openjdk.org Wed Jan 11 12:19:21 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Wed, 11 Jan 2023 12:19:21 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18]
In-Reply-To:
References:

Message-ID: <3XcxGKxGGuk9z2Zz5qx32DcWsv5edlNMISuEw0lVawE=.fdc71f3d-ddee-485b-b6b5-c56ef6380368@github.com>

On Mon, 9 Jan 2023 16:49:25 GMT, Claes Redestad wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>>
>> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>>
>> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>>
>> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>>
>> Baseline:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>>
>> I.e. no measurable overhead compared to baseline even for `size == 1`.
>> [...]
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Explicitly lea external address

I'll do another round of internal testing (tier1-4). Unless I hear any objections I plan to integrate this once all testing looks satisfactory.

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From kdnilsen at openjdk.org Wed Jan 11 15:03:06 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Wed, 11 Jan 2023 15:03:06 GMT
Subject: RFR: Fix verification of remembered set at mark start [v2]
In-Reply-To:
References:

Message-ID:

On Wed, 11 Jan 2023 01:39:28 GMT, Kelvin Nilsen wrote:

>> All objects residing between TAMS and top() within each old region are examined independent of the marking context.
>
> Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Simplify rem-set verification code at init mark
>
>    The code as originally written was mostly correct. Use that
>    implementation with just a few refinements to properly handle promotions
>    that occur during concurrent old-gen marking.
>  - Simplify the fix to rem-set verifier
>
>    Just remove the offending assert(). The code as originally written
>    should work ok.

Marked as reviewed by ysr (Author).

-------------

PR: https://git.openjdk.org/shenandoah/pull/197

From kdnilsen at openjdk.org Wed Jan 11 16:44:50 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Wed, 11 Jan 2023 16:44:50 GMT
Subject: Integrated: Fix verification of remembered set at mark start
In-Reply-To:
References:
Message-ID:

On Mon, 9 Jan 2023 22:12:08 GMT, Kelvin Nilsen wrote:

> All objects residing between TAMS and top() within each old region are examined independent of the marking context.

This pull request has now been integrated.

Changeset: aca12fcb
Author:    Kelvin Nilsen
URL:       https://git.openjdk.org/shenandoah/commit/aca12fcb017524fc3107aa65e8f1566fc2e044fa
Stats:     4 lines in 1 file changed: 1 ins; 0 del; 3 mod

Fix verification of remembered set at mark start

Reviewed-by: wkemper, ysr

-------------

PR: https://git.openjdk.org/shenandoah/pull/197

From kdnilsen at openjdk.org Wed Jan 11 16:48:19 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Wed, 11 Jan 2023 16:48:19 GMT
Subject: RFR: Broaden plab region search
Message-ID:

Recent testing revealed that old-gen heap regions were being ignored in the search to satisfy new PLAB allocation requests.
It was discovered that many of these regions are found within ranges of the ShenandoahFreeSet that are not considered to be "is_collector_free()".

This is a first step in a two-step change. In this patch, we broaden the search for old-gen heap regions that have available memory to include regions that are not is_collector_free. In a second step of this improvement, we intend to restructure the implementation of the ShenandoahFreeSet to better distinguish ranges of regions that hold young-gen survivors, which ranges hold old-gen, and which ranges are intended to serve as mutator allocations.

On an Extremem workload that allocates roughly 626 M/s in a total heap size of 49G with an old-gen usage of 18.7G, we saw significant improvements compared to the mainline generational Shenandoah implementation:

1. Concurrent GC passes decreased from 606 to 493 (19% improvement)
2. Degenerated GC passes decreased from 17 to 3 (82% improvement)
3. Full GCs decreased from 15 to 3 (80% improvement)
4. P50 latency for Customer Preparation Processing (CPP) improved from 1798 us to 1735 us (3.5%)
5. P100 latency for CPP improved from 25_636_285 us to 9_148_580 us (64% improvement)

Across a broad assortment of performance-related CI tests, we also see benefits on x86:

  -74.24% extremem-phased/do_nothing_p99 p=0.00061
      Control:      2.318s  (+/-941.25ms)        80
      Test:         1.330s  (+/-  1.01s )        15

  -15.70% extremem-phased/context_switch_count p=0.02032
      Control:  28188.234   (+/-5868.23  )       80
      Test:     24362.538   (+/-4260.19  )       15

   -6.26% extremem-phased/do_nothing_p50 p=0.00246
      Control:    603.203us (+/- 38.32us)        80
      Test:       567.692us (+/- 50.34us)        15

And on aarch64:

  +22.92% specjbb2015/sla_10000_jops p=0.01104
      Control:   2607.153   (+/-799.74  )        90
      Test:      3204.615   (+/-592.15  )        15

   -5.85% extremem-phased/do_nothing_p50 p=0.00675
      Control:    608.153us (+/- 44.52us)        90
      Test:       574.538us (+/- 47.49us)        15

-------------

Commit messages:
 - Fix white space
 - Remove instrumentation
 - Fix my fix limiting find-next-marked-object
 - Fix request to find next marked
 - Enhance log messages for generations at end of gc
 - Allow the search for old-gen PLAB to see regions not collector-free

Changes: https://git.openjdk.org/shenandoah/pull/198/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=198&range=00
  Stats: 66 lines in 6 files changed: 54 ins; 4 del; 8 mod
  Patch: https://git.openjdk.org/shenandoah/pull/198.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/198/head:pull/198

PR: https://git.openjdk.org/shenandoah/pull/198

From wkemper at openjdk.org Wed Jan 11 16:58:53 2023
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 11 Jan 2023 16:58:53 GMT
Subject: RFR: Broaden plab region search
In-Reply-To:
References:
Message-ID:

On Wed, 11 Jan 2023 16:33:09 GMT, Kelvin Nilsen wrote:

> Recent testing revealed that old-gen heap regions were being ignored in the search to satisfy new PLAB allocation requests. It was discovered that many of these regions are found within ranges of the ShenandoahFreeSet that are not considered to be "is_collector_free()".
>
> This is a first step in a two-step change.
In this patch, we broaden the search for old-gen heap regions that have available memory to include regions that are not is_collector_free. In a second step of this improvement, we intend to restructure the implementation of the ShenandoahFreeSet to better distinguish ranges of regions that hold young-gen survivors, which ranges hold old-gen, and which ranges are intended to serve as mutator allocations. > > On an Extremem workload that allocates roughly 626 M/s in a total heap size of 49G with an old-gen usage of 18.7G, we saw significant improvements compared to mainline generational Shenandoah implementation: > > 1. Concurrent GC passes decreased from 606 to 493 (19% improvement) > 2. Degenerated GC passes decreased from 17 to 3 (82% improvement) > 3. Full GCs decreased from 15 to 3 (80% improvement) > 4. P50 latency for Customer Preparation Processing (CPP) improved from 1798 us to 1735 us (3.5%) > 5. P100 latency for CPP improved from 25_636_285 us to 9_148_580 us (64% improvement) > > Across a broad assortment of performance related CI tests, we also benefits on x86: > > -74.24% extremem-phased/do_nothing_p99 p=0.00061 > Control: 2.318s (+/-941.25ms) 80 > Test: 1.330s (+/- 1.01s ) 15 > > -15.70% extremem-phased/context_switch_count p=0.02032 > Control: 28188.234 (+/-5868.23 ) 80 > Test: 24362.538 (+/-4260.19 ) 15 > > -6.26% extremem-phased/do_nothing_p50 p=0.00246 > Control: 603.203us (+/- 38.32us) 80 > Test: 567.692us (+/- 50.34us) 15 > > And on aarch64: > > +22.92% specjbb2015/sla_10000_jops p=0.01104 > Control: 2607.153 (+/-799.74 ) 90 > Test: 3204.615 (+/-592.15 ) 15 > > -5.85% extremem-phased/do_nothing_p50 p=0.00675 > Control: 608.153us (+/- 44.52us) 90 > Test: 574.538us (+/- 47.49us) 15 The workaround makes sense. Consider consolidating some of the log messages by reusing `ShenandoahGeneration::log_status`. 
src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 245: > 243: heap->reset_old_evac_expended(); > 244: heap->set_promoted_reserve(0); > 245: log_info(gc, ergo)("At end of Concurrent GC, old_available: " SIZE_FORMAT "%s out of total: " SIZE_FORMAT "%s," This looks a lot like the implementation of `ShenandoahGeneration::log_status` . Could consolidate these messages and reduce logging duplicate information. src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 58: > 56: vmop_degenerated(); > 57: ShenandoahHeap* heap = ShenandoahHeap::heap(); > 58: if (heap->mode()->is_generational()) { As above, consider using `ShenandoahGeneration::log_status`. src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 179: > 177: size_t old_available = heap->old_generation()->available(); > 178: size_t young_available = heap->young_generation()->available(); > 179: log_info(gc, ergo)("At end of Full GC, old_available: " SIZE_FORMAT "%s out of total: " SIZE_FORMAT "%s," Consider `ShenandoahGeneration::log_status`. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 3194: > 3192: } else { > 3193: // This object is not live so we don't verify dirty cards contained therein > 3194: assert(tams != nullptr, "If object is not live, ctx and tams should be non-null"); Might need to rebase these changes after integrating #197 . ------------- Changes requested by wkemper (Committer). PR: https://git.openjdk.org/shenandoah/pull/198 From vlivanov at openjdk.org Wed Jan 11 19:09:36 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 11 Jan 2023 19:09:36 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18] In-Reply-To: References:

Message-ID: <_vJ-5zerpDnHng8O_QZ5LEfVb09knfCRIrWfHRB1eTQ=.f01389ce-82e9-4073-86e3-08b70219cf0b@github.com>

On Mon, 9 Jan 2023 16:49:25 GMT, Claes Redestad wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>>
>> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>>
>> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>>
>> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>>
>> Baseline:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>>
>> I.e. no measurable overhead compared to baseline even for `size == 1`.
>>
>> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>>
>> Benchmark for `Arrays.hashCode`:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
>> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
>> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
>> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
>> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
>> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
>> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
>> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
>> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
>> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
>> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
>> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>>
>> Baseline:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
>> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
>> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
>> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
>> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
>> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
>> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
>> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
>> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
>> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
>> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
>> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>>
>>
>> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Explicitly lea external address

Before the patch goes in, I'd like to see a plan for how the code will be refactored later. At the very least, I expect the `is_string_hashcode`-related logic to go away and the intrinsic logic to be guided solely by the basic type of the elements. If not in the initial version, then shortly after as a follow-up enhancement.

Another thing I want to see is for the `VectorizedHashCode` node to go away and be replaced with a stub call.

-------------

Changes requested by vlivanov (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org  Wed Jan 11 21:26:21 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Wed, 11 Jan 2023 21:26:21 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18]
In-Reply-To:
References:

Message-ID:

On Wed, 11 Jan 2023 16:43:31 GMT, William Kemper wrote:

>> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Unify GC heap status logging
>
> src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 179:
>
>> 177:     size_t old_available = heap->old_generation()->available();
>> 178:     size_t young_available = heap->young_generation()->available();
>> 179:     log_info(gc, ergo)("At end of Full GC, old_available: " SIZE_FORMAT "%s out of total: " SIZE_FORMAT "%s,"
>
> Consider `ShenandoahGeneration::log_status`.

I've integrated the two heap-status logging approaches. Let me know what you think... Thanks for the suggestion.

> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 3194:
>
>> 3192:     } else {
>> 3193:       // This object is not live so we don't verify dirty cards contained therein
>> 3194:       assert(tams != nullptr, "If object is not live, ctx and tams should be non-null");
>
> Might need to rebase these changes after integrating #197.

So far, git seems to feel like there are "no conflicts". This code is identical to what I delivered in PR197.

-------------

PR: https://git.openjdk.org/shenandoah/pull/198

From wkemper at openjdk.org  Thu Jan 12 21:44:03 2023
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 12 Jan 2023 21:44:03 GMT
Subject: RFR: Broaden plab region search [v2]
In-Reply-To:
References:

Message-ID: On Thu, 12 Jan 2023 21:39:36 GMT, Kelvin Nilsen wrote: >> Recent testing revealed that old-gen heap regions were being ignored in the search to satisfy new PLAB allocation requests. It was discovered that many of these regions are found within ranges of the ShenandoahFreeSet that are not considered to be "is_collector_free()". >> >> This is a first step in a two-step change. In this patch, we broaden the search for old-gen heap regions that have available memory to include regions that are not is_collector_free. In a second step of this improvement, we intend to restructure the implementation of the ShenandoahFreeSet to better distinguish ranges of regions that hold young-gen survivors, which ranges hold old-gen, and which ranges are intended to serve as mutator allocations. >> >> On an Extremem workload that allocates roughly 626 M/s in a total heap size of 49G with an old-gen usage of 18.7G, we saw significant improvements compared to mainline generational Shenandoah implementation: >> >> 1. Concurrent GC passes decreased from 606 to 493 (19% improvement) >> 2. Degenerated GC passes decreased from 17 to 3 (82% improvement) >> 3. Full GCs decreased from 15 to 3 (80% improvement) >> 4. P50 latency for Customer Preparation Processing (CPP) improved from 1798 us to 1735 us (3.5%) >> 5. 
P100 latency for CPP improved from 25_636_285 us to 9_148_580 us (64% improvement) >> >> Across a broad assortment of performance related CI tests, we also benefits on x86: >> >> -74.24% extremem-phased/do_nothing_p99 p=0.00061 >> Control: 2.318s (+/-941.25ms) 80 >> Test: 1.330s (+/- 1.01s ) 15 >> >> -15.70% extremem-phased/context_switch_count p=0.02032 >> Control: 28188.234 (+/-5868.23 ) 80 >> Test: 24362.538 (+/-4260.19 ) 15 >> >> -6.26% extremem-phased/do_nothing_p50 p=0.00246 >> Control: 603.203us (+/- 38.32us) 80 >> Test: 567.692us (+/- 50.34us) 15 >> >> And on aarch64: >> >> +22.92% specjbb2015/sla_10000_jops p=0.01104 >> Control: 2607.153 (+/-799.74 ) 90 >> Test: 3204.615 (+/-592.15 ) 15 >> >> -5.85% extremem-phased/do_nothing_p50 p=0.00675 >> Control: 608.153us (+/- 44.52us) 90 >> Test: 574.538us (+/- 47.49us) 15 > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Unify GC heap status logging Thank you for making those changes to the logging. ------------- Marked as reviewed by wkemper (Committer). PR: https://git.openjdk.org/shenandoah/pull/198 From kdnilsen at openjdk.org Thu Jan 12 21:49:17 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 12 Jan 2023 21:49:17 GMT Subject: Integrated: Broaden plab region search In-Reply-To: References: Message-ID: <9ay8LOiRxOdMmjsqrkpQmlmSjf0xTDJIWCJyHNBTzks=.4893404d-9d88-488f-ab26-e3758006716b@github.com> On Wed, 11 Jan 2023 16:33:09 GMT, Kelvin Nilsen wrote: > Recent testing revealed that old-gen heap regions were being ignored in the search to satisfy new PLAB allocation requests. It was discovered that many of these regions are found within ranges of the ShenandoahFreeSet that are not considered to be "is_collector_free()". > > This is a first step in a two-step change. In this patch, we broaden the search for old-gen heap regions that have available memory to include regions that are not is_collector_free. 
In a second step of this improvement, we intend to restructure the implementation of the ShenandoahFreeSet to better distinguish ranges of regions that hold young-gen survivors, which ranges hold old-gen, and which ranges are intended to serve as mutator allocations. > > On an Extremem workload that allocates roughly 626 M/s in a total heap size of 49G with an old-gen usage of 18.7G, we saw significant improvements compared to mainline generational Shenandoah implementation: > > 1. Concurrent GC passes decreased from 606 to 493 (19% improvement) > 2. Degenerated GC passes decreased from 17 to 3 (82% improvement) > 3. Full GCs decreased from 15 to 3 (80% improvement) > 4. P50 latency for Customer Preparation Processing (CPP) improved from 1798 us to 1735 us (3.5%) > 5. P100 latency for CPP improved from 25_636_285 us to 9_148_580 us (64% improvement) > > Across a broad assortment of performance related CI tests, we also benefits on x86: > > -74.24% extremem-phased/do_nothing_p99 p=0.00061 > Control: 2.318s (+/-941.25ms) 80 > Test: 1.330s (+/- 1.01s ) 15 > > -15.70% extremem-phased/context_switch_count p=0.02032 > Control: 28188.234 (+/-5868.23 ) 80 > Test: 24362.538 (+/-4260.19 ) 15 > > -6.26% extremem-phased/do_nothing_p50 p=0.00246 > Control: 603.203us (+/- 38.32us) 80 > Test: 567.692us (+/- 50.34us) 15 > > And on aarch64: > > +22.92% specjbb2015/sla_10000_jops p=0.01104 > Control: 2607.153 (+/-799.74 ) 90 > Test: 3204.615 (+/-592.15 ) 15 > > -5.85% extremem-phased/do_nothing_p50 p=0.00675 > Control: 608.153us (+/- 44.52us) 90 > Test: 574.538us (+/- 47.49us) 15 This pull request has now been integrated. 
Changeset: 0be422bf
Author:    Kelvin Nilsen
URL:       https://git.openjdk.org/shenandoah/commit/0be422bf31db81b640ab0911a327a65e5c56381a
Stats:     97 lines in 11 files changed: 63 ins; 23 del; 11 mod

Broaden plab region search

Reviewed-by: wkemper

-------------

PR: https://git.openjdk.org/shenandoah/pull/198

From wkemper at openjdk.org  Fri Jan 13 04:12:45 2023
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 13 Jan 2023 04:12:45 GMT
Subject: RFR: Use whole number of regions when resizing generations
Message-ID:

This avoids overflowing calculations when using a 32-bit word; it also simplifies some of the operations. All of the GitHub Actions are succeeding now.

-------------

Commit messages:
 - Merge branch 'openjdk:master' into use-regions-for-sizing
 - Use region count rather than bytes count to avoid overflow with 32 bit words

Changes: https://git.openjdk.org/shenandoah/pull/199/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=199&range=00
  Stats: 76 lines in 2 files changed: 15 ins; 5 del; 56 mod
  Patch: https://git.openjdk.org/shenandoah/pull/199.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/199/head:pull/199

PR: https://git.openjdk.org/shenandoah/pull/199

From eosterlund at openjdk.org  Fri Jan 13 12:52:06 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Fri, 13 Jan 2023 12:52:06 GMT
Subject: RFR: 8299879: CollectedHeap hierarchy should use override
Message-ID:

The CollectedHeap class declares a bunch of virtual methods that are overridden in its subclasses. Now that we can, we should convert this code to use "override" consistently, to help make the code more robust.
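
As a rough illustration of the robustness argument above, here is a minimal sketch. The names `HeapBase`, `ToyHeap`, `capacity`, and `name` are hypothetical and are not taken from the CollectedHeap sources. The point is only that `override` asks the compiler to check that a matching virtual function exists in a base class, so a signature mismatch becomes a compile error instead of silently introducing a new, unrelated virtual function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a heap base class; NOT the real CollectedHeap.
class HeapBase {
public:
  virtual ~HeapBase() = default;
  virtual std::size_t capacity() const { return 0; }
  virtual std::string name() const { return "base"; }
};

// Hypothetical subclass used only for this illustration.
class ToyHeap : public HeapBase {
public:
  // 'override' makes the compiler verify that a base-class virtual
  // function with exactly this signature exists.
  std::size_t capacity() const override { return 1024; }
  std::string name() const override { return "toy"; }

  // The following would be rejected at compile time, because dropping
  // 'const' means it no longer matches any base-class virtual function:
  //   std::size_t capacity() override { return 1024; }
};

// Calls go through the base interface, so the subclass versions are only
// reached if they really do override the base declarations.
inline std::size_t capacity_via_base(const HeapBase& heap) { return heap.capacity(); }
inline std::string name_via_base(const HeapBase& heap) { return heap.name(); }
```

Without `override`, the commented-out variant would compile and `capacity_via_base` would quietly keep returning the base-class value.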
-------------

Commit messages:
 - 8299879: CollectedHeap hierarchy should use override

Changes: https://git.openjdk.org/jdk/pull/11937/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11937&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8299879
  Stats: 235 lines in 6 files changed: 2 ins; 5 del; 228 mod
  Patch: https://git.openjdk.org/jdk/pull/11937.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11937/head:pull/11937

PR: https://git.openjdk.org/jdk/pull/11937

From stefank at openjdk.org  Fri Jan 13 12:52:07 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 13 Jan 2023 12:52:07 GMT
Subject: RFR: 8299879: CollectedHeap hierarchy should use override
In-Reply-To:
References:
Message-ID: <27-qTaPTyaq4-REu5ZIwgRQBD6PzYXuyNBUciN2ytyE=.c1816ddf-a0f6-453d-af77-0e2ccce1230d@github.com>

On Wed, 11 Jan 2023 09:05:44 GMT, Erik Österlund wrote:

> The CollectedHeap class declares a bunch of virtual methods that are overridden in its subclasses. Now that we can, we should convert this code to use "override" consistently, to help make the code more robust.

Marked as reviewed by stefank (Reviewer).

-------------

PR: https://git.openjdk.org/jdk/pull/11937

From tschatzl at openjdk.org  Fri Jan 13 12:52:08 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Fri, 13 Jan 2023 12:52:08 GMT
Subject: RFR: 8299879: CollectedHeap hierarchy should use override
In-Reply-To:
References:
Message-ID:

On Wed, 11 Jan 2023 09:05:44 GMT, Erik Österlund wrote:

> The CollectedHeap class declares a bunch of virtual methods that are overridden in its subclasses. Now that we can, we should convert this code to use "override" consistently, to help make the code more robust.

Marked as reviewed by tschatzl (Reviewer).
-------------

PR: https://git.openjdk.org/jdk/pull/11937

From eosterlund at openjdk.org  Fri Jan 13 12:52:09 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Fri, 13 Jan 2023 12:52:09 GMT
Subject: RFR: 8299879: CollectedHeap hierarchy should use override
In-Reply-To: <27-qTaPTyaq4-REu5ZIwgRQBD6PzYXuyNBUciN2ytyE=.c1816ddf-a0f6-453d-af77-0e2ccce1230d@github.com>
References: <27-qTaPTyaq4-REu5ZIwgRQBD6PzYXuyNBUciN2ytyE=.c1816ddf-a0f6-453d-af77-0e2ccce1230d@github.com>
Message-ID:

On Thu, 12 Jan 2023 10:58:52 GMT, Stefan Karlsson wrote:

>> The CollectedHeap class declares a bunch of virtual methods that are overridden in its subclasses. Now that we can, we should convert this code to use "override" consistently, to help make the code more robust.
>
> Marked as reviewed by stefank (Reviewer).

Thank you for the reviews, @stefank and @tschatzl!

-------------

PR: https://git.openjdk.org/jdk/pull/11937

From eosterlund at openjdk.org  Fri Jan 13 16:22:17 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Fri, 13 Jan 2023 16:22:17 GMT
Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication
In-Reply-To:
References:

Message-ID: On Wed, 11 Jan 2023 04:43:54 GMT, Kim Barrett wrote: >> When raw char* String contents are exposed to JNI code, we >> >> 1. Load the string.value and pin it >> 2. Run native code >> 3. Load the string.value and unpin it >> >> Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced. >> >> The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that. Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns. >> >> It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently. > > Looks good. Thank you for the reviews, @kimbarrett and @stefank! ------------- PR: https://git.openjdk.org/jdk/pull/11923 From kdnilsen at openjdk.org Fri Jan 13 17:35:50 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 13 Jan 2023 17:35:50 GMT Subject: RFR: Fix fullgc assertion Message-ID: Relax generation sizing assertions during Full GC because of the sequencing of operations that occur during Full GC. Enforce the assertion constraints at the end of Full GC. 
------------- Commit messages: - Fix white space - Merge remote-tracking branch 'GitFarmBranch/fix-fullgc-assertion-error' into fix-fullgc-assertion - Fix assertion failure during Full GC Changes: https://git.openjdk.org/shenandoah/pull/200/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=200&range=00 Stats: 15 lines in 2 files changed: 13 ins; 0 del; 2 mod Patch: https://git.openjdk.org/shenandoah/pull/200.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/200/head:pull/200 PR: https://git.openjdk.org/shenandoah/pull/200 From ysr at openjdk.org Fri Jan 13 17:35:50 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 13 Jan 2023 17:35:50 GMT Subject: RFR: Fix fullgc assertion In-Reply-To: References: Message-ID: On Fri, 13 Jan 2023 16:28:03 GMT, Kelvin Nilsen wrote: > Relax generation sizing assertions during Full GC because of the sequencing of operations that occur during Full GC. > > Enforce the assertion constraints at the end of Full GC. Marked as reviewed by ysr (Author). ------------- PR: https://git.openjdk.org/shenandoah/pull/200 From kdnilsen at openjdk.org Fri Jan 13 17:54:20 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 13 Jan 2023 17:54:20 GMT Subject: RFR: Use whole number of regions when resizing generations In-Reply-To: References: Message-ID: On Fri, 13 Jan 2023 00:45:38 GMT, William Kemper wrote: > This avoids overflowing calculations when using a 32 bit word - it also simplifies some of the operations. All of the github actions are succeeding now. Thanks. ------------- Marked as reviewed by kdnilsen (Committer). 
PR: https://git.openjdk.org/shenandoah/pull/199 From wkemper at openjdk.org Fri Jan 13 17:55:16 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 13 Jan 2023 17:55:16 GMT Subject: RFR: Fix fullgc assertion In-Reply-To: References: Message-ID: On Fri, 13 Jan 2023 16:28:03 GMT, Kelvin Nilsen wrote: > Relax generation sizing assertions during Full GC because of the sequencing of operations that occur during Full GC. > > Enforce the assertion constraints at the end of Full GC. Marked as reviewed by wkemper (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/200 From kdnilsen at openjdk.org Fri Jan 13 17:59:24 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 13 Jan 2023 17:59:24 GMT Subject: Integrated: Fix fullgc assertion In-Reply-To: References: Message-ID: On Fri, 13 Jan 2023 16:28:03 GMT, Kelvin Nilsen wrote: > Relax generation sizing assertions during Full GC because of the sequencing of operations that occur during Full GC. > > Enforce the assertion constraints at the end of Full GC. This pull request has now been integrated. Changeset: 0e15cb6d Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/0e15cb6dfcd97a816f4213cd38ffdd5f402536b9 Stats: 15 lines in 2 files changed: 13 ins; 0 del; 2 mod Fix fullgc assertion Reviewed-by: ysr, wkemper ------------- PR: https://git.openjdk.org/shenandoah/pull/200 From wkemper at openjdk.org Fri Jan 13 18:29:22 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 13 Jan 2023 18:29:22 GMT Subject: Integrated: Use whole number of regions when resizing generations In-Reply-To: References: Message-ID: On Fri, 13 Jan 2023 00:45:38 GMT, William Kemper wrote: > This avoids overflowing calculations when using a 32 bit word - it also simplifies some of the operations. All of the github actions are succeeding now. This pull request has now been integrated. 
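
The overflow concern in the quoted description can be illustrated with a small sketch. This is not the arithmetic from the pull request, which is not shown in this thread: the 1 MiB region size, the percentage-based sizing, and the function names are all assumptions made for the example. It only shows why an intermediate product computed on byte counts can wrap in a 32-bit word, while the same computation on region counts stays small.

```cpp
#include <cstdint>

// Assumed region size for illustration only (1 MiB).
constexpr std::uint32_t kRegionSizeBytes = 1u << 20;

// Byte-based sizing (hypothetical): the intermediate product
// capacity_bytes * percent is computed in 32 bits and wraps modulo 2^32
// once it exceeds 4 GiB, so the result can be badly wrong.
inline std::uint32_t share_in_bytes_wrapping(std::uint32_t capacity_bytes,
                                             std::uint32_t percent) {
  return capacity_bytes * percent / 100u;
}

// Region-based sizing (hypothetical): region counts are small numbers,
// so regions * percent fits comfortably in 32 bits. The result is a whole
// number of regions, rounded down.
inline std::uint32_t share_in_regions(std::uint32_t capacity_bytes,
                                      std::uint32_t percent) {
  const std::uint32_t regions = capacity_bytes / kRegionSizeBytes;
  return regions * percent / 100u;
}
```

For a 1 GiB capacity and 10 percent, the byte-based version wraps (2^30 * 10 exceeds 2^32) and yields about 21 MB instead of roughly 107 MB, while the region-based version yields 102 regions.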
Changeset: ec3e5ef1 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/ec3e5ef150e1ff7b1e35a450653d8bf0bb1ee6c9 Stats: 76 lines in 2 files changed: 15 ins; 5 del; 56 mod Use whole number of regions when resizing generations Reviewed-by: kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/199 From wkemper at openjdk.org Fri Jan 13 22:47:24 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 13 Jan 2023 22:47:24 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: Merge tag jdk-21+5 ------------- Commit messages: - Merge tag 'jdk-21+5' into merge-jdk21-5 - Merge - 8299862: OfAddress setter should disallow heap segments - 8299849: Revert JDK-8294461: wrong effectively final determination by javac - 8299227: host `exif.org` not found in link in doc comment - 8299715: IR test: VectorGatherScatterTest.java fails with SVE randomly - 8294744: AArch64: applications/kitchensink/Kitchensink.java crashed: assert(oopDesc::is_oop(obj)) failed: not an oop - 8299733: AArch64: "unexpected literal addressing mode" assertion failure with -XX:+PrintC1Statistics - 8299693: Change to Xcode12.4+1.1 devkit for building on macOS at Oracle - 8300001: ProblemList test java/security/Policy/Root/Root.java - ... 
and 97 more: https://git.openjdk.org/shenandoah/compare/ec3e5ef1...06c44b37 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=shenandoah&pr=201&range=00.0 - openjdk/jdk:master: https://webrevs.openjdk.org/?repo=shenandoah&pr=201&range=00.1 Changes: https://git.openjdk.org/shenandoah/pull/201/files Stats: 7201 lines in 404 files changed: 4284 ins; 1621 del; 1296 mod Patch: https://git.openjdk.org/shenandoah/pull/201.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/201/head:pull/201 PR: https://git.openjdk.org/shenandoah/pull/201 From wkemper at openjdk.org Sat Jan 14 00:18:48 2023 From: wkemper at openjdk.org (William Kemper) Date: Sat, 14 Jan 2023 00:18:48 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 13 Jan 2023 22:39:56 GMT, William Kemper wrote: > Merge tag jdk-21+5 This pull request has now been integrated. Changeset: bfeccbdf Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/bfeccbdfcc57c9c98925eebec0d5ed965974cd93 Stats: 7201 lines in 404 files changed: 4284 ins; 1621 del; 1296 mod Merge openjdk/jdk:master ------------- PR: https://git.openjdk.org/shenandoah/pull/201 From redestad at openjdk.org Sun Jan 15 23:24:18 2023 From: redestad at openjdk.org (Claes Redestad) Date: Sun, 15 Jan 2023 23:24:18 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18] In-Reply-To: References:

Message-ID:

On Mon, 9 Jan 2023 16:49:25 GMT, Claes Redestad wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>>
>> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>>
>> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>>
>> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>>
>> Baseline:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>>
>> I.e. no measurable overhead compared to baseline even for `size == 1`.
>>
>> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>>
>> Benchmark for `Arrays.hashCode`:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
>> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
>> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
>> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
>> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
>> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
>> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
>> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
>> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
>> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
>> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
>> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>>
>> Baseline:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
>> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
>> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
>> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
>> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
>> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
>> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
>> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
>> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
>> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
>> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
>> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>>
>>
>> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Explicitly lea external address

FWIW I prototyped a follow-up to use basic types and extracted the String special-casing from the code. To do so a few things unraveled, such as needing to pass the initial value, but arguably it all ended up a bit neater. I've put this experiment in another branch for now (https://github.com/openjdk/jdk/compare/pr/10847...cl4es:jdk:8282664-type-cleanup?expand=1) since I need to test it thoroughly, both functionally and to ensure there's no obvious performance impact (did some quick sanity testing on micros that look perfectly neutral).

@iwanowww does this make you a bit happier? I think of it as an immediate follow-up, but if there's a strong preference I can merge it into this PR.
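
For readers following along, the polynomial hash under discussion (factor 31, caller-supplied initial value) can be sketched in plain Java. This is only an illustration of the arithmetic, not the intrinsic or the JDK's `ArraysSupport` code: the class and method names `PolyHashSketch`, `hashSimple` and `hashUnrolled` are made up for this example, and the 4-way unrolling is just one way to show why precomputed powers of 31 give the same result as the plain loop.

```java
final class PolyHashSketch {
    // Straightforward loop: h = 31 * h + element, starting from initialValue.
    static int hashSimple(int[] a, int initialValue) {
        int h = initialValue;
        for (int e : a) {
            h = 31 * h + e;
        }
        return h;
    }

    // Same result, four elements per iteration, using precomputed powers of 31.
    // int overflow wraps, so both forms agree modulo 2^32.
    static int hashUnrolled(int[] a, int initialValue) {
        final int p2 = 31 * 31; // 961
        final int p3 = p2 * 31; // 29791
        final int p4 = p3 * 31; // 923521
        int h = initialValue;
        int i = 0;
        for (; i + 3 < a.length; i += 4) {
            h = p4 * h + p3 * a[i] + p2 * a[i + 1] + 31 * a[i + 2] + a[i + 3];
        }
        for (; i < a.length; i++) { // remaining 0 to 3 elements
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

With an initial value of 1 this agrees with `Arrays.hashCode(int[])`, and with an initial value of 0 over a string's `chars()` it agrees with `String.hashCode()`, which is the "one for `Arrays`, zero for `String`" distinction mentioned in the quoted description.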
------------- PR: https://git.openjdk.org/jdk/pull/10847 From rkennke at openjdk.org Mon Jan 16 09:26:10 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 16 Jan 2023 09:26:10 GMT Subject: RFR: 8300053: Shenandoah: Handle more GCCauses in ShenandoahControlThread::request_gc In-Reply-To: References: Message-ID: <62bujnK6t3dStlI1cJkfcnvkddm91PSBsf5rw36i6ME=.19ea2979-f448-46d9-8a20-9f05264c69da@github.com> On Thu, 12 Jan 2023 16:15:08 GMT, Aleksey Shipilev wrote: > $ CONF=linux-x86_64-server-fastdebug make test TEST=jdk/internal/vm/Continuation/BasicExt.java TEST_VM_OPTS="-XX:+UseShenandoahGC" > > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp:479), pid=406430, tid=406562 > # assert(GCCause::is_user_requested_gc(cause) || GCCause::is_serviceability_requested_gc(cause) || cause == GCCause::_metadata_GC_clear_soft_refs || cause == GCCause::_codecache_GC_aggressive || cause == GCCause::_codecache_GC_threshold || cause == GCCause::_full_gc_alot || cause == GCCause::_wb_full_gc || cause == GCCause::_wb_breakpoint || cause == GCCause::_scavenge_alot) failed: only requested GCs here: WhiteBox Initiated Young GC > # > ``` > > Added a missing cause into the assert. The test starts to pass. Looks good to me, thank you! ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11970 From shade at openjdk.org Mon Jan 16 09:32:15 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Jan 2023 09:32:15 GMT Subject: RFR: 8300053: Shenandoah: Handle more GCCauses in ShenandoahControlThread::request_gc In-Reply-To: References: Message-ID: <3DZXGR1saHnahcRv3W44iKXj4fU7Pw0pplbxopG2-vQ=.ddf46f05-b778-4d1e-a728-aebd36a1f809@github.com> On Thu, 12 Jan 2023 16:15:08 GMT, Aleksey Shipilev wrote: > $ CONF=linux-x86_64-server-fastdebug make test TEST=jdk/internal/vm/Continuation/BasicExt.java TEST_VM_OPTS="-XX:+UseShenandoahGC" > > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp:479), pid=406430, tid=406562 > # assert(GCCause::is_user_requested_gc(cause) || GCCause::is_serviceability_requested_gc(cause) || cause == GCCause::_metadata_GC_clear_soft_refs || cause == GCCause::_codecache_GC_aggressive || cause == GCCause::_codecache_GC_threshold || cause == GCCause::_full_gc_alot || cause == GCCause::_wb_full_gc || cause == GCCause::_wb_breakpoint || cause == GCCause::_scavenge_alot) failed: only requested GCs here: WhiteBox Initiated Young GC > # > ``` > > Added a missing cause into the assert. The test starts to pass. Thank you! 
------------- PR: https://git.openjdk.org/jdk/pull/11970 From shade at openjdk.org Mon Jan 16 09:35:20 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Jan 2023 09:35:20 GMT Subject: Integrated: 8300053: Shenandoah: Handle more GCCauses in ShenandoahControlThread::request_gc In-Reply-To: References: Message-ID: On Thu, 12 Jan 2023 16:15:08 GMT, Aleksey Shipilev wrote: > $ CONF=linux-x86_64-server-fastdebug make test TEST=jdk/internal/vm/Continuation/BasicExt.java TEST_VM_OPTS="-XX:+UseShenandoahGC" > > # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp:479), pid=406430, tid=406562 > # assert(GCCause::is_user_requested_gc(cause) || GCCause::is_serviceability_requested_gc(cause) || cause == GCCause::_metadata_GC_clear_soft_refs || cause == GCCause::_codecache_GC_aggressive || cause == GCCause::_codecache_GC_threshold || cause == GCCause::_full_gc_alot || cause == GCCause::_wb_full_gc || cause == GCCause::_wb_breakpoint || cause == GCCause::_scavenge_alot) failed: only requested GCs here: WhiteBox Initiated Young GC > # > ``` > > Added a missing cause into the assert. The test starts to pass. This pull request has now been integrated. 
Changeset: cac72a60
Author:    Aleksey Shipilev
URL:       https://git.openjdk.org/jdk/commit/cac72a60181d3570562f8534c691528d06e40cb8
Stats:     2 lines in 1 file changed: 1 ins; 0 del; 1 mod

8300053: Shenandoah: Handle more GCCauses in ShenandoahControlThread::request_gc

Reviewed-by: wkemper, rkennke

-------------

PR: https://git.openjdk.org/jdk/pull/11970

From eosterlund at openjdk.org Mon Jan 16 10:57:16 2023
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Mon, 16 Jan 2023 10:57:16 GMT
Subject: Integrated: 8299879: CollectedHeap hierarchy should use override
In-Reply-To:
References:
Message-ID: <8qoRlJqwylrfMVs8E8Z23r9nM603_yyBTNsGphtC8Gw=.5b8d9e85-953f-48bc-876c-42dd782dfe48@github.com>

On Wed, 11 Jan 2023 09:05:44 GMT, Erik Österlund wrote:

> The CollectedHeap class declares a bunch of virtual methods that are overridden in its subclasses. Now that we can, we should convert this code to use "override" consistently, to help make the code more robust.

This pull request has now been integrated.

Changeset: a7342853
Author:    Erik Österlund
URL:       https://git.openjdk.org/jdk/commit/a734285314a34ed61583132f2fc6be9d9c861af4
Stats:     235 lines in 6 files changed: 2 ins; 5 del; 228 mod

8299879: CollectedHeap hierarchy should use override

Reviewed-by: stefank, tschatzl

-------------

PR: https://git.openjdk.org/jdk/pull/11937

From eosterlund at openjdk.org Mon Jan 16 11:32:52 2023
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Mon, 16 Jan 2023 11:32:52 GMT
Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication [v2]
In-Reply-To:
References:
Message-ID:

> When raw char* String contents are exposed to JNI code, we
>
> 1. Load the string.value and pin it
> 2. Run native code
> 3. Load the string.value and unpin it
>
> Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced.
>
> The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that. Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns.
>
> It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently.

Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:

 - Include sorting order
 - Merge branch 'master' into 8299673_pin_dedup
 - More Kim feedback
 - Feedback from Kim
 - 8299673: Simplify object pinning interactions with string deduplication

-------------

Changes: https://git.openjdk.org/jdk/pull/11923/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11923&range=01
  Stats: 153 lines in 14 files changed: 65 ins; 68 del; 20 mod
  Patch: https://git.openjdk.org/jdk/pull/11923.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11923/head:pull/11923

PR: https://git.openjdk.org/jdk/pull/11923

From smonteith at openjdk.org Mon Jan 16 21:48:24 2023
From: smonteith at openjdk.org (Stuart Monteith)
Date: Mon, 16 Jan 2023 21:48:24 GMT
Subject: RFR: 8298647: GenShen require heap size 2MB granularity
Message-ID:

Generational Shenandoah requires 2MB granularity in order for card tables to cover the allocated heap. Each byte in a page of card table represents 512 heap bytes. As card tables are allocated 4KB at a time, 4KB * 512 = 2MB. There is a circular dependency between the region calculations and the heap size calculations.
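
The granularity arithmetic above can be checked with a small Java sketch. It is an illustration only and not the HotSpot change: the class name `CardTableGranularitySketch`, the constant names and the `alignUp` helper are assumptions made for this example. It shows that 4KB of card table at 512 heap bytes per card byte covers 2MB, and that a 495MB heap is not a multiple of that granularity.

```java
final class CardTableGranularitySketch {
    static final long CARD_TABLE_CHUNK_BYTES = 4L * 1024; // card table is allocated 4KB at a time
    static final long HEAP_BYTES_PER_CARD = 512;          // each card-table byte covers 512 heap bytes
    static final long GRANULARITY = CARD_TABLE_CHUNK_BYTES * HEAP_BYTES_PER_CARD; // 2MB

    // Rounds size up to the next multiple of alignment; alignment must be a power of two.
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & ~(alignment - 1);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long requested = 495 * mb;
        System.out.println("granularity bytes: " + GRANULARITY);
        System.out.println("requested MB: " + requested / mb
                + ", rounded MB: " + alignUp(requested, GRANULARITY) / mb);
    }
}
```

A heap that is already a multiple of 2MB is left unchanged by `alignUp`, so rounding unconditionally is harmless for such sizes.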
This unconditionally rounds the heap size up to a multiple of 2MB. It might be preferable to do this only when generational mode is enabled.

Running with:

  java -Xlog:gc*=trace -XX:+UseShenandoahGC -mx495m \
       -XX:ShenandoahGCMode=generational -version

on a debug build is sufficient to reproduce this problem.

-------------

Commit messages:
 - 8298647: GenShen require heap size 2MB granularity

Changes: https://git.openjdk.org/shenandoah/pull/202/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=202&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8298647
  Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/shenandoah/pull/202.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/202/head:pull/202

PR: https://git.openjdk.org/shenandoah/pull/202

From redestad at openjdk.org Mon Jan 16 23:19:49 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 16 Jan 2023 23:19:49 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v19]
In-Reply-To:
References:
Message-ID:

> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>
> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>
> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>
> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>
> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>
> Baseline:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>
> I.e. no measurable overhead compared to baseline even for `size == 1`.
>
> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>
> Benchmark for `Arrays.hashCode`:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>
> Baseline:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>
>
> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.

Claes Redestad has updated the pull request incrementally with three additional commits since the last revision:

 - Change signature to offset + length, add sanity test
 - Adapt end input to len (fix latent bug with sub-ranges
 - Clean-up types, simplify, hoist special-casing of String variants from arrays_hashcode, add initial value and range to intrinsic

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10847/files
  - new: https://git.openjdk.org/jdk/pull/10847/files/c8c58f4a..59e179c5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=18
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=17-18

  Stats: 210 lines in 13 files changed: 41 ins; 61 del; 108 mod
  Patch: https://git.openjdk.org/jdk/pull/10847.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Mon Jan 16 23:28:37 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 16 Jan 2023 23:28:37 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v20]
In-Reply-To:
References:
Message-ID:

> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>
> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>
> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>
> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>
> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>
> Baseline:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>
> I.e. no measurable overhead compared to baseline even for `size == 1`.
>
> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>
> Benchmark for `Arrays.hashCode`:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>
> Baseline:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>
>
> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  trailing ws

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10847/files
  - new: https://git.openjdk.org/jdk/pull/10847/files/59e179c5..ffe5b66d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=19
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=18-19

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10847.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Mon Jan 16 23:32:13 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Mon, 16 Jan 2023 23:32:13 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]
In-Reply-To:
References:

<6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com>

Message-ID:

On Mon, 14 Nov 2022 18:28:53 GMT, Vladimir Ivanov wrote:

>>> Also, I'd like to note that C2 auto-vectorization support is not too far away from being able to optimize hash code computations. At some point, I was able to achieve some promising results with modest tweaking of SuperWord pass:
>>> https://github.com/iwanowww/jdk/blob/superword/notes.txt
>>> http://cr.openjdk.java.net/~vlivanov/superword.reduction/webrev.00/
>>
>> Intriguing. How far off is this - and do you think it'll be able to match the efficiency we see here with a memoized coefficient table etc?
>>
>> If we turn this intrinsic into a stub we might also be able to reuse the optimization in other places, including from within the VM (calculating String hashCodes happens in a couple of places, including String deduplication). So I think there are still a few compelling reasons to go the manual route and continue on this path.
>
>> How far off is this ...?
>
> Back then it looked way too constrained (tight constraints on code shapes). But I considered it as a generally applicable optimization.
>
>> ... do you think it'll be able to match the efficiency we see here with a memoized coefficient table etc?
>
> Yes, it is able to build the constant table at runtime when folding multiplications of constant coefficients produced during loop unrolling and then packing scalars into a constant vector.
>
> Moreover, briefly looking at the code shape, the vectorizer would produce a more optimal loop shape (pre-loop would align vector accesses and would use 512-bit vectors when available; vector post-loop could help as well).

I've opted to include the changes spurred by @iwanowww's comments since they led to a number of revisions to the intrinsified method API, and it would be strange to introduce an intrinsified method just to change the API drastically in a follow-up.

Basically `ArraysSupport.vectorizedHashCode` has been changed to take an offset + length, an initial value and the logical basic type of the array elements. This means any necessary scaling of index and length needs to be taken care of before calling the intrinsic. This makes the implementation more flexible at no measurable performance cost. Overall the refactoring might have reduced complexity a bit.

Reviewers might observe that nothing is currently passing anything but `0` and `length` to `vectorizedHashCode` outside of the simple sanity test I've added, but I've verified this feature can be used to some effect elsewhere in the JDK, e.g.: https://github.com/openjdk/jdk/compare/pr/10847...cl4es:jdk:zipcoder-hashcode?expand=1 (which improves speed of opening `ZipFile` by a small percentage in microbenchmarks).

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From dholmes at openjdk.org Tue Jan 17 02:05:11 2023
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 17 Jan 2023 02:05:11 GMT
Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication [v2]
In-Reply-To:
References:

Message-ID:

On Mon, 16 Jan 2023 11:32:52 GMT, Erik Österlund wrote:

>> When raw char* String contents are exposed to JNI code, we
>>
>> 1. Load the string.value and pin it
>> 2. Run native code
>> 3. Load the string.value and unpin it
>>
>> Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced.
>>
>> The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that. Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns.
>>
>> It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently.
>
> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>
>  - Include sorting order
>  - Merge branch 'master' into 8299673_pin_dedup
>  - More Kim feedback
>  - Feedback from Kim
>  - 8299673: Simplify object pinning interactions with string deduplication

Initially I was a bit unsure about the conceptual model here, as I was thinking that pinning is a very general concept, where in fact it only relates to these JNI "critical" functions. So in that sense every GC must support pinning as required by those functions, so this simplification looks very neat.

Thanks.

-------------

Marked as reviewed by dholmes (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11923 From eosterlund at openjdk.org Tue Jan 17 07:58:12 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 17 Jan 2023 07:58:12 GMT Subject: RFR: 8299673: Simplify object pinning interactions with string deduplication [v2] In-Reply-To: References:

Message-ID:

On Tue, 17 Jan 2023 02:02:47 GMT, David Holmes wrote:

>> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>>
>>  - Include sorting order
>>  - Merge branch 'master' into 8299673_pin_dedup
>>  - More Kim feedback
>>  - Feedback from Kim
>>  - 8299673: Simplify object pinning interactions with string deduplication
>
> Initially I was a bit unsure about the conceptual model here, as I was thinking that pinning is a very general concept, where in fact it only relates to these JNI "critical" functions. So in that sense every GC must support pinning as required by those functions, so this simplification looks very neat. Thanks.

Thanks for the review, @dholmes-ora!

-------------

PR: https://git.openjdk.org/jdk/pull/11923

From eosterlund at openjdk.org Tue Jan 17 08:04:16 2023
From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Tue, 17 Jan 2023 08:04:16 GMT
Subject: Integrated: 8299673: Simplify object pinning interactions with string deduplication
In-Reply-To:
References:
Message-ID:

On Tue, 10 Jan 2023 10:04:48 GMT, Erik Österlund wrote:

> When raw char* String contents are exposed to JNI code, we
>
> 1. Load the string.value and pin it
> 2. Run native code
> 3. Load the string.value and unpin it
>
> Given this sequence we would be in trouble if between 1 and 3, string deduplication changed the value object. Then the pinning and unpinning wouldn't be balanced.
>
> The current approach for dealing with this is to have a bunch of code to guard against deduplication. An alternative simpler solution is to just change step 3 to pass in the same value. We already have enough information available to do that. Then the pinning and unpinning is also balanced, and we don't need to have any special interactions with string deduplication and can decouple these orthogonal concerns.
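
The balance argument in the quoted description can be modelled with a toy Java sketch. Everything here is hypothetical and only mirrors the three steps: `Pinner` stands in for a collector's pin bookkeeping, `Str` for a string whose backing array deduplication may swap, and none of these names correspond to HotSpot or JNI code.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class PinBalanceSketch {
    // Hypothetical stand-in for a collector's pin bookkeeping: one count per object identity.
    static final class Pinner {
        private final Map<Object, Integer> counts = new IdentityHashMap<>();

        void pin(Object o) {
            counts.merge(o, 1, Integer::sum);
        }

        void unpin(Object o) {
            Integer c = counts.get(o);
            if (c == null) {
                throw new IllegalStateException("unpin of an object that is not pinned");
            }
            if (c == 1) {
                counts.remove(o);
            } else {
                counts.put(o, c - 1);
            }
        }

        boolean balanced() {
            return counts.isEmpty();
        }
    }

    // Hypothetical stand-in for a String whose backing array deduplication may replace.
    static final class Str {
        byte[] value;

        Str(byte[] value) {
            this.value = value;
        }
    }

    // Steps 1 to 3, with step 3 reusing the array pinned in step 1 instead of reloading s.value.
    static boolean critical(Pinner gc, Str s, Runnable nativeCode) {
        byte[] pinned = s.value; // step 1: load once and pin
        gc.pin(pinned);
        nativeCode.run();        // step 2: deduplication may swap s.value meanwhile
        gc.unpin(pinned);        // step 3: unpin the very same object
        return gc.balanced();
    }
}
```

In this model, reloading `s.value` in step 3 after a swap would hit the `IllegalStateException` branch and leave the original array pinned, which is the imbalance the description refers to.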
>
> It's worth noting though that the contract of pin_object now makes it explicit that pinned objects must not be recycled, even if not otherwise reachable. That seems to come naturally for region based pinning, but is worth keeping in mind. The exposed char* might be the only thing referencing the string value when string dedup happens concurrently.

This pull request has now been integrated.

Changeset: 9a36f8aa
Author:    Erik Österlund
URL:       https://git.openjdk.org/jdk/commit/9a36f8aadb08f8ade578530c70d9abe38f1826c6
Stats:     153 lines in 14 files changed: 65 ins; 68 del; 20 mod

8299673: Simplify object pinning interactions with string deduplication

Co-authored-by: Stefan Karlsson
Co-authored-by: Axel Boldt-Christmas
Reviewed-by: kbarrett, stefank, dholmes

-------------

PR: https://git.openjdk.org/jdk/pull/11923

From vlivanov at openjdk.org Tue Jan 17 18:59:51 2023
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 17 Jan 2023 18:59:51 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v20]
In-Reply-To:
References:

Message-ID:

On Mon, 16 Jan 2023 23:28:37 GMT, Claes Redestad wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>>
>> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>>
>> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>>
>> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>>
>> Baseline:
>>
>> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
>> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
>> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
>> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
>> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>>
>> I.e. no measurable overhead compared to baseline even for `size == 1`.
>>
>> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>>
>> Benchmark for `Arrays.hashCode`:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
>> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
>> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
>> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
>> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
>> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
>> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
>> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
>> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
>> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
>> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
>> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>>
>> Baseline:
>>
>> Benchmark              (size)  Mode  Cnt     Score    Error  Units
>> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
>> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
>> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
>> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
>> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
>> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
>> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
>> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
>> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
>> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
>> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
>> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
>> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
>> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
>> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
>> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>>
>>
>> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   trailing ws

Thanks, Claes. Looks good. Please file an RFE for the follow-up work.

src/hotspot/share/opto/machnode.cpp line 211:

> 209:     opcnt++; // Bump operand count
> 210:     assert( opcnt < numopnds, "Accessing non-existent operand" );
> 211:

A leftover from a previous change?

src/java.base/share/classes/jdk/internal/util/ArraysSupport.java line 168:

> 166:     // See https://docs.oracle.com/javase/specs/jvms/se9/html/jvms-6.html#jvms-6.5.newarray.
> 167:
> 168:     public static final int T_BOOLEAN = 4;

As an idea for a follow-up enhancement, unless there are plans to implement runtime dispatching between different stubs, the basic type can be coded as a Class and on the compiler side the corresponding basic type extracted with `java_lang_Class::as_BasicType()`.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10847

From wkemper at openjdk.org Tue Jan 17 19:42:39 2023
From: wkemper at openjdk.org (William Kemper)
Date: Tue, 17 Jan 2023 19:42:39 GMT
Subject: RFR: Do not reset learning cycles after resizing
Message-ID:

Also, require more than 10 gc cycles before the trigger can resize generations.
The behavior on the extreme 'phased' workload looks much better:

[1334.381s][info][gc,stats ] 66 Successful Concurrent GCs
[1334.381s][info][gc,stats ] 0 invoked explicitly
[1334.381s][info][gc,stats ] 0 invoked implicitly
[1334.381s][info][gc,stats ]
[1334.381s][info][gc,stats ] 7 Completed Old GCs
[1334.381s][info][gc,stats ] 0 mixed
[1334.381s][info][gc,stats ] 0 interruptions
[1334.381s][info][gc,stats ]
[1334.381s][info][gc,stats ] 1 Degenerated GCs
[1334.381s][info][gc,stats ] 1 caused by allocation failure
[1334.381s][info][gc,stats ] 1 happened at Mark
[1334.381s][info][gc,stats ] 1 upgraded to Full GC
[1334.381s][info][gc,stats ]
[1334.381s][info][gc,stats ] 0 Abbreviated GCs
[1334.381s][info][gc,stats ]
[1334.381s][info][gc,stats ] 1 Full GCs
[1334.381s][info][gc,stats ] 0 invoked explicitly
[1334.381s][info][gc,stats ] 0 invoked implicitly
[1334.381s][info][gc,stats ] 0 caused by allocation failure
[1334.381s][info][gc,stats ] 1 upgraded from Degenerated GC

The full cycle here was the first cycle after the last of the initial learning cycles.
-------------

Commit messages:
 - Require more than 10 gc cycles before trigger can resize generations
 - Merge branch 'shenandoah-master' into generation-sizing-refinements
 - Do not reset learning cycles after resizing

Changes: https://git.openjdk.org/shenandoah/pull/203/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=203&range=00
  Stats: 22 lines in 3 files changed: 16 ins; 3 del; 3 mod
  Patch: https://git.openjdk.org/shenandoah/pull/203.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/203/head:pull/203

PR: https://git.openjdk.org/shenandoah/pull/203

From wkemper at openjdk.org Tue Jan 17 20:14:13 2023
From: wkemper at openjdk.org (William Kemper)
Date: Tue, 17 Jan 2023 20:14:13 GMT
Subject: RFR: 8298647: GenShen require heap size 2MB granularity
In-Reply-To:
References:
Message-ID:

On Mon, 16 Jan 2023 21:41:35 GMT, Stuart Monteith wrote:

> Generational Shenandoah requires 2MB granularity in order for card tables to cover the allocated heap. Each byte in a page of card table represents 512 heap bytes. As card tables are allocated 4KB at a time, 4KB * 512 = 2MB.
>
> There is a circular dependency between the region calculations and the heap size calculations. This unconditionally rounds up the heap size to 2MB. It might be preferable to do this only when generational mode is enabled.
>
> Running with:
> java -Xlog:gc*=trace -XX:+UseShenandoahGC -mx495m \
> -XX:ShenandoahGCMode=generational -version
>
> on a debug build is sufficient to reproduce this problem.

Changes requested by wkemper (Committer).

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 728:

> 726: }
> 727:
> 728: // Generational Shenandoah needs this alignment for card tables.

Thank you for this fix! It would be nice if this constraint were only applied in generational mode, but these sizes are computed quite early during startup. You'd need to factor the code out of `ShenandoahHeap::initialize_heuristics` to know whether the constraint is required at this point.
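
To make the arithmetic behind the 2MB granularity concrete, here is a minimal Java sketch. It is not the HotSpot change itself (that is C++ in shenandoahHeapRegion.cpp); the class `HeapAlignment`, its constants, and `alignUp` are hypothetical names used only for illustration. The 512-byte card coverage and the 4KB card table allocation unit are taken from the description quoted above.

```java
final class HeapAlignment {
    // Assumed from the quoted description: one card-table byte covers 512 heap bytes.
    static final long CARD_COVERAGE_BYTES = 512;
    // Assumed from the quoted description: card tables are allocated 4KB at a time.
    static final long CARD_TABLE_PAGE_BYTES = 4 * 1024;
    // 4KB * 512 = 2MB of heap covered by one card-table page.
    static final long HEAP_GRANULARITY = CARD_TABLE_PAGE_BYTES * CARD_COVERAGE_BYTES;

    // Round size up to the next multiple of a power-of-two alignment.
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & -alignment;
    }

    public static void main(String[] args) {
        long requested = 495L * 1024 * 1024; // the -mx495m heap from the reproducer
        long aligned = alignUp(requested, HEAP_GRANULARITY);
        System.out.println(HEAP_GRANULARITY);                // prints 2097152
        System.out.println(aligned / (1024 * 1024) + " MB"); // prints 496 MB
    }
}
```

With the 495MB heap from the reproducer, the rounded size comes out at 496MB, since 495MB is not a multiple of 2MB.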
-------------

PR: https://git.openjdk.org/shenandoah/pull/202

From redestad at openjdk.org Tue Jan 17 20:55:08 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Tue, 17 Jan 2023 20:55:08 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v21]
In-Reply-To:
References:
Message-ID:

> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>
> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>
> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>
> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>
> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.199 ±  0.017  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     6.933 ±  0.049  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    29.935 ±  0.221  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  1596.982 ±  7.020  ns/op
>
> Baseline:
>
> Benchmark                               (size)  Mode  Cnt     Score    Error  Units
> StringHashCode.Algorithm.defaultLatin1       1  avgt    5     2.200 ±  0.013  ns/op
> StringHashCode.Algorithm.defaultLatin1      10  avgt    5     9.424 ±  0.122  ns/op
> StringHashCode.Algorithm.defaultLatin1     100  avgt    5    90.541 ±  0.512  ns/op
> StringHashCode.Algorithm.defaultLatin1   10000  avgt    5  9425.321 ± 67.630  ns/op
>
> I.e. no measurable overhead compared to baseline even for `size == 1`.
>
> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>
> Benchmark for `Arrays.hashCode`:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     1.884 ±  0.013  ns/op
> ArraysHashCode.bytes       10  avgt    5     6.955 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    87.218 ±  0.595  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9419.591 ± 38.308  ns/op
> ArraysHashCode.chars        1  avgt    5     2.200 ±  0.010  ns/op
> ArraysHashCode.chars       10  avgt    5     6.935 ±  0.034  ns/op
> ArraysHashCode.chars      100  avgt    5    30.216 ±  0.134  ns/op
> ArraysHashCode.chars    10000  avgt    5  1601.629 ±  6.418  ns/op
> ArraysHashCode.ints         1  avgt    5     2.200 ±  0.007  ns/op
> ArraysHashCode.ints        10  avgt    5     6.936 ±  0.034  ns/op
> ArraysHashCode.ints       100  avgt    5    29.412 ±  0.268  ns/op
> ArraysHashCode.ints     10000  avgt    5  1610.578 ±  7.785  ns/op
> ArraysHashCode.shorts       1  avgt    5     1.885 ±  0.012  ns/op
> ArraysHashCode.shorts      10  avgt    5     6.961 ±  0.034  ns/op
> ArraysHashCode.shorts     100  avgt    5    87.095 ±  0.417  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.617 ± 50.089  ns/op
>
> Baseline:
>
> Benchmark              (size)  Mode  Cnt     Score    Error  Units
> ArraysHashCode.bytes        1  avgt    5     3.213 ±  0.207  ns/op
> ArraysHashCode.bytes       10  avgt    5     8.483 ±  0.040  ns/op
> ArraysHashCode.bytes      100  avgt    5    90.315 ±  0.655  ns/op
> ArraysHashCode.bytes    10000  avgt    5  9422.094 ± 62.402  ns/op
> ArraysHashCode.chars        1  avgt    5     3.040 ±  0.066  ns/op
> ArraysHashCode.chars       10  avgt    5     8.497 ±  0.074  ns/op
> ArraysHashCode.chars      100  avgt    5    90.074 ±  0.387  ns/op
> ArraysHashCode.chars    10000  avgt    5  9420.474 ± 41.619  ns/op
> ArraysHashCode.ints         1  avgt    5     2.827 ±  0.019  ns/op
> ArraysHashCode.ints        10  avgt    5     7.727 ±  0.043  ns/op
> ArraysHashCode.ints       100  avgt    5    89.405 ±  0.593  ns/op
> ArraysHashCode.ints     10000  avgt    5  9426.539 ± 51.308  ns/op
> ArraysHashCode.shorts       1  avgt    5     3.071 ±  0.062  ns/op
> ArraysHashCode.shorts      10  avgt    5     8.168 ±  0.049  ns/op
> ArraysHashCode.shorts     100  avgt    5    90.399 ±  0.292  ns/op
> ArraysHashCode.shorts   10000  avgt    5  9420.171 ± 44.474  ns/op
>
>
> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Remove spurious newline

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10847/files
  - new: https://git.openjdk.org/jdk/pull/10847/files/ffe5b66d..48c068bb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=20
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=19-20

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/10847.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847

PR: https://git.openjdk.org/jdk/pull/10847

From redestad at openjdk.org Tue Jan 17 20:55:12 2023
From: redestad at openjdk.org (Claes Redestad)
Date: Tue, 17 Jan 2023 20:55:12 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v20]
In-Reply-To:
References: