From zgu at openjdk.org Tue Apr 1 00:22:22 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Tue, 1 Apr 2025 00:22:22 GMT Subject: RFR: 8353329: Small memory leak when create GrowableArray with initial size 0 Message-ID: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> Please review this small fix to avoid a 1-byte leak when creating a GrowableArray with initial size = 0. GrowableArray's C-heap allocator does not check for size = 0, and os::malloc(0) allocates at least 1 byte. ------------- Commit messages: - 8353329: Small memory leak when create GrowableArray with initial size 0 Changes: https://git.openjdk.org/jdk/pull/24341/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24341&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353329 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24341.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24341/head:pull/24341 PR: https://git.openjdk.org/jdk/pull/24341 From ccheung at openjdk.org Tue Apr 1 01:31:23 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 1 Apr 2025 01:31:23 GMT Subject: RFR: 8353325: Rewrite appcds/methodHandles test cases to use CDSAppTester In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 23:08:13 GMT, Ioi Lam wrote: > These test cases are rewritten to use CDSAppTester, so that they can also be executed in the new JEP 483 workflow (with `-XX:AOTCache=xxx`, etc.). This will increase coverage of current and upcoming AOT features (such as AOT linking of invokedynamic, and AOT method profiling). > > These test cases are generated by a bash script. This PR minimizes the generated part so that the main portions of the tests can be modified as normal Java source files. Just one nit. Which testing tiers have been run with this change?
test/hotspot/jtreg/runtime/cds/appcds/methodHandles/JDKMethodHandlesTestRunner.java line 40: > (failed to retrieve contents of file, check the PR for context) Pre-existing: Can you also remove the commented-out `System.out.println` at line 138? ------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24340#pullrequestreview-2730848982 PR Review Comment: https://git.openjdk.org/jdk/pull/24340#discussion_r2021985035 From liach at openjdk.org Tue Apr 1 01:59:21 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 1 Apr 2025 01:59:21 GMT Subject: RFR: 8352565: Add native method implementation of Reference.get() In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 21:47:18 GMT, Kim Barrett wrote: > Please review this change which adds a native method providing the > implementation of Reference::get. Reference::get is an intrinsic candidate, so > this native method implementation is only used when the intrinsic is not. > > Currently there is intrinsic support by the interpreter, C1, C2, and Graal, > which are always used. With this change we can later remove all the > per-platform interpreter intrinsic implementations, and might also remove the > C1 intrinsic implementation. > > Testing: > (1) mach5 tier1-6 normal (so using all the existing intrinsics). > (2) mach5 tier1-6 with interpreter and C1 Reference::get intrinsics disabled. src/java.base/share/classes/java/lang/ref/Reference.java line 365: > 363: * C2 to sometimes prefer the native implementation over the intrinsic. > 364: */ > 365: private native Object get0(); I think you can declare this as `private native T get0();` without changes to native method signatures, so you can avoid the unchecked cast above. (See the Class::getPrimitiveClass declaration.) Also, can C2 choose to use native over intrinsic? That is concerning from a performance POV, as I think there are a few such performance-sensitive methods in core libraries.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24315#discussion_r2022010819 From asmehra at openjdk.org Tue Apr 1 04:40:47 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 1 Apr 2025 04:40:47 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 21:32:31 GMT, Thomas Fitzsimmons wrote: >> src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 81: >> >>> 79: // file system magic. If it does not then heuristics are required to determine >>> 80: // if cgroups v1 is usable or not. >>> 81: if (statfs(sys_fs_cgroup, &fsstat) != -1) { >> >> I feel this logic should be moved to `determine_type` as it is responsible for determining the version of the cgroup subsystem. > > OK, I tend to agree; I will investigate alternatives. I did consider putting the `statfs` logic inside but ended up leaving it outside because `determine_type` is called by the `whitebox` framework, and "mocking" `statfs` is not possible with regular files. The idea is to allow the test suite to simply mock the `statfs` result via the boolean `cgroups_v2_enabled` argument. One option is to pass an argument to `determine_type` to indicate it is being called from the test suite and skip the call to `statfs` in such case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r2022130118 From dholmes at openjdk.org Tue Apr 1 05:23:25 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 1 Apr 2025 05:23:25 GMT Subject: RFR: 8353129: CDS ArchiveRelocation tests fail after JDK-8325132 [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 22:15:39 GMT, Calvin Cheung wrote: >> Two archive relocation tests failed when `-XX:ArchiveRelocationMode=0` is specified via the jtreg `-javaoption`. 
>> A fix is to add a `WhiteBox.getArchiveRelocationMode()` method so that the tests can check if the `ArchiveRelocationMode` is set to 0 before checking the expected output. >> >> Passed tier 1-4 testing. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > simplify the fix per David's suggestion LGTM2. I hope it now passes tier4. Thanks. src/hotspot/share/prims/whitebox.cpp line 2136: > 2134: WB_ENTRY(jint, WB_GetArchiveRelocationMode(JNIEnv* env, jobject wb)) > 2135: #if INCLUDE_CDS > 2136: return (jint)ArchiveRelocationMode; Nit: do we need casts between int and jint? ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24308#pullrequestreview-2728076618 PR Review Comment: https://git.openjdk.org/jdk/pull/24308#discussion_r2020303125 From asmehra at openjdk.org Tue Apr 1 07:06:16 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 1 Apr 2025 07:06:16 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 21:26:37 GMT, Thomas Fitzsimmons wrote: >> test/hotspot/jtreg/containers/cgroup/CgroupSubsystemFactory.java line 459: >> >>> 457: public void testCgroupv1SystemdOnly(WhiteBox wb) { >>> 458: String procCgroups = cgroupv1CgInfoZeroHierarchy.toString(); >>> 459: String procSelfCgroup = cgroupV2SelfCgroup.toString(); >> >> I don't see why this change is required. The test name `testCgroupv1SystemdOnly` suggests it tests cgroup v1 only, but it then passes a cgroup v2 proc file. Same for `testCgroupv1NoMounts`. > > Thank you for reviewing. This test consistency fix is discussed [here](https://github.com/openjdk/jdk/pull/23811#discussion_r1973877201) and [here](https://github.com/openjdk/jdk/pull/23811#discussion_r1978045429); I agree the result is confusing.
Instead, I will change `cgroupv1CgInfoZeroHierarchy` to `cgroupv1CgInfoNonZeroHierarchy`, which achieves the same effect using only `cgroup v1` fields. Yeah, that would be better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r2022267066 From aboldtch at openjdk.org Tue Apr 1 07:14:55 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 1 Apr 2025 07:14:55 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v16] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 14:47:31 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. >> >> Two key changes enable this feature: >> 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. >> 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. >> >> >> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
>>
>> An example of how you could use the intrusive tree is found below:
>>
>> ```c++
>> struct MyIntrusiveStructure {
>>   Node node; // The tree node is part of an external structure
>>   int data;
>>
>>   MyIntrusiveStructure(int data, Node node) : node(node), data(data) {}
>>   Node* get_node() { return &node; }
>>   static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; }
>> };
>>
>> Tree my_intrusive_tree;
>>
>> Cursor insert_cursor = my_intrusive_tree.cursor_find(0);
>> Node insert_node = Node(0);
>>
>> // Custom allocation here is just malloc
>> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest);
>> new (place) MyIntrusiveStructure(0, insert_node);
>>
>> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor);
>>
>> Cursor find_cursor = my_intrusive_tree.cursor_find(0);
>> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data;
>> ```
>>
>> Please let me know any feedback or concerns! > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > axel feedback

lgtm. Thanks! src/hotspot/share/utilities/rbTree.inline.hpp line 621:

> 619:       assert_leq(from, start);
> 620:       assert_geq(to, start);
> 621:     }

Not sure if we should add an else branch here where we assert end == nullptr / end == start. But given that we will more than likely just crash when reading `start->next()`, it does not matter too much. Regardless of any assert, a bad interval will crash early. ------------- Marked as reviewed by aboldtch (Reviewer).
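The intrusive idea in the quoted example can be made concrete without HotSpot internals. The following is a minimal, self-contained C++ sketch, and everything in it is hypothetical: `IntrusiveNode`, `IntrusiveTree` (a plain unbalanced binary search tree standing in for the red-black tree), and `make_structure` are illustration names, not the `RBTree`/`Cursor` API from the PR, and `std::malloc` stands in for `os::malloc`. It shows the two properties the PR description relies on: the container never allocates, and the caller recovers its enclosing object from a node pointer. Here that recovery uses `offsetof`, so the node does not have to be the first member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical intrusive node: the container only ever touches these links.
struct IntrusiveNode {
  int key;
  IntrusiveNode* left = nullptr;
  IntrusiveNode* right = nullptr;
  explicit IntrusiveNode(int k) : key(k) {}
};

// Unbalanced binary search tree standing in for the red-black tree.
// It never allocates: callers hand it nodes that live inside their own objects.
class IntrusiveTree {
  IntrusiveNode* _root = nullptr;

 public:
  // Links a caller-owned node; returns false if the key is already present.
  bool insert(IntrusiveNode* n) {
    IntrusiveNode** link = &_root;
    while (*link != nullptr) {
      if (n->key == (*link)->key) return false;
      link = (n->key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    n->left = n->right = nullptr;
    *link = n;
    return true;
  }

  // Returns the node with the given key, or nullptr if it is absent.
  IntrusiveNode* find(int key) const {
    IntrusiveNode* cur = _root;
    while (cur != nullptr && cur->key != key) {
      cur = (key < cur->key) ? cur->left : cur->right;
    }
    return cur;
  }
};

// User structure embedding the node; the node need not be the first member.
struct MyIntrusiveStructure {
  int data;
  IntrusiveNode node;

  MyIntrusiveStructure(int key, int d) : data(d), node(key) {}

  // Recovers the enclosing object from a pointer to its embedded node.
  static MyIntrusiveStructure* cast_to_self(IntrusiveNode* n) {
    return reinterpret_cast<MyIntrusiveStructure*>(
        reinterpret_cast<char*>(n) - offsetof(MyIntrusiveStructure, node));
  }
};

// Caller-controlled allocation: std::malloc plus placement new, mirroring the
// os::malloc call in the quoted example.
MyIntrusiveStructure* make_structure(int key, int data) {
  void* mem = std::malloc(sizeof(MyIntrusiveStructure));
  if (mem == nullptr) return nullptr;
  return new (mem) MyIntrusiveStructure(key, data);
}
```

Because the tree only links nodes it is handed, unlinking and freeing the enclosing structures remains the caller's responsibility, which is the trade-off an intrusive container makes in exchange for full control over placement.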
PR Review: https://git.openjdk.org/jdk/pull/23416#pullrequestreview-2731673047 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r2022273261 From dholmes at openjdk.org Tue Apr 1 07:22:26 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 1 Apr 2025 07:22:26 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 13:49:46 GMT, Magnus Ihse Bursie wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> address Windows issues > > How problematic would it be to read it on demand? Is it just that there is a risk that it won't work, or could it cause the crash dumping process to fail completely? @magicus Like so many things in the crash reporting process, when in a signal handling context, it could lead to a secondary fault, it could deadlock, or it might work. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2768418285 From jsjolen at openjdk.org Tue Apr 1 07:24:33 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 1 Apr 2025 07:24:33 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v16] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 14:47:31 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. >> >> Two key changes enable this feature: >> 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. >> 2. Cursors have been added to simplify navigation and iteration over the tree.
These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. >> >> >> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
>>
>> An example of how you could use the intrusive tree is found below:
>>
>> ```c++
>> struct MyIntrusiveStructure {
>>   Node node; // The tree node is part of an external structure
>>   int data;
>>
>>   MyIntrusiveStructure(int data, Node node) : node(node), data(data) {}
>>   Node* get_node() { return &node; }
>>   static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; }
>> };
>>
>> Tree my_intrusive_tree;
>>
>> Cursor insert_cursor = my_intrusive_tree.cursor_find(0);
>> Node insert_node = Node(0);
>>
>> // Custom allocation here is just malloc
>> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest);
>> new (place) MyIntrusiveStructure(0, insert_node);
>>
>> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor);
>>
>> Cursor find_cursor = my_intrusive_tree.cursor_find(0);
>> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data;
>> ```
>>
>> Please let me know any feedback or concerns! > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > axel feedback

>ACTION: main -- Error. Program `/home/runner/work/jdk/jdk/bundles/jdk/jdk-25/fastdebug/bin/java' timed out (timeout set to 480000ms, elapsed time including timeout handling was 520263ms).
REASON: User specified action: run main/othervm -Xmx1g -Xms1g -Xlog:gc -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -XX:ShenandoahTargetNumRegions=2048 -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=adaptive -XX:ShenandoahGCMode=generational -XX:+ShenandoahVerify TestAllocHumongousFragment

I think it's safe to say that this crash is unrelated to your changes. Still LGTM. ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23416#pullrequestreview-2731718458 From jsjolen at openjdk.org Tue Apr 1 07:26:14 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 1 Apr 2025 07:26:14 GMT Subject: RFR: 8353329: Small memory leak when create GrowableArray with initial size 0 In-Reply-To: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> References: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> Message-ID: On Tue, 1 Apr 2025 00:18:07 GMT, Zhengyu Gu wrote: > Please review this small fix to avoid a 1-byte leak when creating a GrowableArray with initial size = 0. > > GrowableArray's C-heap allocator does not check for size = 0, and os::malloc(0) allocates at least 1 byte. Marked as reviewed by jsjolen (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24341#pullrequestreview-2731723994 From dholmes at openjdk.org Tue Apr 1 07:32:32 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 1 Apr 2025 07:32:32 GMT Subject: RFR: 8353117: Crash: assert(id >= ThreadIdentifier::initial() && id < ThreadIdentifier::current()) failed: must be reasonable) In-Reply-To: <6ludi2j0fKqL5_MirvViyefGITPvMKzAIX8EJIhfbFE=.3935e8bc-e9a3-4ed1-8da6-e943c97714b6@github.com> References: <6ludi2j0fKqL5_MirvViyefGITPvMKzAIX8EJIhfbFE=.3935e8bc-e9a3-4ed1-8da6-e943c97714b6@github.com> Message-ID: On Mon, 31 Mar 2025 18:15:39 GMT, Patricio Chilano Mateo wrote: > Please review the following fix.
For the attaching-thread case, we are incorrectly setting the `_monitor_owner_id` after `Threads::add()` is called, i.e., after the attaching thread becomes visible through a ThreadsListHandle. So if another thread calls `Threads::owning_thread_from_monitor()` in between these events and iterates through all JavaThreads looking for the owner of a given monitor, we might find this attaching thread still with a `_monitor_owner_id` of 0. > I corrected the ordering and improved verification checks. Tested in mach5 tiers 1-5. > > Thanks, > Patricio That seems fine to me. Thanks for fixing. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24336#pullrequestreview-2731738289 From epeter at openjdk.org Tue Apr 1 08:25:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 08:25:21 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate`, C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled during argument processing. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled, so that the code is consistent in its intention.
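The rule from the change summary quoted above (profiled loop predication is a subset of loop predication, so it is switched off during argument processing, and later checks test the parent flag first) can be sketched in C++. This is a hypothetical model only: `LoopPredicateFlags`, `apply_loop_predicate_ergonomics`, and `profiled_predicates_enabled` are illustrative names, and HotSpot's real flag handling goes through its own flag infrastructure rather than a plain struct.

```cpp
#include <cassert>

// Hypothetical stand-in for the two C2 flags discussed in the PR.
struct LoopPredicateFlags {
  bool UseLoopPredicate = true;
  bool UseProfiledLoopPredicate = true;
};

// Resolved once, during argument processing: profiled loop predication is a
// subset of loop predication, so disabling the latter disables the former.
void apply_loop_predicate_ergonomics(LoopPredicateFlags& f) {
  if (!f.UseLoopPredicate) {
    f.UseProfiledLoopPredicate = false;
  }
}

// Call sites check the parent flag first, so intent stays consistent even if
// the ergonomics step above were skipped.
bool profiled_predicates_enabled(const LoopPredicateFlags& f) {
  return f.UseLoopPredicate && f.UseProfiledLoopPredicate;
}
```

Doing both (forcing the dependent flag off and guarding each check with the parent flag) is redundant by design: the first keeps the reported flag values truthful, and the second keeps each call site self-explanatory.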
>> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was conditioned on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function. > - Improve description of UseLoopPredicate argument Looks good to me now, thanks for the updates! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2731884935 From chagedorn at openjdk.org Tue Apr 1 08:30:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:30:34 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate`, C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled during argument processing.
Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled, so that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was conditioned on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function. > - Improve description of UseLoopPredicate argument Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2731900566 From azafari at openjdk.org Tue Apr 1 08:30:36 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 1 Apr 2025 08:30:36 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag [v5] In-Reply-To: <0SlK7ixxGv5N7-LQnC7SwgpcK4Oz_9_H24qnrGPrTpc=.9bfd6434-6a48-4563-9dd6-66cff70dafe7@github.com> References: <0SlK7ixxGv5N7-LQnC7SwgpcK4Oz_9_H24qnrGPrTpc=.9bfd6434-6a48-4563-9dd6-66cff70dafe7@github.com> Message-ID: On Fri, 7 Mar 2025 16:06:32 GMT, Afshin Zafari wrote: >> With the `size` parameter, there is no need to traverse the nodes between the base and end of the region. >> Tests: >> linux-x64-debug, gtest:NMT* and runtime/NMT* > > Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits: > > - Merge remote-tracking branch 'origin/master' into _8350566_size_par_set_tag > - new fix. > - fixed build problem. > - ReservedSpace is accepted as param. > - applied also to VMT. > - 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag The GHA failures are unrelated. Thanks for the reviews @jdksjolen and @gerard-ziemski. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23770#issuecomment-2768577688 From azafari at openjdk.org Tue Apr 1 08:30:36 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 1 Apr 2025 08:30:36 GMT Subject: Integrated: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 09:49:41 GMT, Afshin Zafari wrote: > With the `size` parameter, there is no need to traverse the nodes between the base and end of the region. > Tests: > linux-x64-debug, gtest:NMT* and runtime/NMT* This pull request has now been integrated. Changeset: aff5aa72 Author: Afshin Zafari URL: https://git.openjdk.org/jdk/commit/aff5aa72bbf4ecea614339483581093a67efa265 Stats: 27 lines in 14 files changed: 6 ins; 1 del; 20 mod 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag Reviewed-by: jsjolen, gziemski ------------- PR: https://git.openjdk.org/jdk/pull/23770 From sgehwolf at openjdk.org Tue Apr 1 08:45:15 2025 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 1 Apr 2025 08:45:15 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 04:38:00 GMT, Ashutosh Mehra wrote: >> OK, I tend to agree; I will investigate alternatives. I did consider putting the `statfs` logic inside `determine_type` but ended up leaving it outside because `determine_type` is called by the `whitebox` framework, and "mocking" `statfs` is not possible with regular files.
The idea is to allow the test suite to simply mock the `statfs` result via the boolean `cgroups_v2_enabled` argument. > > One option is to pass an argument to `determine_type` to indicate it is being called from the test suite and skip the call to `statfs` in that case. If we really must, I'd rather have a function pointer for the statfs call which we can replace in test code. It doesn't seem worth the extra complexity in my opinion, though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r2022418397 From ihse at openjdk.org Tue Apr 1 08:48:08 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 1 Apr 2025 08:48:08 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov wrote: > Build and use the SLEEF library as a backend implementation for Vector API trigonometric functions on the macosx-aarch64 platform. > > It improves raw throughput and eliminates the GC overhead of non-intrinsified Vector API operations. > > The PR includes build changes and relocation of the libsleef sources from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. > > Once the libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. > > Testing: hs-tier1 - hs-tier4, microbenchmarks The rule that has dictated placement of the sources is where the code is actually used in the JDK. Whether upstream sleef is cross-platform, or whether the generated code is platform-independent, is strictly speaking irrelevant if it is only used in our Linux builds. Unless you are like 95% sure you are going to use libsleef on Windows, I still think it should be put in `unix` rather than `share`. Moving it once again is not that much of a hassle.
In contrast, if we in general allowed ourselves to not keep the source code based on what we do, but "just as a precaution if we are going to do stuff in the future", it would be much harder to reason about the code. This is a sort of "tragedy of the commons" -- every single piece of code might think that "this extra but unnecessary generalization helps me a bit and does not hurt much", but if you let that sentiment guide your code, it quickly becomes much harder to maintain than necessary. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2768632094 From ihse at openjdk.org Tue Apr 1 08:57:02 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 1 Apr 2025 08:57:02 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 21:53:21 GMT, Vladimir Ivanov wrote: > > That commit assumes that vector_math_sve.c should have $(SVE_CFLAGS) on mac as well as on linux. If that is not correct, then it needs to be adjusted. > > As of now, Apple Silicon doesn't support SVE/SVE2, so I intentionally excluded SVE support on macosx-aarch64. What would be the best way to exclude `vector_math_sve.c` on macosx-aarch64? The best way would be to make sure SVE_CFLAGS is empty on macosx.

diff --git a/make/autoconf/flags-cflags.m4 b/make/autoconf/flags-cflags.m4
index 73786587735..6e5a70a43a5 100644
--- a/make/autoconf/flags-cflags.m4
+++ b/make/autoconf/flags-cflags.m4
@@ -924,8 +924,9 @@ AC_DEFUN([FLAGS_SETUP_CFLAGS_CPU_DEP],
     # Check whether the compiler supports the Arm C Language Extensions (ACLE)
     # for SVE. Set SVE_CFLAGS to -march=armv8-a+sve if it does.
     # ACLE and this flag are required to build the aarch64 SVE related functions in
-    # libvectormath.
-    if test "x$OPENJDK_TARGET_CPU" = "xaarch64"; then
+    # libvectormath. Apple Silicon does not support SVE; use macOS as a proxy for
+    # that check.
+    if test "x$OPENJDK_TARGET_CPU" = "xaarch64" && test "x$OPENJDK_TARGET_OS" = "xlinux"; then
       if test "x$TOOLCHAIN_TYPE" = xgcc || test "x$TOOLCHAIN_TYPE" = xclang; then
         AC_LANG_PUSH(C)
         OLD_CFLAGS="$CFLAGS"

------------- PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2768656092 From fbredberg at openjdk.org Tue Apr 1 08:57:28 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 1 Apr 2025 08:57:28 GMT Subject: RFR: 8353117: Crash: assert(id >= ThreadIdentifier::initial() && id < ThreadIdentifier::current()) failed: must be reasonable) In-Reply-To: <6ludi2j0fKqL5_MirvViyefGITPvMKzAIX8EJIhfbFE=.3935e8bc-e9a3-4ed1-8da6-e943c97714b6@github.com> References: <6ludi2j0fKqL5_MirvViyefGITPvMKzAIX8EJIhfbFE=.3935e8bc-e9a3-4ed1-8da6-e943c97714b6@github.com> Message-ID: On Mon, 31 Mar 2025 18:15:39 GMT, Patricio Chilano Mateo wrote: > Please review the following fix. For the attaching-thread case, we are incorrectly setting the `_monitor_owner_id` after `Threads::add()` is called, i.e., after the attaching thread becomes visible through a ThreadsListHandle. So if another thread calls `Threads::owning_thread_from_monitor()` in between these events and iterates through all JavaThreads looking for the owner of a given monitor, we might find this attaching thread still with a `_monitor_owner_id` of 0. > I corrected the ordering and improved verification checks. Tested in mach5 tiers 1-5. > > Thanks, > Patricio It certainly looks like a change for the better. ------------- Marked as reviewed by fbredberg (Committer).
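The ordering problem behind the 8353117 fix (an id that is assigned only after the thread is already visible to scanners) can be illustrated with a small C++ model. Everything here is hypothetical: `FakeThread`, `ThreadRegistry`, and `attach_thread` are invented stand-ins for `JavaThread`, `Threads::add()`, and the attach path, and a plain mutex replaces HotSpot's ThreadsListHandle machinery. The sketch only demonstrates the ordering: the owner id is written before the publishing step, so any scan that can see the thread also sees its id.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a JavaThread; 0 means "id not yet assigned".
struct FakeThread {
  std::int64_t monitor_owner_id = 0;
};

// Hypothetical stand-in for the global thread list; once add() returns,
// other threads can find the new thread by scanning.
class ThreadRegistry {
  std::mutex _lock;
  std::vector<FakeThread*> _threads;
  std::int64_t _next_id = 1;

 public:
  std::int64_t next_id() {
    std::lock_guard<std::mutex> guard(_lock);
    return _next_id++;
  }

  // The publishing step (analogous to Threads::add()).
  void add(FakeThread* t) {
    std::lock_guard<std::mutex> guard(_lock);
    _threads.push_back(t);
  }

  // Analogous to scanning all threads for the owner of a monitor.
  FakeThread* find_owner(std::int64_t owner_id) {
    std::lock_guard<std::mutex> guard(_lock);
    for (FakeThread* t : _threads) {
      // A visible thread must never still carry the placeholder id.
      assert(t->monitor_owner_id != 0 && "visible thread without an id");
      if (t->monitor_owner_id == owner_id) return t;
    }
    return nullptr;
  }
};

// Corrected ordering: assign the id first, then make the thread visible.
// Swapping these two statements reopens the window the fix closes.
void attach_thread(ThreadRegistry& registry, FakeThread* t) {
  t->monitor_owner_id = registry.next_id();
  registry.add(t);
}
```

The mutex gives the id write a happens-before edge to any later scan. In the buggy order, a scanner could run between `add()` and the id assignment and trip the assertion, which is the same kind of window the added verification checks in the PR are meant to catch.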
PR Review: https://git.openjdk.org/jdk/pull/24336#pullrequestreview-2731978834 From kbarrett at openjdk.org Tue Apr 1 09:08:24 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 1 Apr 2025 09:08:24 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v6] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Mon, 31 Mar 2025 10:02:39 GMT, Magnus Ihse Bursie wrote: > I know the source code is bundled with the test image, but I'm not 100% sure if it just includes `src`, or if the entire top-level source is included. I'll need to check that, including the best way to get a proper reference to the top-level directory from a test. There was some discussion of this when recently adding the sources/TestNoNULL.java test. The code used here appears to be similar in function (though the code itself differs) to the approach taken in that earlier test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24247#issuecomment-2768685198 From tschatzl at openjdk.org Tue Apr 1 09:24:12 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 1 Apr 2025 09:24:12 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v29] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight, but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post-write barrier to more closely resemble Parallel GC's, as described in the JEP. The reason is that G1 lags behind Parallel/Serial GC in throughput due to its larger barrier.
>
> The main reason for the current barrier is how G1 implements concurrent refinement:
> * G1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads.
> * Finally, there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudocode:
>
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done; // null value check
> if (card(@x.a) == young_card) goto done; // write to young gen check
> StoreLoad; // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
>
> Overall, this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for Parallel and Serial GC.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>
> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>
> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse-grained synchronization based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se...
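The filtering steps in the pseudocode above can be modeled as a single-threaded C++ sketch that shows which stores each filter removes and when the slow path (handing a full buffer to the runtime) is taken. The region size, card size, and card values are illustrative assumptions rather than G1's actual constants; `G1BarrierModel` is a hypothetical name; a `std::vector` stands in for the thread-local dirty card queue; and a sequentially consistent fence marks where the StoreLoad barrier sits. It models only the existing barrier, not the card-table-switching design the JEP proposes.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative constants only; they are not G1's actual values.
constexpr unsigned kLogRegionSize = 20;  // 1 MiB regions
constexpr unsigned kLogCardSize = 9;     // 512-byte cards
constexpr std::uint8_t kCleanCard = 0xff;
constexpr std::uint8_t kDirtyCard = 0;
constexpr std::uint8_t kYoungCard = 2;

// Single-threaded model of the filtering post-write barrier for `x.a = y`.
struct G1BarrierModel {
  std::vector<std::uint8_t> card_table;
  std::vector<std::size_t> dirty_card_queue;  // stands in for the thread-local dcq
  std::size_t queue_capacity;

  G1BarrierModel(std::size_t heap_bytes, std::size_t capacity)
      : card_table(heap_bytes >> kLogCardSize, kCleanCard),
        queue_capacity(capacity) {}

  // field_addr models @x.a and new_val models y (0 means null); addresses are
  // offsets from the heap start. Returns true when the slow path (handing the
  // full queue to the runtime) would be taken.
  bool post_write(std::uintptr_t field_addr, std::uintptr_t new_val) {
    // Filtering
    if ((field_addr >> kLogRegionSize) == (new_val >> kLogRegionSize)) return false;  // same region
    if (new_val == 0) return false;                       // null value
    std::size_t card = field_addr >> kLogCardSize;
    assert(card < card_table.size());
    if (card_table[card] == kYoungCard) return false;     // write to young gen
    std::atomic_thread_fence(std::memory_order_seq_cst);  // marks the StoreLoad step
    if (card_table[card] == kDirtyCard) return false;     // already dirty
    card_table[card] = kDirtyCard;

    // Card tracking
    dirty_card_queue.push_back(card);
    if (dirty_card_queue.size() < queue_capacity) return false;
    dirty_card_queue.clear();  // "call runtime": the buffer moves to the global set
    return true;
  }
};
```

Even in this stripped-down form, a store that survives all filters performs several loads, a fence, a card write, and a queue append, which helps explain why the inlined barrier is so much larger than Parallel or Serial GC's card mark.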
Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq - * make young gen length revising independent of refinement thread * use a service task * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update - * fix IR code generation tests that change due to barrier cost changes - * factor out card table and refinement table merging into a single method - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 - * obsolete G1UpdateBufferSize G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts. The former function is now obsolete with the removal of the dirty card queues; the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`. I prefer making this a diagnostic option rather than a product option because it is only needed by some test cases to produce otherwise unwanted behavior (continuous refinement). CSR is pending. - * more documentation on why we need to rendezvous the gc threads - Merge branch 'master' into 8342381-card-table-instead-of-dcq - ...
and 27 more: https://git.openjdk.org/jdk/compare/aff5aa72...51fb6e63 ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=28 Stats: 7089 lines in 110 files changed: 2610 ins; 3555 del; 924 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From pminborg at openjdk.org Tue Apr 1 09:36:17 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 1 Apr 2025 09:36:17 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v23] In-Reply-To: References: Message-ID: > Implement JEP 502. > > The PR passes tier1-tier3 tests. Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Add benchmarks and update copyright years ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/df4ef35c..f7f10fa1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=21-22 Stats: 28 lines in 7 files changed: 13 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From kbarrett at openjdk.org Tue Apr 1 09:43:28 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 1 Apr 2025 09:43:28 GMT Subject: RFR: 8352565: Add native method implementation of Reference.get() [v2] In-Reply-To: References: Message-ID: > Please review this change which adds a native method providing the > implementation of Reference::get. Reference::get is an intrinsic candidate, so > this native method implementation is only used when the intrinsic is not. > > Currently there is intrinsic support by the interpreter, C1, C2, and graal, > which are always used.
With this change we can later remove all the > per-platform interpreter intrinsic implementations, and might also remove the > C1 intrinsic implementation. > > Testing: > (1) mach5 tier1-6 normal (so using all the existing intrinsics). > (2) mach5 tier1-6 with interpreter and C1 Reference::get intrinsics disabled. Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: parameterized return type of native get0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24315/files - new: https://git.openjdk.org/jdk/pull/24315/files/f1734062..37dc9b74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24315&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24315&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24315/head:pull/24315 PR: https://git.openjdk.org/jdk/pull/24315 From kbarrett at openjdk.org Tue Apr 1 09:58:15 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 1 Apr 2025 09:58:15 GMT Subject: RFR: 8352565: Add native method implementation of Reference.get() [v2] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 01:54:13 GMT, Chen Liang wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> parameterized return type of native get0 > > src/java.base/share/classes/java/lang/ref/Reference.java line 365: > >> 363: * C2 to sometimes prefer the native implementation over the intrinsic. >> 364: */ >> 365: private native Object get0(); > > I think you can declare this as `private native T get0();` without changes to native method signatures, so you can avoid the unchecked cast above. (See Class::getPrimitiveClass declaration) > > Also, can C2 choose to use native over intrinsic? 
That is concerning from a performance POV, as I think there are a few such performance sensitive methods in core libraries. It hadn't occurred to me that would work, but indeed it does. Thanks for the suggestion. I've used tools like `-XX:+PrintInlining` when running a test that I found to be performance sensitive to the use of the intrinsic, and found that it was used. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24315#discussion_r2022537959 From pminborg at openjdk.org Tue Apr 1 10:02:50 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 1 Apr 2025 10:02:50 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v24] In-Reply-To: References: Message-ID: <5OJvCICcMF02tdxdwIh2WLgMqZl5IHgwpYyX47ZAV70=.1a7b42c5-fadd-4fbd-a2e6-c72a2a850fd3@github.com> > Implement JEP 502. > > The PR passes tier1-tier3 tests. Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Add additional benchmarks with maps holding method handles ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23972/files - new: https://git.openjdk.org/jdk/pull/23972/files/f7f10fa1..dfb940be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=22-23 Stats: 16 lines in 1 file changed: 16 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 PR: https://git.openjdk.org/jdk/pull/23972 From pminborg at openjdk.org Tue Apr 1 10:06:43 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 1 Apr 2025 10:06:43 GMT Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) [v6] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 02:35:01 GMT, Chen Liang wrote: >> src/hotspot/share/ci/ciField.cpp line 254: >> >>> 252: >>> 253: static bool trust_final_non_static_fields_of_type(Symbol* signature) { >>> 254: return signature == 
vmSymbols::java_lang_StableValue_signature(); >> >> Just a note that we will need to decide whether to keep this or not... > > We might change this to require stable values to be strict final instead if strict final is previewed at the same time as stable values - https://openjdk.org/jeps/8350458 This has now been removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23972#discussion_r2022552862 From stefank at openjdk.org Tue Apr 1 11:12:37 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 1 Apr 2025 11:12:37 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v6] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: <4RCTjaaCqzo0ZjzZIIlEmWVMqQU90-j-HeuGvZAVV7M=.360d98b4-3aa7-46bc-a3cb-efdaaf12db0d@github.com> On Fri, 28 Mar 2025 22:24:40 GMT, Doug Simon wrote: >> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> >> By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. >> >> The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. 
That kind of re-ordering will have to continue to be done manually for now. >> >> I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. >> >> When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. For example: >> >> java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: >> >> java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci >> >> at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) >> Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: >> >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp >> /Users/dnsimo... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > convert Windows path to Unix path This looks good to me. I personally would have preferred to have the tool somewhere other than in the test directory, but I've gotten feedback from other HotSpot devs that they think it's better to have the tool there. I leave the review of TEST.group to someone else. ------------- Marked as reviewed by stefank (Reviewer).
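The sort order described in the quoted PR (compare lowercased strings so that `_` sorts before letters, and let a sorted set drop duplicates) can be sketched in C++. This is a hypothetical helper, not the Java `SortIncludes.java` tool: it sorts a single block of include lines, does not separate user and system includes, and ignores conditional includes. The tie-break on the original spelling is this sketch's own choice, so that lines differing only in case are not merged.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Lowercased copy of s. In ASCII, '_' (0x5F) is above 'A'..'Z' (0x41..0x5A)
// but below 'a'..'z' (0x61..0x7A), so lowercasing makes '_' sort before letters.
static std::string to_lower_copy(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Orders by the lowercased text first, then by the original text, which is a
// strict weak ordering and keeps case variants as distinct entries.
struct LowercaseLess {
  bool operator()(const std::string& a, const std::string& b) const {
    const std::string la = to_lower_copy(a), lb = to_lower_copy(b);
    return la != lb ? la < lb : a < b;
  }
};

// Sorts one block of "#include ..." lines; exact duplicates collapse in the set.
std::vector<std::string> sort_include_block(const std::vector<std::string>& block) {
  for (const std::string& line : block) assert(line.rfind("#include", 0) == 0);
  const std::set<std::string, LowercaseLess> sorted(block.begin(), block.end());
  return std::vector<std::string>(sorted.begin(), sorted.end());
}
```

For example, a plain case-sensitive sort would put `runtime/osThread.hpp` before `runtime/os_perf.hpp` (`T` is 0x54, `_` is 0x5F), whereas the lowercased comparison puts `os_perf` first (`_` is below `t`, 0x74).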
PR Review: https://git.openjdk.org/jdk/pull/24247#pullrequestreview-2732333629 From duke at openjdk.org Tue Apr 1 11:38:19 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Tue, 1 Apr 2025 11:38:19 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 08:41:53 GMT, Severin Gehwolf wrote: >> One option is to pass an argument to `determine_type` to indicate it is being called from the test suite and skip the call to `statfs` in such case. > > If we really must, I'd rather have a function pointer for the statfs call which we can replace in test code. It doesn't seem worth the extra complexity in my opinion though. This is what I had so far (not yet fully tested), but it adds a null check to the non-testing code path...; I can try an approach with a function pointer too if necessary; let me know how to proceed: diff --git a/src/hotspot/os/linux/cgroupSubsystem_linux.cpp b/src/hotspot/os/linux/cgroupSubsystem_linux.cpp index 612cb9a9302..6479853ac04 100644 --- a/src/hotspot/os/linux/cgroupSubsystem_linux.cpp +++ b/src/hotspot/os/linux/cgroupSubsystem_linux.cpp @@ -66,26 +66,10 @@ CgroupSubsystem* CgroupSubsystemFactory::create() { CgroupV1Controller* pids = nullptr; CgroupInfo cg_infos[CG_INFO_LENGTH]; u1 cg_type_flags = INVALID_CGROUPS_GENERIC; - const char* proc_cgroups = "/proc/cgroups"; - const char* sys_fs_cgroup_cgroup_controllers = "/sys/fs/cgroup/cgroup.controllers"; - const char* controllers_file = proc_cgroups; const char* proc_self_cgroup = "/proc/self/cgroup"; const char* proc_self_mountinfo = "/proc/self/mountinfo"; - const char* sys_fs_cgroup = "/sys/fs/cgroup"; - struct statfs fsstat = {}; - bool cgroups_v2_enabled = false; - // Assume cgroups v2 is usable by the JDK iff /sys/fs/cgroup has the cgroup v2 - // file system magic. If it does not then heuristics are required to determine - // if cgroups v1 is usable or not. 
- if (statfs(sys_fs_cgroup, &fsstat) != -1) { - cgroups_v2_enabled = (fsstat.f_type == CGROUP2_SUPER_MAGIC); - if (cgroups_v2_enabled) { - controllers_file = sys_fs_cgroup_cgroup_controllers; - } - } - - bool valid_cgroup = determine_type(cg_infos, cgroups_v2_enabled, controllers_file, proc_self_cgroup, proc_self_mountinfo, &cg_type_flags); + bool valid_cgroup = determine_type(cg_infos, true, NULL, proc_self_cgroup, proc_self_mountinfo, &cg_type_flags); if (!valid_cgroup) { // Could not detect cgroup type @@ -249,9 +233,16 @@ static inline bool match_mount_info_line(char* line, tmpcgroups) == 5; } +/* + * If controllers_file_mock is non-NULL use it as the controllers file + * and respect cgroups_v2_enabled_mock. This is used by WhiteBox to + * mock the statfs call. If controllers_file_mock is NULL, ignore + * cgroups_v2_enabled_mock and determine using statfs what to use as + * the controllers file. + */ bool CgroupSubsystemFactory::determine_type(CgroupInfo* cg_infos, - bool cgroups_v2_enabled, - const char* controllers_file, + bool cgroups_v2_enabled_mock, + const char* controllers_file_mock, const char* proc_self_cgroup, const char* proc_self_mountinfo, u1* flags) { @@ -265,6 +256,28 @@ bool CgroupSubsystemFactory::determine_type(CgroupInfo* cg_infos, // pids might not be enabled on older Linux distros (SLES 12.1, RHEL 7.1) // cpuset might not be enabled on newer Linux distros (Fedora 41) bool all_required_controllers_enabled = true; + bool cgroups_v2_enabled = false; + const char* controllers_file = controllers_file_mock; + + if (controllers_file) { + cgroups_v2_enabled = cgroups_v2_enabled_mock; + } else { + const char* proc_cgroups = "/proc/cgroups"; + const char* sys_fs_cgroup_cgroup_controllers = "/sys/fs/cgroup/cgroup.controllers"; + const char* sys_fs_cgroup = "/sys/fs/cgroup"; + struct statfs fsstat = {}; + + controllers_file = proc_cgroups; + // Assume cgroups v2 is usable by the JDK iff /sys/fs/cgroup has the cgroup v2 + // file system magic. 
If it does not then heuristics are required to determine + // if cgroups v1 is usable or not. + if (statfs(sys_fs_cgroup, &fsstat) != -1) { + cgroups_v2_enabled = (fsstat.f_type == CGROUP2_SUPER_MAGIC); + if (cgroups_v2_enabled) { + controllers_file = sys_fs_cgroup_cgroup_controllers; + } + } + } // If cgroups v2 is enabled, open /sys/fs/cgroup/cgroup.controllers. If not, open /proc/cgroups. controllers = os::fopen(controllers_file, "r"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r2022679495 From ihse at openjdk.org Tue Apr 1 11:54:17 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 1 Apr 2025 11:54:17 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: <3t_wc5_YstX1OvL-QywjwWxard8Da4qisWUEh4qAJ4M=.b6a72c38-b3b3-4f64-a5d5-a20363153b91@github.com> On Sat, 29 Mar 2025 17:46:43 GMT, Julian Waters wrote: > That would implicitly mean to any developers that it's shared code for all currently supported operating systems: Windows, macOS, Linux and AIX, which may be rather confusing if it's only meant to be used on specific platforms. @TheShermanTanker To clarify, the `share` directory does not imply that the code within is used on *all* platforms, only that it is shared on *some* platforms, and no more specific home was possible. There is a lot of code all over the place in `share` that is excluded from one platform or another. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2769101099 From asmehra at openjdk.org Tue Apr 1 12:22:25 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 1 Apr 2025 12:22:25 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 10:42:39 GMT, Severin Gehwolf wrote: >> Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: >> >> - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 >> - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent >> - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing >> >> Remove from cgroups v1 branch incorrect log messages about cpuset >> controller being optional. Add test case for cgroups v1, cpuset >> disabled. >> - Improve !cgroups_v2_enabled branch comment >> - Debug-log optional and disabled cgroups v2 controllers >> >> Do not log enabled controllers that are not relevant to the JDK. >> - Move index declaration to scope in which it is used >> - Remove empty string check during cgroup.controllers parsing >> - Define ISSPACE_CHARS macro, use it in strsep call >> - Pass fgets result to strsep >> - Replace is_cgroupsV2 with cgroups_v2_enabled >> >> Also fix the testCgroupv1SystemdOnly and testCgroupv1NoMounts test >> cases such that their /proc/cgroups and /proc/self/cgroup contents >> correspond. This prevents assertion failures these tests were >> producing when is_cgroupsV2 was replaced with cgroups_v2_enabled. >> - ... and 3 more: https://git.openjdk.org/jdk/compare/fa4cded7...b6926e15 > > @tstuefe @ashu-mehra Could you please help with a second review? @jerboaa @fitzsim Does the current mainline code handle mixed configurations in which some controllers are v1 and others are v2? For example, the cpu controller is mounted as v1 while the memory controller is mounted as v2. If yes, does this patch continue to support such a configuration?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2769176066 From stuefe at openjdk.org Tue Apr 1 13:26:21 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 1 Apr 2025 13:26:21 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 14:02:03 GMT, Thomas Stuefe wrote: > In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. > > For details, please see JBS issue text. > > ----------------------- > > Patch results: > > The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. Here the oop map size distribution over all JDK classes in the JDK image: > > Before: > > 5395 - non-static oop maps (0 entries) > 9330 - non-static oop maps (1 entries) > 1449 - non-static oop maps (2 entries) > 274 - non-static oop maps (3 entries) > 218 - non-static oop maps (4 entries) > 75 - non-static oop maps (5 entries) > 7 - non-static oop maps (6 entries) > 4 - non-static oop maps (7 entries) > > > Now: > > 5395 - non-static oop maps (0 entries) > 10178 - non-static oop maps (1 entries) > 933 - non-static oop maps (2 entries) > 229 - non-static oop maps (3 entries) > 16 - non-static oop maps (4 entries) > 1 - non-static oop maps (5 entries) > > > For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: > > Before: > > java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} > - ---- non-static fields (9 words): > - final 'hash' 'I' @12 > - final 'key' 'Ljava/lang/Object;' @16 > - volatile 'val' 'Ljava/lang/Object;' @20 > - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class > - 'red' 'Z' @28 << derived class starts here, non-oops lead > - 'parent' 
'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 > - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 > - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 > - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 > - non-static oop maps (2 entries): 16-24 32-44 > > Now: > > java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} > - ---- non-static fields (9 words): > - final 'hash' 'I' @12 > - final 'key' 'Ljava/lang/Object;' @16 > - volatile 'val' 'Ljava/lang/Object;' @20 > - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class > - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oops lead > - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 > - 'right' 'Ljava/util/concurrent/Concurre... Ping @fparain ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24330#issuecomment-2769350152 From sgehwolf at openjdk.org Tue Apr 1 13:42:46 2025 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 1 Apr 2025 13:42:46 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 10:42:39 GMT, Severin Gehwolf wrote: >> Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: >> >> - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 >> - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent >> - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing >> >> Remove from cgroups v1 branch incorrect log messages about cpuset >> controller being optional. Add test case for cgroups v1, cpuset >> disabled. 
>> - Improve !cgroups_v2_enabled branch comment >> - Debug-log optional and disabled cgroups v2 controllers >> >> Do not log enabled controllers that are not relevant to the JDK. >> - Move index declaration to scope in which it is used >> - Remove empty string check during cgroup.controllers parsing >> - Define ISSPACE_CHARS macro, use it in strsep call >> - Pass fgets result to strsep >> - Replace is_cgroupsV2 with cgroups_v2_enabled >> >> Also fix the testCgroupv1SystemdOnly and testCgroupv1NoMounts test >> cases such that their /proc/cgroups and /proc/self/cgroup contents >> correspond. This prevents assertion failures these tests were >> producing when is_cgroupsV2 was replaced with cgroups_v2_enabled. >> - ... and 3 more: https://git.openjdk.org/jdk/compare/85d60127...b6926e15 > > @tstuefe @ashu-mehra Could you please help with a second review? > @jerboaa @fitzsim Does the current mainline code handle mixed configurations in which some controllers are v1 and others are v2? For example, the cpu controller is mounted as v1 while the memory controller is mounted as v2. If yes, does this patch continue to support such a configuration? The current code does not allow mixed configuration for "relevant" controllers (essentially cpu and memory). That is, they must be either all v1 or all v2. In the hybrid case (systemd running on unified), it's considered v1. I don't think this patch changes any of it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2769400331 From kevinw at openjdk.org Tue Apr 1 13:54:28 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Tue, 1 Apr 2025 13:54:28 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising Message-ID: We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. next_OnError_command() decides where a new command starts.
It should recognise newlines, and all commands will get their own shell. ------------- Commit messages: - Recognise newlines in next_OnError_command Changes: https://git.openjdk.org/jdk/pull/24354/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24354&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353439 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24354/head:pull/24354 PR: https://git.openjdk.org/jdk/pull/24354 From kevinw at openjdk.org Tue Apr 1 13:54:28 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Tue, 1 Apr 2025 13:54:28 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 10:59:16 GMT, Kevin Walls wrote: > We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. > > next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell. These tests reference -XX:OnError and all still pass: test/hotspot/jtreg/gc/epsilon/TestDieWithOnError.java test/hotspot/jtreg/runtime/ErrorHandling/TestOnError.java test/hotspot/jtreg/runtime/ErrorHandling/TimeoutInErrorHandlingTest.java test/hotspot/jtreg/serviceability/sa/ClhsdbFlags.java All of tiers 1,2,3 in CI also. 
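A minimal sketch of the splitting rule the PR describes, assuming (as the description implies) that repeated `-XX:OnError=` values reach the VM as one string joined by newlines. `split_on_error_commands` is a hypothetical stand-in for `next_OnError_command()`, not its actual implementation: it treats both `;` and a newline as command separators and drops empty entries, so each returned command would get its own shell.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits an OnError command list so that both ';' and
// '\n' end a command. Quoting is not handled, so a ';' inside a quoted
// argument would still split the command.
std::vector<std::string> split_on_error_commands(const std::string& list) {
  std::vector<std::string> commands;
  std::string current;
  for (char c : list) {
    if (c == ';' || c == '\n') {
      if (!current.empty()) commands.push_back(current);  // skip empty entries
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  if (!current.empty()) commands.push_back(current);
  return commands;
}
```

Under this rule, `-XX:OnError="echo first;echo second"` followed by `-XX:OnError="echo third"` would yield three separate commands instead of running the last two in one shell.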
------------- PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2769429231 From duke at openjdk.org Tue Apr 1 13:55:27 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Tue, 1 Apr 2025 13:55:27 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: <32F4Zti2B7PZSJ9rAw22xBn8pWzO4x6lSTX_qPvN9tk=.4df4f31d-9312-460a-9adf-82f199456db9@github.com> On Tue, 1 Apr 2025 13:39:53 GMT, Severin Gehwolf wrote: > > @jerboaa @fitzsim Does the current mainline code handles mixed configuration where in some controllers are v1 and others v2? For example cpu controller is mounted as v1 while memory controller as v2. If yes, does this patch continue to support such configuration? > > The current code does not allow mixed configuration for "relevant" controllers (essentially cpu and memory). That is, they ought to be v1 or v2. In the hybrid case (systemd running on unified), it's considered v1. I don't think this patch changes any of it. Yes, I tried to keep the logic the same. There are some hybrid test cases (though none testing exactly the setup you described), and they continue to pass. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2769437121 From mbaesken at openjdk.org Tue Apr 1 14:03:36 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 1 Apr 2025 14:03:36 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 13:49:46 GMT, Magnus Ihse Bursie wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> address Windows issues > > How problematic would it be to read it on demand? Is it just that there is a risk that it won't work, or could it cause the crash dumping process to fail completely? 
> @magicus like so many things in the crash reporting process, when in a signal handling context, it could lead to a secondary fault, or it could deadlock, or it might work. I thought we already read some data from, e.g., /proc while printing the hserr file, but you are right: it is better to avoid reading files during hserr reporting. Could we maybe load the release file with a slight delay to avoid even a small hit to startup performance? Is there already a task/thread that performs such delayed operations? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2769463699 From sgehwolf at openjdk.org Tue Apr 1 14:14:25 2025 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 1 Apr 2025 14:14:25 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 11:34:50 GMT, Thomas Fitzsimmons wrote: >> If we really must, I'd rather have a function pointer for the statfs call which we can replace in test code. It doesn't seem worth the extra complexity in my opinion though.
> > This is what I had so far (not yet fully tested), but it adds a null check to the non-testing code path...; I can try an approach with a function pointer too if necessary; let me know how to proceed: > > > diff --git a/src/hotspot/os/linux/cgroupSubsystem_linux.cpp b/src/hotspot/os/linux/cgroupSubsystem_linux.cpp > index 612cb9a9302..6479853ac04 100644 > --- a/src/hotspot/os/linux/cgroupSubsystem_linux.cpp > +++ b/src/hotspot/os/linux/cgroupSubsystem_linux.cpp > @@ -66,26 +66,10 @@ CgroupSubsystem* CgroupSubsystemFactory::create() { > CgroupV1Controller* pids = nullptr; > CgroupInfo cg_infos[CG_INFO_LENGTH]; > u1 cg_type_flags = INVALID_CGROUPS_GENERIC; > - const char* proc_cgroups = "/proc/cgroups"; > - const char* sys_fs_cgroup_cgroup_controllers = "/sys/fs/cgroup/cgroup.controllers"; > - const char* controllers_file = proc_cgroups; > const char* proc_self_cgroup = "/proc/self/cgroup"; > const char* proc_self_mountinfo = "/proc/self/mountinfo"; > - const char* sys_fs_cgroup = "/sys/fs/cgroup"; > - struct statfs fsstat = {}; > - bool cgroups_v2_enabled = false; > > - // Assume cgroups v2 is usable by the JDK iff /sys/fs/cgroup has the cgroup v2 > - // file system magic. If it does not then heuristics are required to determine > - // if cgroups v1 is usable or not. 
> - if (statfs(sys_fs_cgroup, &fsstat) != -1) { > - cgroups_v2_enabled = (fsstat.f_type == CGROUP2_SUPER_MAGIC); > - if (cgroups_v2_enabled) { > - controllers_file = sys_fs_cgroup_cgroup_controllers; > - } > - } > - > - bool valid_cgroup = determine_type(cg_infos, cgroups_v2_enabled, controllers_file, proc_self_cgroup, proc_self_mountinfo, &cg_type_flags); > + bool valid_cgroup = determine_type(cg_infos, true, NULL, proc_self_cgroup, proc_self_mountinfo, &cg_type_flags); > > if (!valid_cgroup) { > // Could not detect cgroup type > @@ -249,9 +233,16 @@ static inline bool match_mount_info_line(char* line, > tmpcgroups) == 5; > } > > +/* > + * If controllers_file_mock is non-NULL use it as the controllers file > + * and respect cgroups_v2_enabled_mock. This is used by WhiteBox to > + * mock the statfs call. If controllers_file_mock is NULL, ignore > + * cgroups_v2_enabled_mock and determine using statfs what to use as > + * the controllers file. > + */ > bool CgroupSubsystemFactory::determine_type(CgroupInfo* cg_infos, > - bool cgroups_v2_enabled, > - const char* controllers_file, > + bool ... This doesn't make the code any better to read, IMO. The previous version seems better considering the non-mock code path isn't tested any more/less. 
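Severin's alternative of a replaceable function pointer for the `statfs` call could look roughly like the following sketch. It is Linux-only and hypothetical: `statfs_func`, `detect_controllers_file` and `kCgroup2SuperMagic` are illustrative names (the constant is given the same value as `CGROUP2_SUPER_MAGIC` from `<linux/magic.h>`), and the detection logic mirrors the statfs block in the quoted diff rather than any code that was actually committed.

```cpp
#include <cassert>
#include <cstring>
#include <sys/vfs.h>

// Same value as CGROUP2_SUPER_MAGIC in <linux/magic.h>.
constexpr long kCgroup2SuperMagic = 0x63677270;

// Hypothetical hook: production code uses the real statfs(2); a test can
// replace it to simulate a cgroup v1 or v2 host without touching the filesystem.
using statfs_fn = int (*)(const char*, struct statfs*);
static statfs_fn statfs_func = ::statfs;

// Reports whether /sys/fs/cgroup carries the cgroup v2 magic and returns the
// controllers file that the detection logic should read next.
const char* detect_controllers_file(bool* cgroups_v2_enabled) {
  struct statfs fsstat = {};
  *cgroups_v2_enabled = false;
  if (statfs_func("/sys/fs/cgroup", &fsstat) != -1) {
    *cgroups_v2_enabled = (fsstat.f_type == kCgroup2SuperMagic);
  }
  return *cgroups_v2_enabled ? "/sys/fs/cgroup/cgroup.controllers" : "/proc/cgroups";
}
```

A test would assign a capture-less lambda to `statfs_func` to simulate a cgroup v2 host (reporting the v2 magic), a non-v2 host (any other `f_type`), or a failing `statfs` call (returning -1). The production path costs one indirect call and needs no extra null check.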
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r2022948807 From kbarrett at openjdk.org Tue Apr 1 14:34:16 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 1 Apr 2025 14:34:16 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v6] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Fri, 28 Mar 2025 22:24:40 GMT, Doug Simon wrote: >> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> >> By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. >> >> The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. >> >> I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. >> >> When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. 
For example: >> >> java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: >> >> java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci >> >> at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) >> Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: >> >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp >> /Users/dnsimo... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > convert Windows path to Unix path test/hotspot/jtreg/TEST.groups line 142: > 140: > 141: tier1_common = \ > 142: sources \ I don't understand this change. How does this end up doing anything different than before? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2022491702 From fparain at openjdk.org Tue Apr 1 14:50:08 2025 From: fparain at openjdk.org (Frederic Parain) Date: Tue, 1 Apr 2025 14:50:08 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 14:02:03 GMT, Thomas Stuefe wrote: > In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. > > For details, please see JBS issue text. 
> > ----------------------- > > Patch results: > > The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. Here the oop map size distribution over all JDK classes in the JDK image: > > Before: > > 5395 - non-static oop maps (0 entries) > 9330 - non-static oop maps (1 entries) > 1449 - non-static oop maps (2 entries) > 274 - non-static oop maps (3 entries) > 218 - non-static oop maps (4 entries) > 75 - non-static oop maps (5 entries) > 7 - non-static oop maps (6 entries) > 4 - non-static oop maps (7 entries) > > > Now: > > 5395 - non-static oop maps (0 entries) > 10178 - non-static oop maps (1 entries) > 933 - non-static oop maps (2 entries) > 229 - non-static oop maps (3 entries) > 16 - non-static oop maps (4 entries) > 1 - non-static oop maps (5 entries) > > > For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: > > Before: > > java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} > - ---- non-static fields (9 words): > - final 'hash' 'I' @12 > - final 'key' 'Ljava/lang/Object;' @16 > - volatile 'val' 'Ljava/lang/Object;' @20 > - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class > - 'red' 'Z' @28 << derived class starts here, non-oops lead > - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 > - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 > - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 > - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 > - non-static oop maps (2 entries): 16-24 32-44 > > Now: > > java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} > - ---- non-static fields (9 words): > - final 'hash' 'I' @12 > - final 'key' 'Ljava/lang/Object;' @16 > - volatile 'val' 'Ljava/lang/Object;' @20 > - volatile 'next' 
'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class > - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oops lead > - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 > - 'right' 'Ljava/util/concurrent/Concurre... Looks good to me. Thank you for improving instances' layouts. Fred ------------- Marked as reviewed by fparain (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24330#pullrequestreview-2732971561 From mbaesken at openjdk.org Tue Apr 1 14:51:47 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 1 Apr 2025 14:51:47 GMT Subject: RFR: 8345265: Minor improvements for LTO across all compilers [v2] In-Reply-To: References: Message-ID: <1JwtxX-_wpEi7pu8DRZfclDGx9DjR4lO3ySAt9JsYoQ=.41a275a5-0e91-4118-9bc3-e278eacf3aca@github.com> On Mon, 31 Mar 2025 13:09:51 GMT, Julian Waters wrote: > but what happens if you replace both instances of -fuse-linker-plugin with -fno-use-linker-plugin on Linux in JvmFeatures.gmk Then the compilation fails when LTO is configured, with this message: cc1plus: error: '-fno-fat-lto-objects' are supported only with linker plugin ------------- PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2769633581 From stefank at openjdk.org Tue Apr 1 15:02:15 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 1 Apr 2025 15:02:15 GMT Subject: RFR: 8353329: Small memory leak when create GrowableArray with initial size 0 In-Reply-To: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> References: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> Message-ID: On Tue, 1 Apr 2025 00:18:07 GMT, Zhengyu Gu wrote: > Please review this small fix to avoid a 1-byte leak when creating a GrowableArray with initial size = 0. > > GrowableArray's c-heap allocator does not check size = 0 and os::malloc(0) will malloc at least 1 byte.
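Here is a minimal sketch of the zero-capacity guard discussed in this PR. It uses `std::malloc` in place of HotSpot's `os::malloc`, and `cheap_allocate` and `cheap_deallocate` are hypothetical names. Per the PR description, `os::malloc(0)` still hands back a real allocation. For plain `std::malloc(0)`, the C standard leaves it implementation-defined whether the result is a null pointer or a unique pointer. Either way, checking for zero before calling the allocator is what avoids the stray allocation. The guard is written in the braced, blank-line-separated form requested in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a C-heap array allocator (not HotSpot's code).
void* cheap_allocate(int max, int element_size) {
  assert(max >= 0 && "integer overflow");
  assert(element_size > 0);

  if (max == 0) {
    return nullptr;  // zero capacity: allocate nothing, so nothing can leak
  }

  return std::malloc(static_cast<std::size_t>(max) * static_cast<std::size_t>(element_size));
}

// std::free(nullptr) is a no-op, so empty arrays need no special case on release.
void cheap_deallocate(void* elements) {
  std::free(elements);
}
```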
Note that GrowableArrayCHeap already has this code: static E* allocate(int max, MemTag mem_tag) { if (max == 0) { return nullptr; } return (E*)GrowableArrayCHeapAllocator::allocate(max, sizeof(E), mem_tag); } So, maybe we should just add the same check to GrowableArray? Or, alternatively, remove the check above? I added a style comment below: src/hotspot/share/utilities/growableArray.cpp line 46: > 44: void* GrowableArrayCHeapAllocator::allocate(int max, int element_size, MemTag mem_tag) { > 45: assert(max >= 0, "integer overflow"); > 46: if (max == 0) return nullptr; I would prefer if we don't introduce more code that uses this if/return one-liner into this code. I understand that some parts of our code base use it, but in code that I maintain / review I try to prevent more introduction of this. I would like to suggest the following, which adds blank lines to clearly separate that this part of the function is separate from the surrounding code. I want it to be prominent that we have a special case here. Suggestion: if (max == 0) { return nullptr; } ------------- Changes requested by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24341#pullrequestreview-2732968879 PR Review Comment: https://git.openjdk.org/jdk/pull/24341#discussion_r2023014005 From ccheung at openjdk.org Tue Apr 1 15:03:11 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 1 Apr 2025 15:03:11 GMT Subject: RFR: 8353129: CDS ArchiveRelocation tests fail after JDK-8325132 [v2] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 05:20:45 GMT, David Holmes wrote: > LGTM2. I hope it now passes tier4. Yes, I re-ran tiers 1, 3, and 4 testing with the latest change. > src/hotspot/share/prims/whitebox.cpp line 2136: > >> 2134: WB_ENTRY(jint, WB_GetArchiveRelocationMode(JNIEnv* env, jobject wb)) >> 2135: #if INCLUDE_CDS >> 2136: return (jint)ArchiveRelocationMode; > > Nit: do we need casts between int and jint ?? The latest change doesn't involve whitebox.cpp.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24308#issuecomment-2769663183 PR Review Comment: https://git.openjdk.org/jdk/pull/24308#discussion_r2023041268 From stuefe at openjdk.org Tue Apr 1 15:09:32 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 1 Apr 2025 15:09:32 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 14:46:17 GMT, Frederic Parain wrote: > Looks good to me. Thank you for improving instances' layouts. > > Fred Thanks Frederic :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24330#issuecomment-2769688396 From ihse at openjdk.org Tue Apr 1 15:12:32 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 1 Apr 2025 15:12:32 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 15:11:06 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g. >> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > address Windows issues I guess if we already read files, then we're either already sh*t out of luck if I/O is broken, or we have proven that it works. If that is indeed the case, my recommendation would be to read the release file at crash dump time as well. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2769699929 From mbaesken at openjdk.org Tue Apr 1 15:17:52 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 1 Apr 2025 15:17:52 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 15:09:57 GMT, Magnus Ihse Bursie wrote: > I guess if we already read files, then we're either already sh*t out of luck if I/O is broken, or we have proven that it works Not sure if reading from /proc is the same. 'Proven that it works' - yeah it works most of the time, I would say that. But there is still a little added risk in the hserr generation process. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2769713950 From zgu at openjdk.org Tue Apr 1 15:19:28 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Tue, 1 Apr 2025 15:19:28 GMT Subject: RFR: 8353329: Small memory leak when create GrowableArray with initial size 0 [v2] In-Reply-To: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> References: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> Message-ID: > Please review this small fix to avoid a 1-byte leak when creating a GrowableArray with initial size = 0. > > GrowableArray's c-heap allocator does not check size = 0 and os::malloc(0) will malloc at least 1 byte.
Zhengyu Gu has updated the pull request incrementally with two additional commits since the last revision: - Copyright year - stefank's comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24341/files - new: https://git.openjdk.org/jdk/pull/24341/files/126f4e28..ba9df7d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24341&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24341&range=00-01 Stats: 8 lines in 2 files changed: 2 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24341.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24341/head:pull/24341 PR: https://git.openjdk.org/jdk/pull/24341 From mbaesken at openjdk.org Tue Apr 1 15:21:52 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 1 Apr 2025 15:21:52 GMT Subject: RFR: 8345265: Minor improvements for LTO across all compilers [v2] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 14:54:03 GMT, Julian Waters wrote: >> This is a general cleanup and improvement of LTO, as well as a quick fix to remove a workaround in the Makefiles that disabled LTO for g1ParScanThreadState.cpp due to the old poisoning mechanism causing trouble. The -Wno-attribute-warning change here can be removed once Kim's new poisoning solution is integrated. >> >> - -fno-omit-frame-pointer is added to gcc to stop the linker from emitting code without the frame pointer >> - -flto is set to $(JOBS) instead of auto to better match what the user requested >> - -Gy is passed to the Microsoft compiler. This does not fully fix LTO under Microsoft, but prevents warnings about -LTCG:INCREMENTAL at least > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: > > - Merge branch 'openjdk:master' into patch-16 > - -fno-omit-frame-pointer in JvmFeatures.gmk > - Revert compilerWarnings_gcc.hpp > - General LTO fixes JvmFeatures.gmk > - Revert DISABLE_POISONING_STOPGAP compilerWarnings_gcc.hpp > - Merge branch 'openjdk:master' into patch-16 > - Revert os.cpp > - Fix memory leak in jvmciEnv.cpp > - Stopgap fix in os.cpp > - Declaration fix in compilerWarnings_gcc.hpp > - ... and 2 more: https://git.openjdk.org/jdk/compare/2c3c6c41...9d05cb8e When setting -fno-use-linker-plugin and also removing -fno-fat-lto-objects, the error above goes away, but we get a ton of other errors: src/hotspot/share/gc/g1/g1ParScanThreadState.cpp:663: error: undefined reference to 'G1NUMA::is_enabled() const' src/hotspot/share/gc/g1/g1ParScanThreadState.cpp:667: error: undefined reference to 'G1NUMA::num_active_nodes() const' src/hotspot/share/gc/g1/g1ParScanThreadState.cpp:669: error: undefined reference to 'AllocateHeap(unsigned long, MemTag, AllocFailStrategy::AllocFailEnum)' src/hotspot/share/gc/g1/g1ParScanThreadState.cpp:116: error: undefined reference to 'G1RedirtyCardsLocalQueueSet::flush()' src/hotspot/share/gc/g1/g1ParScanThreadState.cpp:677: error: undefined reference to 'G1NUMA::index_of_current_thread() const' src/hotspot/share/gc/g1/g1ParScanThreadState.cpp:678: error: undefined reference to 'G1NUMA::copy_statistics(G1NUMAStats::NodeDataItems, unsigned int, unsigned long*)' src/hotspot/share/gc/g1/g1ParScanThreadState.cpp:119: error: undefined reference to 'G1PLABAllocator::flush_and_retire_stats(unsigned int)' src/hotspot/share/gc/g1/g1Policy.hpp:427: error: undefined reference to 'AgeTable::merge(AgeTable const*)' src/hotspot/share/gc/g1/g1ParScanThreadState.cpp:123: error: undefined reference to 'G1NewTracer::report_evacuation_failed(EvacuationFailedInfo&)' src/hotspot/share/gc/g1/g1Allocator.inline.hpp:116: error: undefined reference to
'G1PLABAllocator::allocate_direct_or_new_plab(G1HeapRegionAttr, unsigned long, bool*, unsigned int)' src/hotspot/share/gc/g1/g1ParScanThreadState.cpp:384: error: undefined reference to 'G1PLABAllocator::allocate_direct_or_new_plab(G1HeapRegionAttr, unsigned long, bool*, unsigned int)' src/hotspot/share/gc/g1/g1ParScanThreadState.cpp:397: error: undefined reference to 'YoungGCTracer::should_report_promotion_events() const' src/hotspot/share/gc/g1/g1Allocator.inline.hpp:116: error: undefined reference to 'G1PLABAllocator::allocate_direct_or_new_plab(G1HeapRegionAttr, unsigned long, bool*, unsigned int)' src/hotspot/share/gc/shared/cardTable.hpp:184: error: undefined reference to 'CardTable::_card_size' src/hotspot/share/gc/g1/g1CollectedHeap.inline.hpp:126: error: undefined reference to 'G1HeapRegion::LogOfHRGrainBytes' src/hotspot/share/gc/g1/g1CollectedHeap.inline.hpp:79: error: undefined reference to 'G1Policy::phase_times() const' src/hotspot/share/gc/g1/g1ParScanThreadState.cpp:143: error: undefined reference to 'G1PLABAllocator::waste() const' src/hotspot/share/gc/g1/g1ParScanThreadState.cpp:147: error: undefined reference to 'G1PLABAllocator::undo_waste() const' and src/hotspot/os/linux/mallocInfoDcmd.cpp: In member function 'execute': src/hotspot/os/linux/mallocInfoDcmd.cpp:57:3: warning: call to 'free' declared with attribute warning: use os::free [-Wattribute-warning] In member function '__dt_base ', inlined from 'c2v_getLocalVariableTableLength' at src/hotspot/share/jvmci/jvmciCompilerToVM.cpp:1348:1: src/hotspot/share/jvmci/jvmciEnv.cpp:615:5: warning: call to 'free' declared with attribute warning: use os::free [-Wattribute-warning] In member function '__dt_base ', inlined from 'c2v_getCountersSize' at src/hotspot/share/jvmci/jvmciCompilerToVM.cpp:1403:1: src/hotspot/share/jvmci/jvmciEnv.cpp:615:5: warning: call to 'free' declared with attribute warning: use os::free [-Wattribute-warning] In member function '__dt_base ', inlined from 
'c2v_setCountersSize' at src/hotspot/share/jvmci/jvmciCompilerToVM.cpp:1407:1: src/hotspot/share/jvmci/jvmciEnv.cpp:615:5: warning: call to 'free' declared with attribute warning: use os::free [-Wattribute-warning] In member function '__dt_base ', inlined from 'c2v_isMature' at src/hotspot/share/jvmci/jvmciCompilerToVM.cpp:1425:1: src/hotspot/share/jvmci/jvmciEnv.cpp:615:5: warning: call to 'free' declared with attribute warning: use os::free [-Wattribute-warning] In member function '__dt_base ', ... (rest of output omitted) (some examples), so this does not work (at least with gcc-14). ------------- PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2769734227 From stefank at openjdk.org Tue Apr 1 15:27:20 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 1 Apr 2025 15:27:20 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 21:10:01 GMT, Gerard Ziemski wrote: >> src/hotspot/share/runtime/safepointMechanism.cpp line 60: >> >>> 58: const size_t page_size = os::vm_page_size(); >>> 59: const size_t allocation_size = 2 * page_size; >>> 60: char* polling_page = os::reserve_memory(allocation_size, mtSafepoint, !ExecMem); >> >> Suggestion: >> >> char* polling_page = os::reserve_memory(allocation_size, mtSafepoint); > > I think here we need to keep `!ExecMem` since it is a parameter. I don't understand what you mean by that. `ExecMem` is a constant and os::reserve_memory has an optional parameter `executable`. >> src/hotspot/share/utilities/debug.cpp line 715: >> >>> 713: #ifdef CAN_SHOW_REGISTERS_ON_ASSERT >>> 714: void initialize_assert_poison() { >>> 715: char* page = os::reserve_memory(os::vm_page_size(), mtInternal, !ExecMem); >> >> Suggestion: >> >> char* page = os::reserve_memory(os::vm_page_size(), mtInternal); > > Again, `ExecMem` is a parameter. Same comment as above.
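Here is a minimal sketch of why dropping `!ExecMem` preserves behavior. It assumes, as stated in the review, that `ExecMem` is a named constant rather than a runtime parameter, and that the optional `executable` parameter defaults to `false`. The declarations below are hypothetical stand-ins, not HotSpot's `os::reserve_memory`. The stand-in reports the flag it received instead of reserving memory.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: ExecMem is a compile-time constant meaning "executable",
// so !ExecMem is simply the literal value false.
const bool ExecMem = true;

enum MemTag { mtNone, mtTest, mtSafepoint, mtInternal };

// Hypothetical stand-in with an optional `executable` parameter that
// defaults to false; returns the flag it was called with.
bool reserve_memory_is_exec(std::size_t bytes, MemTag tag, bool executable = false) {
  assert(bytes > 0);
  (void)tag;  // the tag does not affect executability in this sketch
  return executable;
}
```

Under these assumptions, `reserve_memory_is_exec(n, tag, !ExecMem)` and `reserve_memory_is_exec(n, tag)` pass the same value. Passing `!ExecMem` explicitly is redundant rather than a distinct runtime choice.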
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2023089682 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2023090093 From stefank at openjdk.org Tue Apr 1 15:32:17 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 1 Apr 2025 15:32:17 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 14:09:22 GMT, Robert Toyonaga wrote: > OK should I update this PR to do the following things: > > * Add comments explaining the asymmetrical locking and warning against patterns that lead to races Sounds like a good idea. > > * swapping the order of `NmtVirtualMemoryLocker` and release/uncommit I wonder if this should be done as new RFE after the change below. It might need a bit of investigation to make sure that the reasoning around this is correct. > > * Fail fatally if release/uncommit does not complete. I think this would be a good, separate RFE to be done before we try to swap the order. > > > Or does it make more sense to do that in a different issue/PR? > > Also, do we want to keep the new tests and the refactorings (see below)? > > ``` > if (MemTracker::enabled()) { > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > if (result != nullptr) { > MemTracker::record_some_operation(addr, bytes); > } > } else { > result = pd_unmap_memory(addr, bytes); > } > ``` > > To: > > ``` > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_unmap_memory(addr, bytes); > MemTracker::record_some_operation(addr, bytes); > ``` My thinking is that after you done (2) above, then you will not need to expose the NMT lock to this level. The code would be: MemTracker::record_some_operation(addr, bytes); // Lock confined inside this pd_unmap_memory(addr, bytes); So, I would wait with this cleanup until we know more about (2). 
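Here is a minimal sketch of the "record, then release" ordering from point (2), with the lock confined inside the tracker. The bookkeeping entry is removed before the OS release. Once the OS frees the range, another thread may reserve and record it again, and it cannot collide with a stale entry. A failed release is treated as fatal, matching the "fail fatally" item. `RegionTracker`, `OsRelease` and `release_region` are hypothetical names, not NMT's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <mutex>

// Hypothetical tracker: callers never hold its lock across an OS call.
class RegionTracker {
 public:
  void record_reserve(std::uintptr_t addr, std::size_t bytes) {
    std::lock_guard<std::mutex> guard(_lock);
    _regions[addr] = bytes;
  }
  void record_release(std::uintptr_t addr, std::size_t bytes) {
    std::lock_guard<std::mutex> guard(_lock);
    auto it = _regions.find(addr);
    assert(it != _regions.end() && it->second == bytes);
    if (it != _regions.end()) {
      _regions.erase(it);
    }
  }
  bool is_tracked(std::uintptr_t addr) {
    std::lock_guard<std::mutex> guard(_lock);
    return _regions.count(addr) != 0;
  }

 private:
  std::mutex _lock;
  std::map<std::uintptr_t, std::size_t> _regions;
};

// Hypothetical OS hook, so the ordering can be shown without real mappings.
using OsRelease = bool (*)(std::uintptr_t addr, std::size_t bytes);

void release_region(RegionTracker& tracker, std::uintptr_t addr, std::size_t bytes,
                    OsRelease os_release) {
  tracker.record_release(addr, bytes);  // bookkeeping first, under the tracker's lock
  if (!os_release(addr, bytes)) {
    std::abort();  // books already updated: an unfinished release cannot be tolerated
  }
}
```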
------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2769766908 From stefank at openjdk.org Tue Apr 1 15:35:46 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 1 Apr 2025 15:35:46 GMT Subject: RFR: 8353329: Small memory leak when create GrowableArray with initial size 0 [v2] In-Reply-To: References: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> Message-ID: On Tue, 1 Apr 2025 15:19:28 GMT, Zhengyu Gu wrote: >> Please review this small fix to avoid 1 byte leak when create a GrowableArray with initial size = 0. >> >> GrowableArray's c-heap allocator does not check size = 0 and os:malloc(0) will malloc at least 1 byte. > > Zhengyu Gu has updated the pull request incrementally with two additional commits since the last revision: > > - Copyright year > - stefank's comment Changes requested by stefank (Reviewer). src/hotspot/share/utilities/growableArray.cpp line 48: > 46: if (max == 0) { > 47: return nullptr; > 48: } I think you missed my request to add blank lines around this block: Suggestion: if (max == 0) { return nullptr; } ------------- PR Review: https://git.openjdk.org/jdk/pull/24341#pullrequestreview-2733136744 PR Review Comment: https://git.openjdk.org/jdk/pull/24341#discussion_r2023105688 From dnsimon at openjdk.org Tue Apr 1 15:39:18 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 1 Apr 2025 15:39:18 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v6] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Tue, 1 Apr 2025 09:25:17 GMT, Kim Barrett wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> convert Windows path to Unix path > > test/hotspot/jtreg/TEST.groups line 142: > >> 140: >> 141: tier1_common = \ >> 142: sources \ > > I don't understand this change. 
How does this end up doing anything different than before? This makes `sources` be tested in GHA: https://github.com/openjdk/jdk/blob/a1ab1d8de411aace21decd133e7e74bb97f27897/.github/workflows/test.yml#L88 An alternative would be to add a separate GHA job just for `sources`: - test-name: 'hs/tier1 sources' test-suite: 'test/hotspot/jtreg/:tier1_sources' debug-suffix: -debug Given how small `sources` is ([currently only 1 test](https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/sources)), it felt like it should just be folded into common. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2023111780 From zgu at openjdk.org Tue Apr 1 15:39:38 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Tue, 1 Apr 2025 15:39:38 GMT Subject: RFR: 8353329: Small memory leak when create GrowableArray with initial size 0 [v3] In-Reply-To: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> References: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> Message-ID: > Please review this small fix to avoid a 1-byte leak when creating a GrowableArray with initial size = 0. > > GrowableArray's c-heap allocator does not check size = 0 and os::malloc(0) will malloc at least 1 byte.
Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Added empty lines ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24341/files - new: https://git.openjdk.org/jdk/pull/24341/files/ba9df7d6..306eb851 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24341&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24341&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24341.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24341/head:pull/24341 PR: https://git.openjdk.org/jdk/pull/24341 From iklam at openjdk.org Tue Apr 1 15:42:33 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 1 Apr 2025 15:42:33 GMT Subject: RFR: 8352775: JVM crashes with -XX:AOTMode=create -XX:+UseZGC Message-ID: 8352775: JVM crashes with -XX:AOTMode=create -XX:+UseZGC ------------- Commit messages: - Merge branch 'master' into 8352775-aot-crash-with-zgc - 8352775: java -XX:AOTMode=create -XX:+UseZGC crashes Changes: https://git.openjdk.org/jdk/pull/24347/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24347&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352775 Stats: 63 lines in 2 files changed: 62 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24347/head:pull/24347 PR: https://git.openjdk.org/jdk/pull/24347 From iklam at openjdk.org Tue Apr 1 15:45:39 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 1 Apr 2025 15:45:39 GMT Subject: RFR: 8353325: Rewrite appcds/methodHandles test cases to use CDSAppTester [v2] In-Reply-To: References: Message-ID: > These test cases are rewritten to use CDSAppTester, so that they can also be executed in the new JEP 483 workflow (with `-XX:AOTCache=xxx`, etc). This will increase coverage of current and upcoming AOT features (such as AOT linking of invokedynamic, and AOT method profiling). 
> > These test cases are generated by a bash script. This PR minimizes the generated part so that the main portions of the tests can be modified as a normal java source file. Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @calvinccheung comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24340/files - new: https://git.openjdk.org/jdk/pull/24340/files/43c36d65..6a62ae00 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24340&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24340&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24340.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24340/head:pull/24340 PR: https://git.openjdk.org/jdk/pull/24340 From iklam at openjdk.org Tue Apr 1 15:45:39 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 1 Apr 2025 15:45:39 GMT Subject: RFR: 8353325: Rewrite appcds/methodHandles test cases to use CDSAppTester [v2] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 01:28:32 GMT, Calvin Cheung wrote: > Just one nit. Which tiers testing have been run with this change? I ran tiers 1-6. The only problem I encountered was https://github.com/openjdk/jdk/pull/24347 , which I will integrate before integrating this PR. > test/hotspot/jtreg/runtime/cds/appcds/methodHandles/JDKMethodHandlesTestRunner.java line 40: > >> (failed to retrieve contents of file, check the PR for context) > Pre-existing: > Can you also remove the comment `System.out.println` at line 138? Fixed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24340#issuecomment-2769799036 PR Review Comment: https://git.openjdk.org/jdk/pull/24340#discussion_r2023118962 From duke at openjdk.org Tue Apr 1 16:04:26 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Tue, 1 Apr 2025 16:04:26 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: <99erpzWK9d7spPI6eM-QsV1IQhVfnwt-VKO1Oiz19PM=.19c6112f-0538-4784-9fc7-d9d86f4a9f84@github.com> On Tue, 1 Apr 2025 04:38:00 GMT, Ashutosh Mehra wrote: >> OK, I tend to agree; I will investigate alternatives. I did consider putting the `statfs` logic inside but ended up leaving it outside because `determine_type` is called by the `whitebox` framework, and "mocking" `statfs` is not possible with regular files. The idea is to allow the test suite to simply mock the `statfs` result via the boolean `cgroups_v2_enabled` argument. > > One option is to pass an argument to `determine_type` to indicate it is being called from the test suite and skip the call to `statfs` in such case. OK; @ashu-mehra do you want me to try the function pointer approach? I agree with @jerboaa that it will result in more complex code. Practically speaking I would rather keep the `statfs` call in `CgroupSubsystemFactory::create` even though I agree it is logically part of determining the `cgroup` type. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r2023155890 From iklam at openjdk.org Tue Apr 1 16:26:36 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 1 Apr 2025 16:26:36 GMT Subject: RFR: 8352775: JVM crashes with -XX:AOTMode=create -XX:+UseZGC [v2] In-Reply-To: References: Message-ID: > Please review this small fix. When ZGC is enabled, heap dumping is disabled, so we shouldn't call functions that are related to heap dumping.
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: No need to use "@requires vm.gc == null" ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24347/files - new: https://git.openjdk.org/jdk/pull/24347/files/4f2a0108..0cdf72ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24347&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24347&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24347/head:pull/24347 PR: https://git.openjdk.org/jdk/pull/24347 From ccheung at openjdk.org Tue Apr 1 16:39:30 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 1 Apr 2025 16:39:30 GMT Subject: RFR: 8352775: JVM crashes with -XX:AOTMode=create -XX:+UseZGC [v2] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 16:26:36 GMT, Ioi Lam wrote: >> Please review this small fix. When ZGC is enabled, heap dumping is disabled, so we should't call functions that are related to heap dumping. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > No need to use "@requires vm.gc == null" LGTM ------------- Marked as reviewed by ccheung (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24347#pullrequestreview-2733317364 From gziemski at openjdk.org Tue Apr 1 16:40:36 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 1 Apr 2025 16:40:36 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 08:03:28 GMT, Stefan Karlsson wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> work > > test/hotspot/gtest/runtime/test_committed_virtualmemory.cpp line 94: > >> 92: const size_t page_sz = os::vm_page_size(); >> 93: const size_t size = num_pages * page_sz; >> 94: char* base = os::reserve_memory(size, mtThreadStack, !ExecMem); > > Suggestion: > > char* base = os::reserve_memory(size, mtThreadStack); ExecMem is a parameter, which could be false, so that would make it `true` for `reserve_memory`, so need to keep it? > test/hotspot/gtest/runtime/test_committed_virtualmemory.cpp line 162: > >> 160: const size_t num_pages = 4; >> 161: const size_t size = num_pages * page_sz; >> 162: char* base = os::reserve_memory(size, mtTest, !ExecMem); > > Suggestion: > > char* base = os::reserve_memory(size, mtTest); ExecMem is a parameter, that could be false. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2023206394 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2023207421 From ccheung at openjdk.org Tue Apr 1 16:46:27 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 1 Apr 2025 16:46:27 GMT Subject: RFR: 8353129: CDS ArchiveRelocation tests fail after JDK-8325132 [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 22:55:23 GMT, Ioi Lam wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> simplify the fix per David's suggestion > > LGTM. Thanks @iklam @dholmes-ora @kimbarrett for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24308#issuecomment-2769958919 From ccheung at openjdk.org Tue Apr 1 16:46:28 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 1 Apr 2025 16:46:28 GMT Subject: Integrated: 8353129: CDS ArchiveRelocation tests fail after JDK-8325132 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 02:47:53 GMT, Calvin Cheung wrote: > Two archive relocation tests failed when `-XX:ArchiveRelocationMode=0` is specified via the jtreg `-javaoption`. > A fix is to add a `WhiteBox.getArchiveRelocationMode()` method so that the tests can check if the `ArchiveRelocationMode` is set to 0 before checking the expected output. > > Passed tiers 1 - 4 testing. This pull request has now been integrated. Changeset: 6a46d554 Author: Calvin Cheung URL: https://git.openjdk.org/jdk/commit/6a46d554c7434fd10aade2d2b17d0ad4cad83979 Stats: 7 lines in 2 files changed: 2 ins; 0 del; 5 mod 8353129: CDS ArchiveRelocation tests fail after JDK-8325132 Reviewed-by: iklam, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/24308 From gziemski at openjdk.org Tue Apr 1 16:47:35 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 1 Apr 2025 16:47:35 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v2] In-Reply-To: References: Message-ID: <4FA724K9XEmS9Fyw89-NEkkYHyxGl03Ln-401PP_eOs=.4c782592-ee41-4758-94a1-4b79b8e81481@github.com> On Fri, 28 Mar 2025 08:18:12 GMT, Stefan Karlsson wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> work > > src/hotspot/share/memory/allocation.inline.hpp line 61: > >> 59: size_t size = size_for(length); >> 60: >> 61: char* addr = os::reserve_memory(size, mem_tag, !ExecMem); > > Suggestion: > > char* addr = os::reserve_memory(size, mem_tag); ExecMem can be false? 
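A note on the `ExecMem` question above: in HotSpot, `ExecMem` is, as far as I know, the constant `const bool ExecMem = true;` declared in `os.hpp`, not a parameter, so `!ExecMem` always evaluates to `false`. Assuming `os::reserve_memory` defaults its `executable` argument to `false` (which the suggestions imply), passing `!ExecMem` is redundant. The sketch below illustrates this with a hypothetical stand-in; `MemTag`, `ReserveRequest` and this `reserve_memory` are illustrative only and do not reserve any memory.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: mirrors the HotSpot constant in os.hpp
// (const bool ExecMem = true;), so !ExecMem is always false.
const bool ExecMem = true;

// Hypothetical subset of memory tags, for illustration only.
enum class MemTag { mtNone, mtTest, mtThreadStack };

// Hypothetical record of what a reservation call was asked to do.
struct ReserveRequest {
  std::size_t size;
  MemTag tag;
  bool executable;
};

// Hypothetical stand-in for a reservation API with a defaulted
// 'executable' flag. It records its arguments instead of reserving memory.
ReserveRequest reserve_memory(std::size_t size, MemTag tag,
                              bool executable = false) {
  return ReserveRequest{size, tag, executable};
}

// Passing !ExecMem explicitly produces the same request as relying on
// the default, which is why the argument can be dropped.
void check_execmem_is_redundant() {
  ReserveRequest explicit_flag = reserve_memory(4096, MemTag::mtTest, !ExecMem);
  ReserveRequest defaulted = reserve_memory(4096, MemTag::mtTest);
  assert(explicit_flag.executable == defaulted.executable);
  assert(!defaulted.executable);
}
```

Only callers that really want executable memory would still need to pass `ExecMem` explicitly.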
> src/hotspot/share/memory/allocation.inline.hpp line 78: > >> 76: size_t size = size_for(length); >> 77: >> 78: char* addr = os::reserve_memory(size, mem_tag, !ExecMem); > > Suggestion: > > char* addr = os::reserve_memory(size, mem_tag); ExecMem can be false? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2023215520 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2023217448 From gziemski at openjdk.org Tue Apr 1 16:50:24 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 1 Apr 2025 16:50:24 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 19:24:53 GMT, Stefan Karlsson wrote: >> I wasn't sure what that is here, we can do this in a follow up? > > My suggestion was to change `attempt_reserve_memory_between` to take MemTag as an argument. It's OK to do that as a follow-up PR. I decided to do this now, since I had to go through many files anyhow. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2023222075 From fparain at openjdk.org Tue Apr 1 17:37:19 2025 From: fparain at openjdk.org (Frederic Parain) Date: Tue, 1 Apr 2025 17:37:19 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 14:02:03 GMT, Thomas Stuefe wrote: > In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. > > For details, please see JBS issue text. > > ----------------------- > > Patch results: > > The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. 
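To see why leading with oops in the subclass reduces the entry count, here is a toy model, assuming 4-byte compressed oops. An oop map entry describes one run of contiguous reference fields, so a subclass whose first fields are oops extends the run its superclass ends with. `Field`, `OopMapBlock` and `build_oop_maps` are hypothetical names for this sketch (loosely named after HotSpot's oop map blocks); this is not the field-layout code in the PR.

```cpp
#include <cassert>
#include <vector>

// Simplified model of one field in an instance layout.
struct Field {
  int offset;   // byte offset from the start of the object
  int size;     // bytes (4 for compressed oops and ints, 1 for boolean)
  bool is_oop;  // true if the field holds an object reference
};

// One oop map entry: a run of contiguous oop fields.
struct OopMapBlock {
  int offset;  // offset of the first oop in the run
  int count;   // number of oops in the run
};

// Builds oop map blocks from fields sorted by offset. An oop extends the
// current block only if it starts exactly where the block's last oop ended;
// any non-oop field in between breaks the run and starts a new block.
std::vector<OopMapBlock> build_oop_maps(const std::vector<Field>& fields,
                                        int oop_size) {
  std::vector<OopMapBlock> blocks;
  for (const Field& f : fields) {
    if (!f.is_oop) continue;
    if (!blocks.empty()) {
      OopMapBlock& last = blocks.back();
      if (last.offset + last.count * oop_size == f.offset) {
        ++last.count;
        continue;
      }
    }
    blocks.push_back(OopMapBlock{f.offset, 1});
  }
  return blocks;
}
```

With the quoted "before" layout of `ConcurrentHashMap$TreeNode` (int `hash` at 12, oops at 16 to 24, boolean `red` at 28, oops at 32 to 44), the function returns two blocks. For an oops-first ordering of the subclass fields it returns one; the quoted "after" layout is truncated, so the offsets used for that case (oops from 28 to 40, `red` at 44) are an assumption.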
Here the oop map size distribution over all JDK classes in the JDK image: > > Before: > > 5395 - non-static oop maps (0 entries) > 9330 - non-static oop maps (1 entries) > 1449 - non-static oop maps (2 entries) > 274 - non-static oop maps (3 entries) > 218 - non-static oop maps (4 entries) > 75 - non-static oop maps (5 entries) > 7 - non-static oop maps (6 entries) > 4 - non-static oop maps (7 entries) > > > Now: > > 5395 - non-static oop maps (0 entries) > 10178 - non-static oop maps (1 entries) > 933 - non-static oop maps (2 entries) > 229 - non-static oop maps (3 entries) > 16 - non-static oop maps (4 entries) > 1 - non-static oop maps (5 entries) > > > For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: > > Before: > > java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} > - ---- non-static fields (9 words): > - final 'hash' 'I' @12 > - final 'key' 'Ljava/lang/Object;' @16 > - volatile 'val' 'Ljava/lang/Object;' @20 > - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class > - 'red' 'Z' @28 << derived class starts here, non-oops lead > - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 > - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 > - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 > - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 > - non-static oop maps (2 entries): 16-24 32-44 > > Now: > > java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} > - ---- non-static fields (9 words): > - final 'hash' 'I' @12 > - final 'key' 'Ljava/lang/Object;' @16 > - volatile 'val' 'Ljava/lang/Object;' @20 > - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class > - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oops lead > - 
'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 > - 'right' 'Ljava/util/concurrent/Concurre... A possible improvement to this code would be to compute if the super class' layout ends with oops during the reconstruction (reconstruct_layout()), to avoid having to iterate over the fields a second time. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24330#issuecomment-2770208150 From asmehra at openjdk.org Tue Apr 1 17:51:22 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 1 Apr 2025 17:51:22 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: <4HUhx0pRJg0clFYNtKBABZW4Ip3GVLZphoHaZwmu8yA=.074ba2f7-6791-43c1-9f7d-f220d6eb4f88@github.com> On Tue, 1 Apr 2025 14:11:31 GMT, Severin Gehwolf wrote: >> This is what I had so far (not yet fully tested), but it adds a null check to the non-testing code path...; I can try an approach with a function pointer too if necessary; let me know how to proceed: >> >> >> diff --git a/src/hotspot/os/linux/cgroupSubsystem_linux.cpp b/src/hotspot/os/linux/cgroupSubsystem_linux.cpp >> index 612cb9a9302..6479853ac04 100644 >> --- a/src/hotspot/os/linux/cgroupSubsystem_linux.cpp >> +++ b/src/hotspot/os/linux/cgroupSubsystem_linux.cpp >> @@ -66,26 +66,10 @@ CgroupSubsystem* CgroupSubsystemFactory::create() { >> CgroupV1Controller* pids = nullptr; >> CgroupInfo cg_infos[CG_INFO_LENGTH]; >> u1 cg_type_flags = INVALID_CGROUPS_GENERIC; >> - const char* proc_cgroups = "/proc/cgroups"; >> - const char* sys_fs_cgroup_cgroup_controllers = "/sys/fs/cgroup/cgroup.controllers"; >> - const char* controllers_file = proc_cgroups; >> const char* proc_self_cgroup = "/proc/self/cgroup"; >> const char* proc_self_mountinfo = "/proc/self/mountinfo"; >> - const char* sys_fs_cgroup = "/sys/fs/cgroup"; >> - struct statfs fsstat = {}; >> - bool cgroups_v2_enabled = false; >> >> - // Assume cgroups v2 is usable by the JDK iff /sys/fs/cgroup has the 
cgroup v2 >> - // file system magic. If it does not then heuristics are required to determine >> - // if cgroups v1 is usable or not. >> - if (statfs(sys_fs_cgroup, &fsstat) != -1) { >> - cgroups_v2_enabled = (fsstat.f_type == CGROUP2_SUPER_MAGIC); >> - if (cgroups_v2_enabled) { >> - controllers_file = sys_fs_cgroup_cgroup_controllers; >> - } >> - } >> - >> - bool valid_cgroup = determine_type(cg_infos, cgroups_v2_enabled, controllers_file, proc_self_cgroup, proc_self_mountinfo, &cg_type_flags); >> + bool valid_cgroup = determine_type(cg_infos, true, NULL, proc_self_cgroup, proc_self_mountinfo, &cg_type_flags); >> >> if (!valid_cgroup) { >> // Could not detect cgroup type >> @@ -249,9 +233,16 @@ static inline bool match_mount_info_line(char* line, >> tmpcgroups) == 5; >> } >> >> +/* >> + * If controllers_file_mock is non-NULL use it as the controllers file >> + * and respect cgroups_v2_enabled_mock. This is used by WhiteBox to >> + * mock the statfs call. If controllers_file_mock is NULL, ignore >> + * cgroups_v2_enabled_mock and determine using statfs what to use as >> + * the controllers file. >> + */ >> bool CgroupSubsystemFactory::determine_type(CgroupInfo* cg_infos, >> - bool cgroups_v2_enabled, >> - ... > > This doesn't make the code any better to read, IMO. The previous version seems better considering the non-mock code path isn't tested any more/less. @jerboaa @fitzsim I agree this doesn't look good. But I have another suggestion. Given that we now determine if we are in v2 or v1 by using `statfs` before calling `determine_type`, we should IMO rename `determine_type` to be something like `validate_and_populate_cgroup`. In addition to that, I also feel it would be better if the logic in `determine_type` can be broken down into two functions - one for v1 and another for v2. That would simplify the code and make it more readable as well. The tests can also be updated to call the function that corresponds to the cgroup version being tested. 
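As a rough illustration of the v2 half of the split suggested above, here is a hypothetical helper that works on the contents of `/sys/fs/cgroup/cgroup.controllers` rather than on `/proc/cgroups`. It is not the JDK's implementation: the classification simply mirrors the debug output quoted in this PR (cpu and memory required; cpuset and pids relevant; everything else not relevant). Reading the file and the `statfs`/`CGROUP2_SUPER_MAGIC` check are left out so that the logic can be tested on a plain string.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Classification mirroring the debug log quoted in this PR
// (an assumption about intent, not the JDK's actual tables).
enum class Relevance { Required, Relevant, NotRelevant };

Relevance classify_v2_controller(const std::string& name) {
  if (name == "cpu" || name == "memory") return Relevance::Required;
  if (name == "cpuset" || name == "pids") return Relevance::Relevant;
  return Relevance::NotRelevant;
}

// Hypothetical v2-only check: returns true when every required controller
// appears in the whitespace-separated contents of cgroup.controllers.
bool v2_required_controllers_enabled(const std::string& contents) {
  std::istringstream in(contents);
  std::set<std::string> required_seen;
  std::string name;
  while (in >> name) {  // operator>> splits on any whitespace, including '\n'
    if (classify_v2_controller(name) == Relevance::Required) {
      required_seen.insert(name);
    }
  }
  return required_seen.size() == 2;  // both "cpu" and "memory" were listed
}
```

Keeping the v2 check a pure function of the file contents is one way to make it testable without the mocking discussed above.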
And I understand this is quite a bit of refactoring so I don't mind if this is done in a subsequent PR. What do you think? Apart from this I don't have any more comments. Rest looks good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r2023394878 From gziemski at openjdk.org Tue Apr 1 18:02:03 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 1 Apr 2025 18:02:03 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v5] In-Reply-To: References: Message-ID: > This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with default value of mtNone, to force everyone to provide mem tag, if known. > > I tried to fill in tag, when I was pretty certain that I had the right type. > > At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: Stefan's feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24282/files - new: https://git.openjdk.org/jdk/pull/24282/files/40cb4384..f0ccc7f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=03-04 Stats: 30 lines in 13 files changed: 0 ins; 0 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/24282.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24282/head:pull/24282 PR: https://git.openjdk.org/jdk/pull/24282 From stuefe at openjdk.org Tue Apr 1 18:07:17 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 1 Apr 2025 18:07:17 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances In-Reply-To: References: Message-ID: <4qH0gHOjeFiRbFmSzEuq7qGvfU83MoUKNdB6e74NMEY=.4d7e3f48-bee3-49fb-8098-642e34c2f5dd@github.com> On Tue, 1 Apr 2025 17:34:28 GMT, Frederic Parain wrote: > A possible improvement to this code would be to compute if the super class' layout ends
with oops during the reconstruction (reconstruct_layout()), to avoid having to iterate over the fields a second time. That is a good idea. A prior version of this stored the last field information after constructing the layout in the InstanceKlass (as an "ends_with_oop" bool), but I did not like that, since I did not want to add a new member to IK. But if we reconstruct the layout for the super class anyway, I can do the same there. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24330#issuecomment-2770285950 From duke at openjdk.org Tue Apr 1 18:27:28 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Tue, 1 Apr 2025 18:27:28 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: <4HUhx0pRJg0clFYNtKBABZW4Ip3GVLZphoHaZwmu8yA=.074ba2f7-6791-43c1-9f7d-f220d6eb4f88@github.com> References: <4HUhx0pRJg0clFYNtKBABZW4Ip3GVLZphoHaZwmu8yA=.074ba2f7-6791-43c1-9f7d-f220d6eb4f88@github.com> Message-ID: On Tue, 1 Apr 2025 17:48:43 GMT, Ashutosh Mehra wrote: >> This doesn't make the code any better to read, IMO. The previous version seems better considering the non-mock code path isn't tested any more/less. > > @jerboaa @fitzsim I agree this doesn't look good. > But I have another suggestion. Given that we now determine if we are in v2 or v1 by using `statfs` before calling `determine_type`, we should IMO rename `determine_type` to be something like `validate_and_populate_cgroup`. > In addition to that, I also feel it would be better if the logic in `determine_type` can be broken down into two functions - one for v1 and another for v2. That would simplify the code and make it more readable as well. > The tests can also be updated to call the function that corresponds to the cgroup version being tested. > And I understand this is quite a bit of refactoring so I don't mind if this is done in a subsequent PR. > What do you think? > Apart from this I don't have any more comments. Rest looks good.
I agree these would be good subsequent changes. I tried some refactoring of `determine_type` into several functions while working on this patch but abandoned the effort to focus on the logic. I think after this patch, splitting `determine_type` will be a little easier (but still tricky -- the `cgroups v1`-versus-no-relevant-`cgroups` logic "wants" to be one long function, I suspect). I will leave the subsequent PR up to @jerboaa if he wants to file a bug and assign it to me. Now I will check if `master` needs re-merging, and push the test case change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r2023471782 From vpaprotski at openjdk.org Tue Apr 1 18:47:39 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 1 Apr 2025 18:47:39 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11] In-Reply-To: References: Message-ID: On Sat, 22 Mar 2025 20:02:31 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with two additional commits since the last revision: > > - Further readability improvements. > - Added asserts for array sizes src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 342: > 340: // Performs two keccak() computations in parallel. The steps of the > 341: // two computations are executed interleaved. > 342: static address generate_double_keccak(StubGenerator *stubgen, MacroAssembler *_masm) { This function seems ok. I didn't do as 'exact' a line-by-line review as for the NTT intrinsics, but just put the new version into a diff next to the original function. Seems like a reasonably clean 'refactor' (hardcode the blocksize, add new input registers 10-14; makes it really easy to spot vs. the original registers 0-4). I didn't realize before that the 'top 3 limbs' are wasted.
I guess it doesn't matter; there are registers to spare aplenty, and it makes the entire algorithm cleaner and easier to follow. I also looked at the algorithm with the 'What about AVX2?' question in mind. It looks like this function would pretty much need to be rewritten :/ Last two questions: - How much performance is gained from doubling this function up? - If that's worth it, what if the input were quadrupled instead? (I scanned the Java code; it looked like NR was already parametrized to 2.) It looks like there are almost enough registers here to go to 4 (I think 3 would need to be freed up somehow; alternatively, the upper 3 limbs are empty in all operations, so perhaps they could be used instead, at the expense of readability). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2017636762 From vpaprotski at openjdk.org Tue Apr 1 18:47:39 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 1 Apr 2025 18:47:39 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v12] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 14:40:56 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Reacting to comments by Volodymyr. No further comments from me. (I did leave two questions, but nothing that requires code changes.) Thanks for addressing all my many (lengthy) comments and questions. And the refactor! ------------- Marked as reviewed by vpaprotski (Author).
PR Review: https://git.openjdk.org/jdk/pull/23860#pullrequestreview-2723440925 From asmehra at openjdk.org Tue Apr 1 19:01:45 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 1 Apr 2025 19:01:45 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: <3nrkDF2OyHFH0VDluf2vEm-oc_ENZxPyTCxCRxqTnjs=.5ccf0613-b212-4005-9dc3-1ab8b96f970b@github.com> On Wed, 5 Mar 2025 17:45:26 GMT, Thomas Fitzsimmons wrote: >> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. >> >> I tested it with: >> >> >> java -Xlog:os+container=trace -version >> >> on: >> >> `Red Hat Enterprise Linux 8 (cgroups v1 only)`: >> _No change in behaviour_ >> >> `Fedora 41 (cgroups v2)`: >> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ >> >> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 >> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 >> @@ -1,7 +1,12 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> +[debug][os,container] v2 controller cpuset is enabled and relevant >> +[debug][os,container] v2 controller cpu is enabled and required >> +[debug][os,container] v2 controller io is enabled but not relevant >> +[debug][os,container] v2 controller memory is enabled and required >> +[debug][os,container] v2 controller hugetlb is enabled but not relevant >> +[debug][os,container] v2 controller pids is enabled and relevant >> +[debug][os,container] v2 controller rdma is enabled but not relevant >> +[debug][os,container] v2 controller misc is enabled but not relevant >> [debug][os,container] Detected cgroups v2 unified hierarchy >> [trace][os,container] Adjusting controller path for memory: 
/sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope >> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max >> >> >> `Fedora 41 (custom kernel with cgroups v1 disabled)`: >> _Fixes `cgroups v2` detection:_ >> >> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 >> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 >> @@ -1,7 +1,63 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> -[debug][os,container] controller memory is not enabled >> - ] >> -[debug][os,container] One or more required controllers disabled at kernel level. >> +[... > > Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 > - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent > - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing > > Remove from cgroups v1 branch incorrect log messages about cpuset > controller being optional. Add test case for cgroups v1, cpuset > disabled. > - Improve !cgroups_v2_enabled branch comment > - Debug-log optional and disabled cgroups v2 controllers > > Do not log enabled controllers that are not relevant to the JDK. 
> - Move index declaration to scope in which it is used > - Remove empty string check during cgroup.controllers parsing > - Define ISSPACE_CHARS macro, use it in strsep call > - Pass fgets result to strsep > - Replace is_cgroupsV2 with cgroups_v2_enabled > > Also fix the testCgroupv1SystemdOnly and testCgroupv1NoMounts test > cases such that their /proc/cgroups and /proc/self/cgroup contents > correspond. This prevents assertion failures these tests were > producing when is_cgroupsV2 was replaced with cgroups_v2_enabled. > - ... and 3 more: https://git.openjdk.org/jdk/compare/d8205024...b6926e15 Marked as reviewed by asmehra (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23811#pullrequestreview-2733785528 From matsaave at openjdk.org Tue Apr 1 19:51:29 2025 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 1 Apr 2025 19:51:29 GMT Subject: RFR: 8352437: Support --add-exports with -XX:+AOTClassLinking [v3] In-Reply-To: References: Message-ID: <8u8iBz0SurEb2I01k0OxYN11YgY_1b-RJP9Xu2aw1ss=.9ef1958c-00b1-4290-9dc7-af9ab1b3d6c8@github.com> On Thu, 27 Mar 2025 22:13:02 GMT, Ioi Lam wrote: >> `-XX:+AOTClassLinking` requires the CDS archived full module graph (FMG). >> >> - Before this PR, when `--add-export` is specified, FMG is disabled, so AOT caches created with `-XX:+AOTClassLinking` cannot be loaded. >> - After this PR, if the exact same `--add-export` flags are specified across the training/assembly/production phases, the FMG can be used, so AOT caches created with `-XX:+AOTClassLinking` can be used. >> >> The change itself is straightforward: just remember the `--add-export` flags specified during AOT cache creation, and check the exact same ones are used during the production run. >> >> I did a fair amount of refactoring to change the "exact options specified" checks in modules.cpp, so more such options can be easily added in the future (we need to handle `--add-reads` and `--add-opens` in future RFEs).
>> >> (Note: this PR depends on #24122 ) > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - @calvinccheung comments > - Merge branch 'master' into 8352437-aot-class-linking-incompatible-with-add-exports > - Fixed whitespaces > - clean up > - 8352437: -XX:+AOTClassLinking is not compatible with --add-export > - added comments > - added comments > - Prototype: support --add-exports in CDS FMG Changes and cleanup look good! Thanks! ------------- Marked as reviewed by matsaave (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24124#pullrequestreview-2733895429 From vlivanov at openjdk.org Tue Apr 1 19:59:20 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Apr 2025 19:59:20 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 [v2] In-Reply-To: References: Message-ID: > Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. > > It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. > > PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. > > Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. 
> > Testing: hs-tier1 - hs-tier4, microbenchmarks Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: - Build changes - share/native -> unix/native ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24306/files - new: https://git.openjdk.org/jdk/pull/24306/files/108cc0c3..90312182 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24306&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24306&range=00-01 Stats: 67 lines in 175 files changed: 14 ins; 32 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/24306.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24306/head:pull/24306 PR: https://git.openjdk.org/jdk/pull/24306 From vlivanov at openjdk.org Tue Apr 1 20:03:43 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Apr 2025 20:03:43 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 [v2] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 19:59:20 GMT, Vladimir Ivanov wrote: >> Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. >> >> It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. >> >> PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. >> >> Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. >> >> Testing: hs-tier1 - hs-tier4, microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: > > - Build changes > - share/native -> unix/native Thanks, Magnus. I incorporated your patches into the PR. The library code is now located under `src/jdk.incubator.vector/unix/native/libsleef`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2770548994 From gziemski at openjdk.org Tue Apr 1 20:36:21 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 1 Apr 2025 20:36:21 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 08:25:08 GMT, Stefan Karlsson wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> work > > I went over the patch and added suggestions for places where I think you're using the wrong tag, or where I think it is obvious that there's a better tag than mtNone. I've also suggested removal of the now redundant 'executable' argument, which I want to see as little of as possible given that it is a wart on the memory reservation APIs (IMHO). @stefank After changing various `mtNone` tags to `mtTest` we know are seeing build failures: # Internal Error (/Users/runner/work/jdk/jdk/src/hotspot/share/nmt/virtualMemoryTracker.cpp:427), pid=79835, tid=8707 # assert(reserved_rgn->mem_tag() == mtNone) failed: Overwrite memory tag (should be mtNone, is: "Test") I would like to revert to the last passing commit [40cb438](https://github.com/openjdk/jdk/pull/24282/commits/40cb4384f17ad63c1374ea785f76e415a44eb426) and re-do this step in a follow-up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2770608945 From gziemski at openjdk.org Tue Apr 1 20:40:36 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 1 Apr 2025 20:40:36 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v6] In-Reply-To: References: Message-ID: > This is a follow-up to #21843. Here we are focusing on removing the mem tag paremeter with default value of mtNone, to force everyone to provide mem tag, if known. > > I tried to fill in tag, when I was pretty certain that I had the right type. 
> > At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: Revert "Stefan's feedback" This reverts commit f0ccc7f79e5a821cf632d1e0e898dad3254a2f0b. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24282/files - new: https://git.openjdk.org/jdk/pull/24282/files/f0ccc7f7..c0181d35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=04-05 Stats: 30 lines in 13 files changed: 0 ins; 0 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/24282.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24282/head:pull/24282 PR: https://git.openjdk.org/jdk/pull/24282 From gziemski at openjdk.org Tue Apr 1 20:53:06 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 1 Apr 2025 20:53:06 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v7] In-Reply-To: References: Message-ID: > This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with default value of mtNone, to force everyone to provide mem tag, if known. > > I tried to fill in tag, when I was pretty certain that I had the right type. > > At least one more follow-up will be needed after this, to change the remaining mtNone to valid values.
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: remove default value parameter if it's false from os::reserve_memory ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24282/files - new: https://git.openjdk.org/jdk/pull/24282/files/c0181d35..5de1d560 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=05-06 Stats: 13 lines in 9 files changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/24282.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24282/head:pull/24282 PR: https://git.openjdk.org/jdk/pull/24282 From matsaave at openjdk.org Tue Apr 1 20:58:26 2025 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 1 Apr 2025 20:58:26 GMT Subject: RFR: 8352775: JVM crashes with -XX:AOTMode=create -XX:+UseZGC [v2] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 16:26:36 GMT, Ioi Lam wrote: >> Please review this small fix. When ZGC is enabled, heap dumping is disabled, so we shouldn't call functions that are related to heap dumping. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > No need to use "@requires vm.gc == null" LGTM, thanks! ------------- Marked as reviewed by matsaave (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24347#pullrequestreview-2734027891 From gziemski at openjdk.org Tue Apr 1 21:02:01 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 1 Apr 2025 21:02:01 GMT Subject: RFR: 8344883: Do not use mtNone if we know the tag type [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 20:53:06 GMT, Gerard Ziemski wrote: >> This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with default value of mtNone, to force everyone to provide mem tag, if known. >> >> I tried to fill in tag, when I was pretty certain that I had the right type.
>> >> At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > remove default value parameter if it's false from os::reserve_memory Filed [Skip default value parameter if it's false from os::reserve_memory_special and os::attempt_reserve_memory_at()](https://bugs.openjdk.org/browse/JDK-8353477) to address the other APIs using false for the exec parameter with default value. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2770670565 From ihse at openjdk.org Tue Apr 1 21:09:17 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 1 Apr 2025 21:09:17 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 [v2] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 19:59:20 GMT, Vladimir Ivanov wrote: >> Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. >> >> It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. >> >> PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. >> >> Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. >> >> Testing: hs-tier1 - hs-tier4, microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: > > - Build changes > - share/native -> unix/native LGTM now. Thanks! I suggest awaiting another build group reviewer since I wrote some of this code. ------------- Marked as reviewed by ihse (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24306#pullrequestreview-2734048599 From gziemski at openjdk.org Tue Apr 1 21:11:16 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 1 Apr 2025 21:11:16 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 20:53:06 GMT, Gerard Ziemski wrote: >> This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with default value of mtNone, to force everyone to provide mem tag, if known. >> >> I tried to fill in tag, when I was pretty certain that I had the right type. >> >> At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > remove default value parameter if it's false from os::reserve_memory Filed [Replace mtNone with actual values, when known.](https://bugs.openjdk.org/browse/JDK-8353480) to provide real values, but we will need to proceed with caution. It is not as simple as changing all mtNone to mtTest, for example (even in gtests). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2770691388 From duke at openjdk.org Tue Apr 1 21:33:30 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Tue, 1 Apr 2025 21:33:30 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v5] In-Reply-To: References: Message-ID: > This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811.
> > I tested it with: > > > java -Xlog:os+container=trace -version > > on: > > `Red Hat Enterprise Linux 8 (cgroups v1 only)`: > _No change in behaviour_ > > `Fedora 41 (cgroups v2)`: > _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ > > --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 > +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 > @@ -1,7 +1,12 @@ > [trace][os,container] OSContainer::init: Initializing Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 controller cpu is enabled and required > +[debug][os,container] v2 controller io is enabled but not relevant > +[debug][os,container] v2 controller memory is enabled and required > +[debug][os,container] v2 controller hugetlb is enabled but not relevant > +[debug][os,container] v2 controller pids is enabled and relevant > +[debug][os,container] v2 controller rdma is enabled but not relevant > +[debug][os,container] v2 controller misc is enabled but not relevant > [debug][os,container] Detected cgroups v2 unified hierarchy > [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope > [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max > > > `Fedora 41 (custom kernel with cgroups v1 disabled)`: > _Fixes `cgroups v2` detection:_ > > --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 > +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 > @@ -1,7 +1,63 @@ > [trace][os,container] OSContainer::init: Initializing 
Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > -[debug][os,container] controller memory is not enabled > - ] > -[debug][os,container] One or more required controllers disabled at kernel level. > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 contro... Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 - testCgroupv1SystemdOnly, testCgroupv1NoMounts: Use cgroupv1 fields - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing Remove from cgroups v1 branch incorrect log messages about cpuset controller being optional. Add test case for cgroups v1, cpuset disabled. - Improve !cgroups_v2_enabled branch comment - Debug-log optional and disabled cgroups v2 controllers Do not log enabled controllers that are not relevant to the JDK. - Move index declaration to scope in which it is used - Remove empty string check during cgroup.controllers parsing - Define ISSPACE_CHARS macro, use it in strsep call - ... 
and 5 more: https://git.openjdk.org/jdk/compare/24e617d4...b29d8694 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23811/files - new: https://git.openjdk.org/jdk/pull/23811/files/b6926e15..b29d8694 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23811&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23811&range=03-04 Stats: 128967 lines in 2886 files changed: 51904 ins; 59351 del; 17712 mod Patch: https://git.openjdk.org/jdk/pull/23811.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23811/head:pull/23811 PR: https://git.openjdk.org/jdk/pull/23811 From bchristi at openjdk.org Tue Apr 1 22:04:09 2025 From: bchristi at openjdk.org (Brent Christian) Date: Tue, 1 Apr 2025 22:04:09 GMT Subject: RFR: 8352565: Add native method implementation of Reference.get() [v2] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 09:43:28 GMT, Kim Barrett wrote: >> Please review this change which adds a native method providing the >> implementation of Reference::get. Reference::get is an intrinsic candidate, so >> this native method implementation is only used when the intrinsic is not. >> >> Currently there is intrinsic support by the interpreter, C1, C2, and graal, >> which are always used. With this change we can later remove all the >> per-platform interpreter intrinsic implementations, and might also remove the >> C1 intrinsic implementation. >> >> Testing: >> (1) mach5 tier1-6 normal (so using all the existing intrinsics). >> (2) mach5 tier1-6 with interpreter and C1 Reference::get intrinsics disabled. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > parameterized return type of native get0 test/hotspot/jtreg/gc/TestNativeReferenceGet.java line 162: > 160: System.out.println("Testing nonconcurrent GC"); > 161: clearReferents(); > 162: strengthenReferents(); Might the GC clear refs between `clearReferents()` and `strengthenReferents()`?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24315#discussion_r2023735808 From iklam at openjdk.org Tue Apr 1 22:04:15 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 1 Apr 2025 22:04:15 GMT Subject: RFR: 8352775: JVM crashes with -XX:AOTMode=create -XX:+UseZGC [v2] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 16:36:27 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> No need to use "@requires vm.gc == null" > > LGTM Thanks @calvinccheung and @matias9927 for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24347#issuecomment-2770782400 From iklam at openjdk.org Tue Apr 1 22:04:16 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 1 Apr 2025 22:04:16 GMT Subject: Integrated: 8352775: JVM crashes with -XX:AOTMode=create -XX:+UseZGC In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 06:04:04 GMT, Ioi Lam wrote: > Please review this small fix. When ZGC is enabled, heap dumping is disabled, so we shouldn't call functions that are related to heap dumping. This pull request has now been integrated. Changeset: 6970cf6a Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/6970cf6ac69864e7027138746361e7da1983c24d Stats: 62 lines in 2 files changed: 61 ins; 0 del; 1 mod 8352775: JVM crashes with -XX:AOTMode=create -XX:+UseZGC Reviewed-by: ccheung, matsaave ------------- PR: https://git.openjdk.org/jdk/pull/24347 From erikj at openjdk.org Tue Apr 1 22:14:22 2025 From: erikj at openjdk.org (Erik Joelsson) Date: Tue, 1 Apr 2025 22:14:22 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 [v2] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 19:59:20 GMT, Vladimir Ivanov wrote: >> Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. >> >> It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation.
>> >> PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. >> >> Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. >> >> Testing: hs-tier1 - hs-tier4, microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: > > - Build changes > - share/native -> unix/native src/jdk.incubator.vector/unix/native/libsleef/README.md line 13: > 11: > 12: The upstream original source code is available in > 13: `src/jdk.incubator.vector/share/native/libsleef/upstream`. However, this code is share -> unix ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24306#discussion_r2023742031 From iklam at openjdk.org Tue Apr 1 22:16:51 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 1 Apr 2025 22:16:51 GMT Subject: RFR: 8353325: Rewrite appcds/methodHandles test cases to use CDSAppTester [v3] In-Reply-To: References: Message-ID: > These test cases are rewritten to use CDSAppTester, so that they can also be executed in the new JEP 483 workflow (with `-XX:AOTCache=xxx`, etc). This will increase coverage of current and upcoming AOT features (such as AOT linking of invokedynamic, and AOT method profiling). > > These test cases are generated by a bash script. This PR minimizes the generated part so that the main portions of the tests can be modified as a normal java source file. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' into 8353325-rewrite-cds-methodhandles-tests-using-cdsapptester - @calvinccheung comments - step 3: added support for dynamic and aot workflows - step 2: updated all tests to use STATIC workflow - step 1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24340/files - new: https://git.openjdk.org/jdk/pull/24340/files/6a62ae00..3e1faf4b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24340&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24340&range=01-02 Stats: 12395 lines in 218 files changed: 8327 ins; 3196 del; 872 mod Patch: https://git.openjdk.org/jdk/pull/24340.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24340/head:pull/24340 PR: https://git.openjdk.org/jdk/pull/24340 From ccheung at openjdk.org Tue Apr 1 22:21:16 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 1 Apr 2025 22:21:16 GMT Subject: RFR: 8353325: Rewrite appcds/methodHandles test cases to use CDSAppTester [v3] In-Reply-To: References: Message-ID: <5K68ZVWt-Svkw9vjC9Bsb2YeNNn8cZcaSGqvlHvrpKQ=.3fb33ba6-5369-46eb-92bc-3220cc8db185@github.com> On Tue, 1 Apr 2025 22:16:51 GMT, Ioi Lam wrote: >> These test cases are rewritten to use CDSAppTester, so that they can also be executed in the new JEP 483 workflow (with `-XX:AOTCache=xxx`, etc). This will increase coverage of current and upcoming AOT features (such as AOT linking of invokedynamic, and AOT method profiling). >> >> These test cases are generated by a bash script. This PR minimizes the generated part so that the main portions of the tests can be modified as a normal java source file. > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into 8353325-rewrite-cds-methodhandles-tests-using-cdsapptester > - @calvinccheung comments > - step 3: added support for dynamic and aot workflows > - step 2: updated all tests to use STATIC workflow > - step 1 Marked as reviewed by ccheung (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24340#pullrequestreview-2734160625 From vlivanov at openjdk.org Tue Apr 1 22:45:09 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Apr 2025 22:45:09 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 [v3] In-Reply-To: References: Message-ID: > Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. > > It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. > > PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. > > Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. 
> > Testing: hs-tier1 - hs-tier4, microbenchmarks Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Adjust README.md ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24306/files - new: https://git.openjdk.org/jdk/pull/24306/files/90312182..8ac5d0bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24306&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24306&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24306.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24306/head:pull/24306 PR: https://git.openjdk.org/jdk/pull/24306 From vlivanov at openjdk.org Tue Apr 1 22:45:11 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Apr 2025 22:45:11 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 [v2] In-Reply-To: References: Message-ID: <0rbzz5vo1TGUjZlYXaBYT-bMUBoZn7no8fQrRtfCcXg=.2c61c27d-3957-4e9d-80b9-51a98c6ff275@github.com> On Tue, 1 Apr 2025 22:09:33 GMT, Erik Joelsson wrote: >> Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: >> >> - Build changes >> - share/native -> unix/native > > src/jdk.incubator.vector/unix/native/libsleef/README.md line 13: > >> 11: >> 12: The upstream original source code is available in >> 13: `src/jdk.incubator.vector/share/native/libsleef/upstream`. However, this code is > > share -> unix Good catch. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24306#discussion_r2023765350 From erikj at openjdk.org Tue Apr 1 23:04:11 2025 From: erikj at openjdk.org (Erik Joelsson) Date: Tue, 1 Apr 2025 23:04:11 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 [v3] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 22:45:09 GMT, Vladimir Ivanov wrote: >> Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. >> >> It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. >> >> PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. >> >> Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. >> >> Testing: hs-tier1 - hs-tier4, microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Adjust README.md Marked as reviewed by erikj (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24306#pullrequestreview-2734208521 From sviswanathan at openjdk.org Tue Apr 1 23:11:45 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Apr 2025 23:11:45 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v12] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 14:40:56 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Reacting to comments by Volodymyr. 
src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 359: > 357: __ kmovbl(k4, rax); > 358: __ addl(rax, 16); > 359: __ kmovbl(k5, rax); We could use the sequence from generate_sha3_implCompress to setup the K registers, that has less dependency: __ movl(rax, 0x1F); __ kmovbl(k5, rax); __ kshiftrbl(k4, k5, 1); __ kshiftrbl(k3, k5, 2); __ kshiftrbl(k2, k5, 3); __ kshiftrbl(k1, k5, 4); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2023769620 From iklam at openjdk.org Wed Apr 2 01:40:39 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 2 Apr 2025 01:40:39 GMT Subject: RFR: 8352437: Support --add-exports with -XX:+AOTClassLinking [v2] In-Reply-To: References: Message-ID: <8-mYgx6Rbq09iYzRPnATDTUTGgVuL98ATMsQCqAeW-4=.d900b7ae-0fdd-4432-84f9-748b0eeb825d@github.com> On Tue, 25 Mar 2025 04:15:11 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. > > Code changes look clean. I just have two minor comments on the tests. Thanks @calvinccheung and @matias9927 for the review Passed tiers1-4 and build-tiers5. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24124#issuecomment-2771040013 From iklam at openjdk.org Wed Apr 2 01:40:40 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 2 Apr 2025 01:40:40 GMT Subject: Integrated: 8352437: Support --add-exports with -XX:+AOTClassLinking In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 04:46:21 GMT, Ioi Lam wrote: > `-XX:+AOTClassLinking` requires the CDS archived full module graph (FMG). > > - Before this PR, when `--add-export` is specified, FMG is disabled, so AOT caches created with `-XX:+AOTClassLinking` cannot be loaded. 
> - After this PR, if the exact same `--add-export` flags are specified across the training/assembly/production phases, the FMG can be used, so AOT caches created with `-XX:+AOTClassLinking` can be loaded. > > The change itself is straightforward: just remember the `--add-export` flags specified during AOT cache creation, and check that the exact same ones are used during the production run. > > I did a fair amount of refactoring to change the "exact options specified" checks in modules.cpp, so more such options can be easily added in the future (we need to handle `--add-reads` and `--add-opens` in future RFEs). > > (Note: this PR depends on #24122 ) This pull request has now been integrated. Changeset: 096e70de Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/096e70de2d3009040d7ce30f3766167f43de4a96 Stats: 580 lines in 15 files changed: 463 ins; 64 del; 53 mod 8352437: Support --add-exports with -XX:+AOTClassLinking Reviewed-by: matsaave ------------- PR: https://git.openjdk.org/jdk/pull/24124 From dholmes at openjdk.org Wed Apr 2 02:14:31 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 2 Apr 2025 02:14:31 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 15:11:06 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to build this image e.g. >> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > address Windows issues Reading from /proc and reading a file from disk are a bit different I think. > Could we maybe load the release file with a bit of delay to avoid even a small hit on startup performance? That may be possible: a one-off "periodic task" for the Watcher thread.
If it is not loaded by the time we crash (ie when crashing very early) then I don't think its absence will be missed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2771106999 From iklam at openjdk.org Wed Apr 2 03:49:05 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 2 Apr 2025 03:49:05 GMT Subject: RFR: 8353325: Rewrite appcds/methodHandles test cases to use CDSAppTester [v4] In-Reply-To: References: Message-ID: > These test cases are rewritten to use CDSAppTester, so that they can also be executed in the new JEP 483 workflow (with `-XX:AOTCache=xxx`, etc). This will increase coverage of current and upcoming AOT features (such as AOT linking of invokedynamic, and AOT method profiling). > > These test cases are generated by a bash script. This PR minimizes the generated part so that the main portions of the tests can be modified as a normal java source file. Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Fixed failure with ZGC + AOT test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24340/files - new: https://git.openjdk.org/jdk/pull/24340/files/3e1faf4b..6c368b09 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24340&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24340&range=02-03 Stats: 29 lines in 7 files changed: 29 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24340.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24340/head:pull/24340 PR: https://git.openjdk.org/jdk/pull/24340 From asmehra at openjdk.org Wed Apr 2 04:37:24 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 2 Apr 2025 04:37:24 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v5] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 21:33:30 GMT, Thomas Fitzsimmons wrote: >> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and 
https://bugs.openjdk.org/browse/JDK-8347811. >> >> I tested it with: >> >> >> java -Xlog:os+container=trace -version >> >> on: >> >> `Red Hat Enterprise Linux 8 (cgroups v1 only)`: >> _No change in behaviour_ >> >> `Fedora 41 (cgroups v2)`: >> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ >> >> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 >> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 >> @@ -1,7 +1,12 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> +[debug][os,container] v2 controller cpuset is enabled and relevant >> +[debug][os,container] v2 controller cpu is enabled and required >> +[debug][os,container] v2 controller io is enabled but not relevant >> +[debug][os,container] v2 controller memory is enabled and required >> +[debug][os,container] v2 controller hugetlb is enabled but not relevant >> +[debug][os,container] v2 controller pids is enabled and relevant >> +[debug][os,container] v2 controller rdma is enabled but not relevant >> +[debug][os,container] v2 controller misc is enabled but not relevant >> [debug][os,container] Detected cgroups v2 unified hierarchy >> [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope >> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max >> >> >> `Fedora 41 (custom kernel with cgroups v1 disabled)`: >> _Fixes `cgroups v2` detection:_ >> >> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 >> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 
-0500 >> @@ -1,7 +1,63 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> -[debug][os,container] controller memory is not enabled >> - ] >> -[debug][os,container] One or more required controllers disabled at kernel level. >> +[... > > Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 > - testCgroupv1SystemdOnly, testCgroupv1NoMounts: Use cgroupv1 fields > - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 > - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent > - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing > > Remove from cgroups v1 branch incorrect log messages about cpuset > controller being optional. Add test case for cgroups v1, cpuset > disabled. > - Improve !cgroups_v2_enabled branch comment > - Debug-log optional and disabled cgroups v2 controllers > > Do not log enabled controllers that are not relevant to the JDK. > - Move index declaration to scope in which it is used > - Remove empty string check during cgroup.controllers parsing > - Define ISSPACE_CHARS macro, use it in strsep call > - ... and 5 more: https://git.openjdk.org/jdk/compare/6ca4ef5e...b29d8694 lgtm ------------- Marked as reviewed by asmehra (Committer). 
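The 8349988 change detects cgroups v2 by reading `/sys/fs/cgroup/cgroup.controllers` instead of `/proc/cgroups`. The sketch below shows one way to classify that file's contents, using the split visible in the debug log quoted above: `cpu` and `memory` are required, `cpuset` and `pids` are relevant but optional, and everything else is ignored. The struct and function names are hypothetical, and the tokenizing uses `std::istringstream` rather than the `strsep`/`ISSPACE_CHARS` approach mentioned in the commit list, so treat it as an illustration rather than the HotSpot implementation.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Result of checking one cgroup.controllers line (hypothetical type).
struct ControllerCheck {
  bool all_required_enabled;
  std::set<std::string> enabled_relevant;  // optional controllers that are enabled
};

// `line` is the contents of /sys/fs/cgroup/cgroup.controllers, e.g.
// "cpuset cpu io memory hugetlb pids rdma misc\n".
ControllerCheck check_v2_controllers(const std::string& line) {
  static const std::set<std::string> required = {"cpu", "memory"};
  static const std::set<std::string> relevant = {"cpuset", "pids"};

  std::set<std::string> enabled;
  std::istringstream in(line);
  std::string name;
  while (in >> name) {  // operator>> splits on any whitespace, including '\n'
    enabled.insert(name);
  }

  ControllerCheck result{true, {}};
  for (const auto& r : required) {
    if (enabled.count(r) == 0) result.all_required_enabled = false;
  }
  for (const auto& r : relevant) {
    if (enabled.count(r) != 0) result.enabled_relevant.insert(r);
  }
  return result;
}
```

Controllers such as `io` or `rdma` never affect the result, which matches the later commit that stops logging enabled controllers that are not relevant to the JDK.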
PR Review: https://git.openjdk.org/jdk/pull/23811#pullrequestreview-2734634468 From duke at openjdk.org Wed Apr 2 06:32:10 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 2 Apr 2025 06:32:10 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References: Message-ID: <57-zPqw_-3qY6G5TZUYXG4MFzx_jmhHRDN78DR-dy0o=.c105c4e4-9ffa-4dd4-9390-70f27e48f217@github.com> On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was constrained on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function.
> - Improve description of UseLoopPredicate argument Thank y'all for the thorough review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24248#issuecomment-2771461584 From duke at openjdk.org Wed Apr 2 06:32:11 2025 From: duke at openjdk.org (duke) Date: Wed, 2 Apr 2025 06:32:11 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was constrained on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function.
> - Improve description of UseLoopPredicate argument @mhaessig Your change (at version 1561a0eea3b2049e4e9e6468d0237f60e97cd2e8) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24248#issuecomment-2771462472 From duke at openjdk.org Wed Apr 2 06:51:28 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 2 Apr 2025 06:51:28 GMT Subject: Integrated: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:27:59 GMT, Manuel Hässig wrote: > # Issue Summary > > When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. > > # Change Summary > > Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. > > Concretely, this PR > - adds parse predicate nodes to the IR testing framework, > - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, > - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, > - adds a regression test. > > > # Testing > > The changes passed the following testing: > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) > - tier1 through tier3 and Oracle internal testing This pull request has now been integrated.
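The 8347449 description states two rules: profiled loop predicates are a subset of loop predicates, so turning off `UseLoopPredicate` also turns off `UseProfiledLoopPredicate`, and every check of `UseProfiledLoopPredicate` first checks `UseLoopPredicate`. The sketch below models both rules with plain booleans. The struct and function names are hypothetical stand-ins for HotSpot's flag globals and argument processing, and it does not model whether an explicitly set `-XX:+UseProfiledLoopPredicate` is overridden silently or with a warning.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two C2 product flags; in HotSpot these are
// globals declared through the flag macros, not a struct.
struct LoopPredicateFlags {
  bool UseLoopPredicate = true;
  bool UseProfiledLoopPredicate = true;
};

// Rule 1 (argument processing): profiled loop predicates are a subset of
// loop predicates, so disabling the latter disables the former.
void apply_loop_predicate_ergonomics(LoopPredicateFlags& f) {
  if (!f.UseLoopPredicate) {
    f.UseProfiledLoopPredicate = false;
  }
  // Invariant after ergonomics: profiled implies plain loop predicates.
  assert(f.UseLoopPredicate || !f.UseProfiledLoopPredicate);
}

// Rule 2 (use sites): check UseLoopPredicate before UseProfiledLoopPredicate,
// so the intent is consistent even if the flags were never adjusted.
bool should_add_profiled_parse_predicate(const LoopPredicateFlags& f) {
  return f.UseLoopPredicate && f.UseProfiledLoopPredicate;
}
```

Rule 2 is redundant once rule 1 has run, but keeping both mirrors the PR's stated aim of making the code consistent in its intention.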
Changeset: d358f5f4 Author: Manuel Hässig Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/d358f5f4a44aacf2d79ccdb3e362ce8ed571f6da Stats: 150 lines in 7 files changed: 128 ins; 2 del; 20 mod 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24248 From stefank at openjdk.org Wed Apr 2 07:26:25 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 07:26:25 GMT Subject: RFR: 8353329: Small memory leak when create GrowableArray with initial size 0 [v3] In-Reply-To: References: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> Message-ID: On Tue, 1 Apr 2025 15:39:38 GMT, Zhengyu Gu wrote: >> Please review this small fix to avoid 1 byte leak when create a GrowableArray with initial size = 0. >> >> GrowableArray's c-heap allocator does not check size = 0 and os::malloc(0) will malloc at least 1 byte. > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Added empty lines Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24341#pullrequestreview-2734897701 From jsjolen at openjdk.org Wed Apr 2 07:34:14 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 2 Apr 2025 07:34:14 GMT Subject: RFR: 8353329: Small memory leak when create GrowableArray with initial size 0 [v3] In-Reply-To: References: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> Message-ID: On Tue, 1 Apr 2025 15:39:38 GMT, Zhengyu Gu wrote: >> Please review this small fix to avoid 1 byte leak when create a GrowableArray with initial size = 0. >> >> GrowableArray's c-heap allocator does not check size = 0 and os::malloc(0) will malloc at least 1 byte.
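A minimal sketch of the zero-capacity guard discussed in 8353329, assuming a C-heap backing store built on `std::malloc`. The C standard lets `malloc(0)` return either a null pointer or a unique pointer that must still be freed, and per the PR HotSpot's `os::malloc(0)` allocates at least 1 byte; skipping the call when the capacity is 0 avoids that allocation entirely. `allocate_backing_store` and `free_backing_store` are hypothetical names, not the actual one-line fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical C-heap backing-store allocator for an array of E.
// Capacity 0 returns nullptr instead of calling malloc(0), which may
// hand back a real (tiny) allocation that would otherwise leak.
template <typename E>
E* allocate_backing_store(int capacity) {
  assert(capacity >= 0);
  if (capacity == 0) {
    return nullptr;  // nothing to allocate, so nothing to leak
  }
  return static_cast<E*>(std::malloc(sizeof(E) * static_cast<std::size_t>(capacity)));
}

template <typename E>
void free_backing_store(E* data) {
  std::free(data);  // free(nullptr) is a no-op, so empty arrays need no special case
}
```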
> > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Added empty lines Marked as reviewed by jsjolen (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24341#pullrequestreview-2734925020 From duke at openjdk.org Wed Apr 2 07:38:34 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 2 Apr 2025 07:38:34 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v13] In-Reply-To: References: Message-ID: > By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Reacting to comment by Sandhya. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23860/files - new: https://git.openjdk.org/jdk/pull/23860/files/7a9f6645..e4ab10bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=11-12 Stats: 10 lines in 1 file changed: 0 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From stefank at openjdk.org Wed Apr 2 07:39:23 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 07:39:23 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 20:53:06 GMT, Gerard Ziemski wrote: >> This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with default value of mtNone, to force everyone to provide mem tag, if known. >> >> I tried to fill in tag, when I was pretty certain that I had the right type.
>> >> At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > remove default value parameter if it's false from os::reserve_memory You keep leaving comments that ExecMem is a parameter. That doesn't make sense. ExecMem is a constant. ------------- PR Review: https://git.openjdk.org/jdk/pull/24282#pullrequestreview-2734939840 From duke at openjdk.org Wed Apr 2 07:45:14 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 2 Apr 2025 07:45:14 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v12] In-Reply-To: References: Message-ID: <_3aVrAsKu82hHiEvG-gkLScqZrm-7M6nDo6vcA7EHds=.19728142-3151-462d-95ea-bdbc36c236a7@github.com> On Tue, 1 Apr 2025 22:43:36 GMT, Sandhya Viswanathan wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Reacting to comments by Volodymyr. > > src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 359: > >> 357: __ kmovbl(k4, rax); >> 358: __ addl(rax, 16); >> 359: __ kmovbl(k5, rax); > > We could use the sequence from generate_sha3_implCompress to setup the K registers, that has less dependency: > > __ movl(rax, 0x1F); > __ kmovbl(k5, rax); > __ kshiftrbl(k4, k5, 1); > __ kshiftrbl(k3, k5, 2); > __ kshiftrbl(k2, k5, 3); > __ kshiftrbl(k1, k5, 4); Thanks! (I had copied/doubled this function from the single state version before you made me do this change on that one and I forgot to update the copy :-) ) Changed. 
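The opmask setup suggested in the review can be checked with a small scalar model. The shift version follows the quoted snippet. The add-chain version is an assumption: only its last step (`addl(rax, 16)` before `kmovbl(k5, rax)`) is visible, so the sketch assumes `rax` starts at 1 and is incremented by 2, 4, 8, and 16. Both produce masks with the low 1 to 5 bits set. The shift form lets k1..k4 each depend only on k5 rather than on a serial chain of adds.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Index 0..4 models opmask registers k1..k5.
using Masks = std::array<std::uint8_t, 5>;

// Shift form from the review: movl(rax, 0x1F); kmovbl(k5, rax);
// then kshiftrbl(k4, k5, 1) ... kshiftrbl(k1, k5, 4).
Masks masks_by_shift() {
  const unsigned k5 = 0x1F;
  Masks k{};
  k[4] = static_cast<std::uint8_t>(k5);
  for (int i = 1; i <= 4; i++) {
    k[4 - i] = static_cast<std::uint8_t>(k5 >> i);  // each depends only on k5
  }
  return k;
}

// Assumed add-chain form: kmovbl(k_n, rax) followed by addl(rax, 2^n),
// so every mask waits for the previous add.
Masks masks_by_add_chain() {
  Masks k{};
  unsigned rax = 1;  // assumed initial value
  for (int i = 0; i < 5; i++) {
    k[i] = static_cast<std::uint8_t>(rax);
    rax += 2u << i;  // adds 2, 4, 8, 16 (the final +32 is unused)
  }
  return k;
}
```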
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2024255339 From stefank at openjdk.org Wed Apr 2 07:53:22 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 07:53:22 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:36:56 GMT, Stefan Karlsson wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> remove default value parameter if it's false from os::reserve_memory > > You keep leaving comments that ExecMem is a parameter. That doesn't make sense. ExecMem is a constant. > @stefank After changing various `mtNone` tags to `mtTest` we now are seeing build failures: > > ``` > # Internal Error (/Users/runner/work/jdk/jdk/src/hotspot/share/nmt/virtualMemoryTracker.cpp:427), pid=79835, tid=8707 > # assert(reserved_rgn->mem_tag() == mtNone) failed: Overwrite memory tag (should be mtNone, is: "Test") > ``` > > I reverted to the last passing commit [40cb438](https://github.com/openjdk/jdk/pull/24282/commits/40cb4384f17ad63c1374ea785f76e415a44eb426) and re-do this step in a follow-up. The problem was that you didn't do what I suggested. I suggested that you changed one occurrence to be mtMetaspace: https://github.com/openjdk/jdk/pull/24282#discussion_r2018124074 But you changed it to be mtTest: https://github.com/openjdk/jdk/pull/24282/commits/f0ccc7f79e5a821cf632d1e0e898dad3254a2f0b and this causes the crash above. Do you know that you can just accept the proposed changes in the GitHub UI? I think that would prevent bugs like this. 
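In this thread, the assert `Overwrite memory tag (should be mtNone, ...)` fires when a region reserved under one tag is later committed under a different one. The invariant that message describes can be modeled as a toy. This is hypothetical: `ReservedRegion` and `try_set_tag` are illustrative names only, not NMT's `VirtualMemoryTracker` code, and the sketch assumes that re-applying the same tag is allowed.

```cpp
#include <cassert>

// Hypothetical toy model; the enumerators only mirror tag names from the thread.
enum class MemTag { mtNone, mtTest, mtGC, mtMetaspace };

struct ReservedRegion {
  MemTag tag = MemTag::mtNone;
};

// A region's tag may be filled in once (from mtNone) but not changed to a
// different tag. Returns false where the real code would fail its assert.
bool try_set_tag(ReservedRegion& region, MemTag tag) {
  if (region.tag == tag) {
    return true;   // assumed: re-applying the same tag is fine
  }
  if (region.tag != MemTag::mtNone) {
    return false;  // models: assert(mem_tag() == mtNone) failed
  }
  region.tag = tag;
  return true;
}
```

In this model, reserving with `mtTest` and then committing with `mtGC` is rejected, which matches the fix discussed in the thread: use the same tag for both steps.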
------------- PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2771625260 From mbaesken at openjdk.org Wed Apr 2 07:59:21 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 2 Apr 2025 07:59:21 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 02:11:03 GMT, David Holmes wrote: > > > Could we maybe load the release file with a bit of delay to avoid even a small hit on startup performance? > > That may be possible: a one-off "periodic task" for the Watcher thread. If it is not loaded by the time we crash (i.e. when crashing very early) then I don't think its absence will be missed. That sounds like a good idea. The short period of time where the release file is absent will most likely be fine; it is, as you said, only for very early crashes. Do you have a good example of such a one-off periodic task? It seems the current ones are derived from class PeriodicTask, but they are repeated as far as I can see. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2771642837 From cnorrbin at openjdk.org Wed Apr 2 08:20:08 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 2 Apr 2025 08:20:08 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v17] In-Reply-To: References: Message-ID: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support of intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree.
These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. > > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns!
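The core idea of an intrusive tree (the node lives inside the user's object and the container never allocates) can be sketched with a plain unbalanced binary search tree standing in for the red-black tree. Everything here is hypothetical (`IntrusiveNode`, `insert`, `find`) and is not the `RBTree` API. The sketch recovers the enclosing object with `offsetof`, which generalizes `cast_to_self` in the example above: a plain pointer cast only works because `node` is the first member.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical intrusive node: the links live inside the user's object.
struct IntrusiveNode {
  int key;
  IntrusiveNode* left = nullptr;
  IntrusiveNode* right = nullptr;
  explicit IntrusiveNode(int k) : key(k) {}
};

struct MyIntrusiveStructure {
  IntrusiveNode node;  // embedded node; the tree only ever sees &node
  int data;
  MyIntrusiveStructure(int key, int d) : node(key), data(d) {}

  // Recover the enclosing object from a node pointer (container_of pattern).
  // Works for any member position because the type is standard-layout.
  static MyIntrusiveStructure* cast_to_self(IntrusiveNode* n) {
    return reinterpret_cast<MyIntrusiveStructure*>(
        reinterpret_cast<char*>(n) - offsetof(MyIntrusiveStructure, node));
  }
};

// Unbalanced BST insert standing in for insert_at_cursor: it only links
// caller-owned nodes and never allocates.
void insert(IntrusiveNode*& root, IntrusiveNode* n) {
  IntrusiveNode** link = &root;
  while (*link != nullptr) {
    link = (n->key < (*link)->key) ? &(*link)->left : &(*link)->right;
  }
  *link = n;
}

IntrusiveNode* find(IntrusiveNode* root, int key) {
  while (root != nullptr && root->key != key) {
    root = (key < root->key) ? root->left : root->right;
  }
  return root;
}
```

The objects can live anywhere (on the stack, in an arena, or in `os::malloc` memory as in the example above); the tree stores only pointers to the embedded nodes.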
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: typo fix + extra assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23416/files - new: https://git.openjdk.org/jdk/pull/23416/files/7bd2b66b..c0f6dc10 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=15-16 Stats: 3 lines in 2 files changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From cnorrbin at openjdk.org Wed Apr 2 08:20:09 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 2 Apr 2025 08:20:09 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v16] In-Reply-To: References: Message-ID: <0mpq0Ptl4mGT_eLcbK7PZWcfs6LQs8mspaNSSI6furo=.00df2788-d5fa-41e8-8bff-61fbbd583ea2@github.com> On Tue, 1 Apr 2025 07:07:56 GMT, Axel Boldt-Christmas wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> axel feedback > > src/hotspot/share/utilities/rbTree.inline.hpp line 621: > >> 619: assert_leq(from, start); >> 620: assert_geq(to, start); >> 621: } > > Not sure if we should add an else branch here where we assert end == nullptr / end == start. But given that we will more than likely just crash when reading `start->next()`, it does not matter too much. Regardless of any assert, a bad interval will crash early. Added an assert just in case.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r2024312957 From duke at openjdk.org Wed Apr 2 08:22:22 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 2 Apr 2025 08:22:22 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 21:42:08 GMT, Volodymyr Paprotski wrote: >> Ferenc Rakoczi has updated the pull request incrementally with two additional commits since the last revision: >> >> - Further readability improvements. >> - Added asserts for array sizes > > src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 342: > >> 340: // Performs two keccak() computations in parallel. The steps of the >> 341: // two computations are executed interleaved. >> 342: static address generate_double_keccak(StubGenerator *stubgen, MacroAssembler *_masm) { > > This function seems ok. I didn't do as line-by-line 'exact' review as for the NTT intrinsics, but just put the new version into a diff next to the original function. Seems like a reasonable clean 'refactor' (hardcode the blocksize, add new input registers 10-14. Makes it really easy to spot vs 0-4 original registers..) > > I didn't realize before that the 'top 3 limbs' are wasted. I guess it doesn't matter, there are registers to spare aplenty and it makes the entire algorithm cleaner and easier to follow. > > I did also stare at the algorithm with the 'What about AVX2' question.. This function would pretty much need to be rewritten it looks like :/ > > Last two questions.. > - how much performance is gained from doubling this function up? > - If that's worth it.. what if instead it was quadrupled the input? (I scanned the java code, it looked like NR was parametrized already to 2..). It looks like there are almost enough registers here to go to 4 (I think 3 would need to be freed up somehow.. alternatively, the upper 3 limbs are empty in all operations, perhaps it could be used instead..
at the expense of readability) Well, the algorithm (keccak()) is doing the same things on 5 array elements (it works on essentially a 5x5 matrix doing row and column operations, so putting 5 array entries in a vector register was the "natural" thing to do). This function can only be used under very special circumstances, which occur during the generation of the "A matrix" in ML-KEM and ML-DSA; the speed of that matrix generation has almost doubled (I don't have exact numbers). We are using 7 registers per state and 15 for the constants, so we have only 3 to spare. We could perhaps juggle with the constants keeping just the ones that will be needed next in registers and reloading them "just in time", but that might slow things down a bit - more load instructions executed + maybe some load delay. On the other hand, more parallelism. I might try it out. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2024317665 From peter.firmstone at zeus.net.au Wed Apr 2 08:24:09 2025 From: peter.firmstone at zeus.net.au (Peter Firmstone) Date: Wed, 2 Apr 2025 18:24:09 +1000 Subject: GTestWrapper error - Operation failed register assertion. Message-ID: <95bbad8f-84dc-4550-aca4-40ef56dc546d@zeus.net.au> Hello, Just wondering if anyone can help me make sense of this error? I'm only seeing it on AMD EPYC, not on AArch64 or Xeon and am not able to reproduce it locally. Note this is a fork; jdk:master was last merged last weekend. Thank you, Peter. gtest/GTestWrapper.java failing on linux-x64 and windows-x64 with AMD EPYC 7763. · Issue #57 · pfirmstone/jdk-with-authorization An example GTestWrapper.jtr output, see link above for more details. [ RUN      
] RBTreeTest.InsertRemoveVerify_vm # # Compiler replay data is saved as: # /home/runner/work/jdk-with-authorization/jdk-with-authorization/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_common/scratch/1/replay_pid2173.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp # Warning: asynclog is OFF. Warning: asynclog is OFF. Warning: asynclog is OFF. Warning: asynclog is OFF. Warning: asynclog is OFF. Warning: asynclog is OFF. OpenJDK 64-Bit Server VM warning: c1: printing of assembly code is enabled; turning on DebugNonSafepoints to gain additional output OpenJDK 64-Bit Server VM warning: c2: printing of assembly code is enabled; turning on DebugNonSafepoints to gain additional output assert failed: assert(opr->is_register()) failed: should not call this otherwise OpenJDK 64-Bit Server VM warning: outputStream::do_vsnprintf output truncated -- buffer length is 11 bytes but 12 bytes are needed. OpenJDK 64-Bit Server VM warning: outputStream::do_vsnprintf output truncated -- buffer length is 11 bytes but 12 bytes are needed. # # A fatal error has been detected by the Java Runtime Environment: # #  Internal Error (/home/runner/work/jdk-with-authorization/jdk-with-authorization/src/hotspot/share/c1/c1_LinearScan.cpp:117), pid=2173, tid=2191 #  assert(opr->is_register()) failed: should not call this otherwise # # JRE version: OpenJDK Runtime Environment (25.0) (fastdebug build 25-internal-pfirmstone-1ff412a4a97b541949f25ec05656a3c1ff91f91f) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-pfirmstone-1ff412a4a97b541949f25ec05656a3c1ff91f91f, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V  [libjvm.so+0xd68790]  LinearScan::reg_num(LIR_Opr)+0xc0 # # Core dump will be written.
Default location: Core dumps may be processed with "/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h" (or dumping to /home/runner/work/jdk-with-authorization/jdk-with-authorization/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_common/scratch/1/core.2173) # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp # ---------------  S U M M A R Y ------------ Command Line: -XX:+ExecutingUnitTests Host: fv-az1116-308, AMD EPYC 7763 64-Core Processor, 4 cores, 15G, Ubuntu 22.04.5 LTS Time: Wed Apr  2 05:18:01 2025 UTC elapsed time: 38.495774 seconds (0d 0h 0m 38s) ---------------  T H R E A D  --------------- Current thread (0x000055c186978be0):  JavaThread "C1 CompilerThread0" daemon [_thread_in_native, id=2191, stack(0x00007efcb84f0000,0x00007efcb85f0000) (1024K)] Current CompileTask: C1:38495   14       3       java.util.Objects::hashCode (13 bytes) Stack: [0x00007efcb84f0000,0x00007efcb85f0000], sp=0x00007efcb85ed5e0,  free space=1013k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V  [libjvm.so+0xd68790]  LinearScan::reg_num(LIR_Opr)+0xc0 (c1_LinearScan.cpp:117) V  [libjvm.so+0xd82858] RegisterVerifier::process_operations(LIR_List*, GrowableArray*)+0x7e8  (c1_LinearScan.cpp:3500) V  [libjvm.so+0xd82c2e] RegisterVerifier::process_block(BlockBegin*)+0x9e (c1_LinearScan.cpp:3590) V  [libjvm.so+0xd82ffb] RegisterVerifier::verify(BlockBegin*)+0x23b (c1_LinearScan.cpp:3565) V  [libjvm.so+0xd88aa9]  LinearScan::verify_registers()+0x1b9 (c1_LinearScan.cpp:3534) V  [libjvm.so+0xd88d1d]  LinearScan::verify()+0xad (c1_LinearScan.cpp:3281) V  [libjvm.so+0xd8f0dc]  LinearScan::do_linear_scan()+0x1cc (c1_LinearScan.cpp:3120) V  [libjvm.so+0xccd0bd]  Compilation::emit_lir()+0x85d (c1_Compilation.cpp:274) V  [libjvm.so+0xccf626]  Compilation::compile_java_method()+0x1f6 (c1_Compilation.cpp:404) V  [libjvm.so+0xcd006e]
Compilation::compile_method()+0x21e (c1_Compilation.cpp:479) V  [libjvm.so+0xcd0788] Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, bool, DirectiveSet*)+0x318 (c1_Compilation.cpp:609) V  [libjvm.so+0xcd1ec5]  Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xb5  (c1_Compiler.cpp:262) V  [libjvm.so+0xfb9fb7] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xae7 (compileBroker.cpp:2307) V  [libjvm.so+0xfbadb8] CompileBroker::compiler_thread_loop()+0x5c8 (compileBroker.cpp:1951) V  [libjvm.so+0x14e982e]  JavaThread::thread_main_inner()+0xee (javaThread.cpp:776) V  [libjvm.so+0x202866e]  Thread::call_run()+0xbe (thread.cpp:231) V  [libjvm.so+0x1b3271b]  thread_native_entry(Thread*)+0x12b (os_linux.cpp:877) Registers: RAX=0x00007efcd9b43000, RBX=0x00007efcd99856c4, RCX=0x00007efcd8fcd6d0, RDX=0x00007efcd8fe21e0 RSP=0x00007efcb85ed5e0, RBP=0x00007efcb85ed600, RSI=0x0000000000000075, RDI=0x00007efcd8fe1be8 R8 =0x0000000000000000, R9 =0x0000000000000000, R10=0x0000000000000000, R11=0x0000000000000000 R12=0x00007efcb85ed5e8, R13=0x000055c18a7f2818, R14=0x00007efcd997f380, R15=0x00007efcb85ed670 RIP=0x00007efcd7968790, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000006
TRAPNO=0x000000000000000e XMM[0]=0x0000000000000000 0x0000000000000000 XMM[1]=0x0000000000000000 0x0000000000000006 XMM[2]=0x0000000000000000 0x0000000000000000 XMM[3]=0x0000000000000000 0x0000000000000000 XMM[4]=0x0000000000000000 0x0000000000000000 XMM[5]=0x0000000000000000 0x000055c186978be0 XMM[6]=0x0000000000000000 0x0000000000000000 XMM[7]=0x0000000000000000 0x000055c186978be0 XMM[8]=0x7463656a624f2f6c 0x6974752f6176616a XMM[9]=0x0000000000000000 0x0000000000000000 XMM[10]=0x0000000000000001 0x0000000000000001 XMM[11]=0x0000000000000000 0x00007efcd8fac4c0 XMM[12]=0x0000000000000000 0x0000000000000000 XMM[13]=0x0000000000000000 0x0000000000000000 XMM[14]=0x0000000000000000 0x0000000000000000 XMM[15]=0xcafebabecafebabe 0xcafebabecafebabe MXCSR=0x00001fa2 From stefank at openjdk.org Wed Apr 2 08:48:27 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 08:48:27 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 20:53:06 GMT, Gerard Ziemski wrote: >> This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with default value of mtNone, to force everyone to provide mem tag, if known. >> >> I tried to fill in tag, when I was pretty certain that I had the right type.
I simply also changed mtGC to mtTest and then the test passes. Given the earlier issues with incorporating my feedback I'll provide my updated feedback as a branch instead. This is the diff: https://github.com/openjdk/jdk/compare/pr/24282...stefank:jdk:pull_24282_stefank_feedback And this is the branch: https://github.com/stefank/jdk/tree/pull_24282_stefank_feedback The you can fetch my branch to your local machine by running the following command: git fetch https://github.com/stefank/jdk pull_24282_stefank_feedback:pull_24282_stefank_feedback And then you can test my branch if you want. When you are satisfied that it doesn't have any problems, then you can bring my changes over to your own review branch by calling the following command (while having your review branch as the active branch): git merge --ff pull_24282_stefank_feedback And then you can add more changes if there are more tweaks that needs to be done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2771851832 From aboldtch at openjdk.org Wed Apr 2 08:53:54 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 2 Apr 2025 08:53:54 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v17] In-Reply-To: References: Message-ID: <9QKWbHAjyWfwVUQyaqU7wDMTJCQfPsjr9fIfynAuAgU=.d98839fb-f07d-4f06-9f07-92008fc653d8@github.com> On Wed, 2 Apr 2025 08:20:08 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The recently integrated red-black tree can be made more flexible by adding support of intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. >> >> Two key changes enable this feature: >> 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. >> 2. Cursors have been added to simplify navigation and iteration over the tree. 
These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. >> >> >> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. >> >> An example of how you could use the intrusive tree is found below: >> >> ```c++ >> struct MyIntrusiveStructure { >> Node node; // The tree node is part of an external structure >> int data; >> >> MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} >> Node* get_node() { return &node; } >> static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } >> }; >> >> Tree my_intrusive_tree; >> >> Cursor insert_cursor = my_intrusive_tree.cursor_find(0); >> Node insert_node = Node(0); >> >> // Custom allocation here is just malloc >> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); >> new (place) MyIntrusiveStructure(0, insert_node); >> >> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); >> >> Cursor find_cursor = my_intrusive_tree.cursor_find(0); >> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; >> >> >> >> Please let me know any feedback or concerns! > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > typo fix + extra assert Marked as reviewed by aboldtch (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/23416#pullrequestreview-2735383266 From stuefe at openjdk.org Wed Apr 2 09:42:05 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 2 Apr 2025 09:42:05 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances [v2] In-Reply-To: References: Message-ID: <_m74U7bVGssSQnbrkP-4KvS5nga3bg4Bh4g5BlU07Kw=.f951a8cf-a4a6-452f-936e-284decfd9df8@github.com> > In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. > > For details, please see JBS issue text. > > ----------------------- > > Patch results: > > The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. Here the oop map size distribution over all JDK classes in the JDK image: > > Before: > > 5395 - non-static oop maps (0 entries) > 9330 - non-static oop maps (1 entries) > 1449 - non-static oop maps (2 entries) > 274 - non-static oop maps (3 entries) > 218 - non-static oop maps (4 entries) > 75 - non-static oop maps (5 entries) > 7 - non-static oop maps (6 entries) > 4 - non-static oop maps (7 entries) > > > Now: > > 5395 - non-static oop maps (0 entries) > 10178 - non-static oop maps (1 entries) > 933 - non-static oop maps (2 entries) > 229 - non-static oop maps (3 entries) > 16 - non-static oop maps (4 entries) > 1 - non-static oop maps (5 entries) > > > For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: > > Before: > > java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} > - ---- non-static fields (9 words): > - final 'hash' 'I' @12 > - final 'key' 'Ljava/lang/Object;' @16 > - volatile 'val' 'Ljava/lang/Object;' @20 > - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class > - 'red' 'Z' @28 << derived class starts here, non-oops lead > - 
'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 > - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 > - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 > - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 > - non-static oop maps (2 entries): 16-24 32-44 > > Now: > > java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} > - ---- non-static fields (9 words): > - final 'hash' 'I' @12 > - final 'key' 'Ljava/lang/Object;' @16 > - volatile 'val' 'Ljava/lang/Object;' @20 > - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class > - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oops lead > - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 > - 'right' 'Ljava/util/concurrent/Concurre... Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - add regression test - Reworked to use prior super klass layout reconstruction pass - Merge branch 'master' into JDK-8353273-Reduce-average-number-of-oop-map-entries-in-instance-objects - alternate-order - print ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24330/files - new: https://git.openjdk.org/jdk/pull/24330/files/f4933c1d..dfbe4859 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24330&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24330&range=00-01 Stats: 10680 lines in 182 files changed: 6721 ins; 3238 del; 721 mod Patch: https://git.openjdk.org/jdk/pull/24330.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24330/head:pull/24330 PR: https://git.openjdk.org/jdk/pull/24330 From stuefe at openjdk.org Wed Apr 2 09:42:19 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 2 Apr 2025 09:42:19 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 17:34:28 GMT, Frederic Parain wrote: >> In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. >> >> For details, please see JBS issue text. >> >> ----------------------- >> >> Patch results: >> >> The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. 
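Field order determines the number of oop map entries, because each maximal run of adjacent oop fields becomes one entry; a small counting function shows the effect. The `Field` struct and `count_oop_map_entries` are hypothetical, and the sketch assumes compressed (4-byte) oops. The "before" layout uses the `ConcurrentHashMap$TreeNode` offsets listed in the PR. For the "after" layout, the offsets of `right`, `prev`, and `red` are assumed, since the listing is truncated after `left`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical field descriptor: byte offset, size in bytes, and whether
// the field holds an oop (assumed compressed, so 4 bytes).
struct Field {
  int offset;
  int size;
  bool is_oop;
};

// Counts oop map entries: each maximal run of adjacent oop fields becomes
// one (offset, count) entry. Fields must be sorted by offset.
int count_oop_map_entries(const std::vector<Field>& fields) {
  int entries = 0;
  int run_end = -1;  // offset just past the current oop run, or -1 if none
  for (const Field& f : fields) {
    if (!f.is_oop) {
      run_end = -1;  // a non-oop field ends the current run
      continue;
    }
    if (f.offset != run_end) {
      entries++;     // this oop does not extend the previous run
    }
    run_end = f.offset + f.size;
  }
  return entries;
}
```

With `red` placed between `next` and `parent`, the oops split into runs 16-24 and 32-44 (two entries); leading the subclass fields with oops yields one contiguous run.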
Here the oop map size distribution over all JDK classes in the JDK image: >> >> Before: >> >> 5395 - non-static oop maps (0 entries) >> 9330 - non-static oop maps (1 entries) >> 1449 - non-static oop maps (2 entries) >> 274 - non-static oop maps (3 entries) >> 218 - non-static oop maps (4 entries) >> 75 - non-static oop maps (5 entries) >> 7 - non-static oop maps (6 entries) >> 4 - non-static oop maps (7 entries) >> >> >> Now: >> >> 5395 - non-static oop maps (0 entries) >> 10178 - non-static oop maps (1 entries) >> 933 - non-static oop maps (2 entries) >> 229 - non-static oop maps (3 entries) >> 16 - non-static oop maps (4 entries) >> 1 - non-static oop maps (5 entries) >> >> >> For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: >> >> Before: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'red' 'Z' @28 << derived class starts here, non-oops lead >> - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 >> - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 >> - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 >> - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 >> - non-static oop maps (2 entries): 16-24 32-44 >> >> Now: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'parent' 
'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oop... > > A possible improvement to this code would be to compute if the super class' layout ends with oops during the reconstruction (reconstruct_layout()), to avoid having to iterate over the fields a second time. @fparain I adapted your idea. I also added a regression test. Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24330#issuecomment-2771933660 From cnorrbin at openjdk.org Wed Apr 2 11:39:55 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 2 Apr 2025 11:39:55 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v17] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 08:20:08 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The recently integrated red-black tree can be made more flexible by adding support of intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. >> >> Two key changes enable this feature: >> 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. >> 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. >> >> >> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
>> >> An example of how you could use the intrusive tree is found below: >> >> ```c++ >> struct MyIntrusiveStructure { >> Node node; // The tree node is part of an external structure >> int data; >> >> MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} >> Node* get_node() { return &node; } >> static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } >> }; >> >> Tree my_intrusive_tree; >> >> Cursor insert_cursor = my_intrusive_tree.cursor_find(0); >> Node insert_node = Node(0); >> >> // Custom allocation here is just malloc >> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); >> new (place) MyIntrusiveStructure(0, insert_node); >> >> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); >> >> Cursor find_cursor = my_intrusive_tree.cursor_find(0); >> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; >> >> >> >> Please let me know any feedback or concerns! > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > typo fix + extra assert Thank you everyone for your time on this! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23416#issuecomment-2772281994 From duke at openjdk.org Wed Apr 2 11:39:55 2025 From: duke at openjdk.org (duke) Date: Wed, 2 Apr 2025 11:39:55 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v17] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 08:20:08 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The recently integrated red-black tree can be made more flexible by adding support of intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. >> >> Two key changes enable this feature: >> 1. 
Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. >> 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. >> >> >> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. >> >> An example of how you could use the intrusive tree is found below: >> >> ```c++ >> struct MyIntrusiveStructure { >> Node node; // The tree node is part of an external structure >> int data; >> >> MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} >> Node* get_node() { return &node; } >> static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } >> }; >> >> Tree my_intrusive_tree; >> >> Cursor insert_cursor = my_intrusive_tree.cursor_find(0); >> Node insert_node = Node(0); >> >> // Custom allocation here is just malloc >> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); >> new (place) MyIntrusiveStructure(0, insert_node); >> >> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); >> >> Cursor find_cursor = my_intrusive_tree.cursor_find(0); >> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; >> >> >> >> Please let me know any feedback or concerns! > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > typo fix + extra assert @caspernorrbin Your change (at version c0f6dc10826c248634d2d516d1f64214351ae70b) is now ready to be sponsored by a Committer.
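For readers new to intrusive containers, here is a minimal standalone sketch of the idea described in the PR. The caller owns each element's memory and embeds the link node inside it. The container only links those nodes together and never allocates. This is a plain unbalanced binary search tree, not HotSpot's red-black tree, and it does not use the PR's `Tree`/`Cursor` API. All names (`IntrusiveNode`, `IntrusiveBST`, `Entry`, `entry_of`) are hypothetical. It recovers the enclosing object with `offsetof`, so, unlike the plain cast in the quoted example, it does not rely on the node being the first member.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical link node that user structures embed (not HotSpot's node type).
struct IntrusiveNode {
  int key;
  IntrusiveNode* left = nullptr;
  IntrusiveNode* right = nullptr;
  explicit IntrusiveNode(int k) : key(k) {}
};

// Unbalanced BST that never allocates: it only links caller-owned nodes.
class IntrusiveBST {
  IntrusiveNode* _root = nullptr;
 public:
  void insert(IntrusiveNode* n) {
    assert(n->left == nullptr && n->right == nullptr);  // node must be unlinked
    IntrusiveNode** link = &_root;
    while (*link != nullptr) {
      link = (n->key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    *link = n;
  }
  IntrusiveNode* find(int key) const {
    IntrusiveNode* cur = _root;
    while (cur != nullptr && cur->key != key) {
      cur = (key < cur->key) ? cur->left : cur->right;
    }
    return cur;
  }
};

// User structure; the node is deliberately NOT the first member.
struct Entry {
  int data;
  IntrusiveNode node;
  Entry(int key, int d) : data(d), node(key) {}
};

// Recover the enclosing Entry from a pointer to its embedded node.
inline Entry* entry_of(IntrusiveNode* n) {
  return reinterpret_cast<Entry*>(reinterpret_cast<char*>(n) - offsetof(Entry, node));
}
```

Because the tree stores only pointers into caller-owned storage, the caller decides where each `Entry` lives (stack, arena, `malloc`). The caller must also keep each `Entry` alive for as long as it is linked into the tree.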
------------- PR Comment: https://git.openjdk.org/jdk/pull/23416#issuecomment-2772284663 From stefank at openjdk.org Wed Apr 2 11:41:51 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 2 Apr 2025 11:41:51 GMT Subject: RFR: 8353264: ZGC: Windows heap unreserving is broken Message-ID: During the development of [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we found that the functionality to release reserved memory for the heap is broken. The current implementation passes in the size of the reserved memory area, but according to the documentation the call should be made with `0` as the dwSize argument: "If the dwFreeType parameter is MEM_RELEASE, dwSize must be 0 (zero)." Generational ZGC isn't affected by this because we never release any reserved memory for the heap. However, the changes in JDK-8350441 are going to change that, and we will start to release memory in certain corner cases. In Single-gen ZGC, which exists in older releases, we have paths that do release memory for "views" into the heap. This only happens if something blocks the memory areas where we want to set up our "views" of the heap. We should probably backport this fix to the affected releases. I've added a unit test that provokes the problem, and I have run this fix together with the changes for JDK-8350441.
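The Windows rule quoted above can be illustrated with a small portable reserve/release pair. This is not ZGC's code, and `reserve_range` and `release_range` are hypothetical names. On Windows the sketch uses `VirtualFree` with `MEM_RELEASE`: per its documentation, the address must be the base returned by `VirtualAlloc` and the size must be `0`. On POSIX systems the counterpart is `munmap`, which does need the length.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Reserve (but do not commit) `size` bytes of address space.
// Returns nullptr on failure.
void* reserve_range(size_t size) {
#ifdef _WIN32
  return VirtualAlloc(nullptr, size, MEM_RESERVE, PAGE_NOACCESS);
#else
  void* p = mmap(nullptr, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
#endif
}

// Release a whole range previously returned by reserve_range().
bool release_range(void* base, size_t size) {
  assert(base != nullptr);
#ifdef _WIN32
  (void)size;  // MEM_RELEASE frees the entire reservation; dwSize must be 0.
  return VirtualFree(base, 0, MEM_RELEASE) != 0;
#else
  return munmap(base, size) == 0;  // unlike VirtualFree, munmap needs the length
#endif
}
```

Keeping the size parameter in the portable signature lets callers stay platform-neutral. The Windows branch simply ignores it, which is exactly where passing the reservation size through would be wrong.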
------------- Commit messages: - 8353264: ZGC: Windows heap unreserving is broken Changes: https://git.openjdk.org/jdk/pull/24377/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24377&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353264 Stats: 26 lines in 2 files changed: 23 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24377.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24377/head:pull/24377 PR: https://git.openjdk.org/jdk/pull/24377 From zgu at openjdk.org Wed Apr 2 11:59:14 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 2 Apr 2025 11:59:14 GMT Subject: RFR: 8353329: Small memory leak when create GrowableArray with initial size 0 [v3] In-Reply-To: References: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> Message-ID: <2GQOqn4PGZiDc6G9Ei888lnWmiQ0ZvabbGZSZPOx1S0=.ba12e5f8-5b1d-44e4-899c-59d39516e7e2@github.com> On Tue, 1 Apr 2025 15:39:38 GMT, Zhengyu Gu wrote: >> Please review this small fix to avoid a 1-byte leak when creating a GrowableArray with initial size = 0. >> >> GrowableArray's c-heap allocator does not check for size = 0, and os::malloc(0) will malloc at least 1 byte.
> > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Added empty lines Thanks for the reviews, @jdksjolen and @stefank ------------- PR Comment: https://git.openjdk.org/jdk/pull/24341#issuecomment-2772328598 From zgu at openjdk.org Wed Apr 2 11:59:15 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 2 Apr 2025 11:59:15 GMT Subject: Integrated: 8353329: Small memory leak when create GrowableArray with initial size 0 In-Reply-To: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> References: <8AYi7nNaVzabbCR4r8fRTK7_rPXo4JgVUI6dKOXaxiQ=.289463ca-a4c5-4966-b14b-1039ad51cc32@github.com> Message-ID: <7rgISGNTgwBt5qnKAFzPhlk4um92rb-yy0U7-HlxGcA=.22ea3f6d-1bb2-4ea0-ad0e-76cd64dca949@github.com> On Tue, 1 Apr 2025 00:18:07 GMT, Zhengyu Gu wrote: > Please review this small fix to avoid a 1-byte leak when creating a GrowableArray with initial size = 0. > > GrowableArray's c-heap allocator does not check for size = 0, and os::malloc(0) will malloc at least 1 byte. This pull request has now been integrated. Changeset: b80b04d7 Author: Zhengyu Gu URL: https://git.openjdk.org/jdk/commit/b80b04d77afdb2a808e2c7f9268d8092eb16714e Stats: 10 lines in 2 files changed: 5 ins; 4 del; 1 mod 8353329: Small memory leak when create GrowableArray with initial size 0 Reviewed-by: jsjolen, stefank ------------- PR: https://git.openjdk.org/jdk/pull/24341 From cnorrbin at openjdk.org Wed Apr 2 12:44:41 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 2 Apr 2025 12:44:41 GMT Subject: Integrated: 8349211: Add support for intrusive trees to the utilities red-black tree In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 11:20:49 GMT, Casper Norrbin wrote: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support of intrusive trees.
In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. > > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns! This pull request has now been integrated.
Changeset: 4f97c4c0 Author: Casper Norrbin URL: https://git.openjdk.org/jdk/commit/4f97c4c03661a862e62106b3a5b2aa8696196baf Stats: 1149 lines in 3 files changed: 773 ins; 134 del; 242 mod 8349211: Add support for intrusive trees to the utilities red-black tree Reviewed-by: aboldtch, jsjolen ------------- PR: https://git.openjdk.org/jdk/pull/23416 From fparain at openjdk.org Wed Apr 2 13:34:08 2025 From: fparain at openjdk.org (Frederic Parain) Date: Wed, 2 Apr 2025 13:34:08 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances [v2] In-Reply-To: <_m74U7bVGssSQnbrkP-4KvS5nga3bg4Bh4g5BlU07Kw=.f951a8cf-a4a6-452f-936e-284decfd9df8@github.com> References: <_m74U7bVGssSQnbrkP-4KvS5nga3bg4Bh4g5BlU07Kw=.f951a8cf-a4a6-452f-936e-284decfd9df8@github.com> Message-ID: <-_48fN7-F6OHlZHMilOq4AV8zJwclDrd1ZFNW5rABXQ=.013a3f7f-d230-4408-8e44-59909c553597@github.com> On Wed, 2 Apr 2025 09:42:05 GMT, Thomas Stuefe wrote: >> In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. >> >> For details, please see JBS issue text. >> >> ----------------------- >> >> Patch results: >> >> The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. 
Here the oop map size distribution over all JDK classes in the JDK image: >> >> Before: >> >> 5395 - non-static oop maps (0 entries) >> 9330 - non-static oop maps (1 entries) >> 1449 - non-static oop maps (2 entries) >> 274 - non-static oop maps (3 entries) >> 218 - non-static oop maps (4 entries) >> 75 - non-static oop maps (5 entries) >> 7 - non-static oop maps (6 entries) >> 4 - non-static oop maps (7 entries) >> >> >> Now: >> >> 5395 - non-static oop maps (0 entries) >> 10178 - non-static oop maps (1 entries) >> 933 - non-static oop maps (2 entries) >> 229 - non-static oop maps (3 entries) >> 16 - non-static oop maps (4 entries) >> 1 - non-static oop maps (5 entries) >> >> >> For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: >> >> Before: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'red' 'Z' @28 << derived class starts here, non-oops lead >> - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 >> - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 >> - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 >> - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 >> - non-static oop maps (2 entries): 16-24 32-44 >> >> Now: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'parent' 
'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oop... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - add regression test > - Reworked to use prior super klass layout reconstruction pass > - Merge branch 'master' into JDK-8353273-Reduce-average-number-of-oop-map-entries-in-instance-objects > - alternate-order > - print This version looks better. Thank you for the changes. Fred ------------- Marked as reviewed by fparain (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24330#pullrequestreview-2736293279 From duke at openjdk.org Wed Apr 2 14:02:25 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Wed, 2 Apr 2025 14:02:25 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock [v2] In-Reply-To: References: Message-ID: > ### Summary: > This PR makes memory operations atomic with NMT accounting. > > ### The problem: > In memory-related functions like `os::commit_memory` and `os::reserve_memory` the OS memory operations are currently done before acquiring the NMT mutex. And the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races. > > 1.1 Thread_1 releases range_A. > 1.2 Thread_1 tells NMT "range_A has been released". > > 2.1 Thread_2 reserves (the now free) range_A. > 2.2 Thread_2 tells NMT "range_A is reserved". > > Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS.
> > ### Solution: > Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself. > > ### Other notes: > I also simplified this pattern found in many places: > > if (MemTracker::enabled()) { > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > if (result != nullptr) { > MemTracker::record_some_operation(addr, bytes); > } > } else { > result = pd_some_operation(addr, bytes); > } > ``` > To: > > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > MemTracker::record_some_operation(addr, bytes); > ``` > This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`. > > I considered moving the locking and NMT accounting down into platform-specific code: Ex. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to-the-point, so doing this wasn't reducing the lock scope very much. Instead, it just makes the code messier by having to maintain the locking and NMT accounting in each platform-specific implementation. > > In many places I've done minor refactoring by relocating call... Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision: - tests and comments - Revert "make memory op and NMT accounting atomic" This reverts commit 86423d0b7e8e2b0b313a686a64c803028a5f2420.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24084/files - new: https://git.openjdk.org/jdk/pull/24084/files/86423d0b..74f31202 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24084&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24084&range=00-01 Stats: 246 lines in 12 files changed: 60 ins; 123 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/24084.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24084/head:pull/24084 PR: https://git.openjdk.org/jdk/pull/24084 From duke at openjdk.org Wed Apr 2 14:06:09 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Wed, 2 Apr 2025 14:06:09 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 14:02:25 GMT, Robert Toyonaga wrote: >> ### Summary: >> This PR makes memory operations atomic with NMT accounting. >> >> ### The problem: >> In memory-related functions like `os::commit_memory` and `os::reserve_memory` the OS memory operations are currently done before acquiring the NMT mutex. And the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races. >> >> 1.1 Thread_1 releases range_A. >> 1.2 Thread_1 tells NMT "range_A has been released". >> >> 2.1 Thread_2 reserves (the now free) range_A. >> 2.2 Thread_2 tells NMT "range_A is reserved". >> >> Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS. >> >> ### Solution: >> Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself.
>> >> ### Other notes: >> I also simplified this pattern found in many places: >> >> if (MemTracker::enabled()) { >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_some_operation(addr, bytes); >> if (result != nullptr) { >> MemTracker::record_some_operation(addr, bytes); >> } >> } else { >> result = pd_some_operation(addr, bytes); >> } >> ``` >> To: >> >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_some_operation(addr, bytes); >> MemTracker::record_some_operation(addr, bytes); >> ``` >> This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`. >> >> I considered moving the locking and NMT accounting down into platform-specific code: Ex. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to-the-point, so doing this wasn't reducing the lock scope very much. Instead, it just makes the code messier by having to maintain the locking and NMT accounting in each platform specific i... > > Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision: > > - tests and comments > - Revert "make memory op and NMT accounting atomic" > > This reverts commit 86423d0b7e8e2b0b313a686a64c803028a5f2420. OK, I have reverted the original changes, added comments, and kept the new tests that are still relevant. Please have another look when you have time. I'll go ahead and open RFEs for the topics you suggested above. Thanks!
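The race described in the original PR text, and its proposed fix (since reverted in this revision), can be modelled in a few lines. The idea is to hold one lock across both the OS operation and the bookkeeping update, and to record only successful operations. No other thread can then slip in between the two steps. Everything here is hypothetical (`FakeOs`, `Tracker`, `reserve_and_record`, `release_and_record`, `in_sync`). It does not use `MemTracker` or `NmtVirtualMemoryLocker`.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>

// Stand-in for the kernel's view of which base addresses are reserved.
struct FakeOs {
  std::set<uintptr_t> reserved;
  bool reserve(uintptr_t base) { return reserved.insert(base).second; }
  bool release(uintptr_t base) { return reserved.erase(base) == 1; }
};

// Stand-in for NMT's accounting of reserved ranges.
struct Tracker {
  std::set<uintptr_t> reserved;
};

FakeOs g_os;
Tracker g_tracker;
std::mutex g_lock;  // plays the role of the NMT virtual-memory lock

// The OS call and the accounting happen under one lock, so the
// (1.1) (2.1) (2.2) (1.2) interleaving from the description cannot occur.
bool reserve_and_record(uintptr_t base) {
  assert(base != 0);  // 0 is not a valid base in this model
  std::lock_guard<std::mutex> guard(g_lock);
  bool ok = g_os.reserve(base);
  if (ok) {
    g_tracker.reserved.insert(base);  // record only successful operations
  }
  return ok;
}

bool release_and_record(uintptr_t base) {
  std::lock_guard<std::mutex> guard(g_lock);
  bool ok = g_os.release(base);
  if (ok) {
    g_tracker.reserved.erase(base);
  }
  return ok;
}

// True if the tracker agrees with the "OS" about what is reserved.
bool in_sync() {
  std::lock_guard<std::mutex> guard(g_lock);
  return g_os.reserved == g_tracker.reserved;
}
```

The cost of this design is that the lock is held for the duration of the system call. That is the critical-section size concern raised in the description.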
------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2772669368 From stuefe at openjdk.org Wed Apr 2 14:34:01 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 2 Apr 2025 14:34:01 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 17:34:28 GMT, Frederic Parain wrote: >> In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. >> >> For details, please see JBS issue text. >> >> ----------------------- >> >> Patch results: >> >> The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. Here the oop map size distribution over all JDK classes in the JDK image: >> >> Before: >> >> 5395 - non-static oop maps (0 entries) >> 9330 - non-static oop maps (1 entries) >> 1449 - non-static oop maps (2 entries) >> 274 - non-static oop maps (3 entries) >> 218 - non-static oop maps (4 entries) >> 75 - non-static oop maps (5 entries) >> 7 - non-static oop maps (6 entries) >> 4 - non-static oop maps (7 entries) >> >> >> Now: >> >> 5395 - non-static oop maps (0 entries) >> 10178 - non-static oop maps (1 entries) >> 933 - non-static oop maps (2 entries) >> 229 - non-static oop maps (3 entries) >> 16 - non-static oop maps (4 entries) >> 1 - non-static oop maps (5 entries) >> >> >> For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: >> >> Before: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'red' 'Z' @28 << derived class starts here, non-oops lead >> - 'parent' 
'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 >> - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 >> - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 >> - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 >> - non-static oop maps (2 entries): 16-24 32-44 >> >> Now: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oop... > > A possible improvement to this code would be to compute if the super class' layout ends with oops during the reconstruction (reconstruct_layout()), to avoid having to iterate over the fields a second time. Thanks @fparain ! May I have a second review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24330#issuecomment-2772756258 From forax at univ-mlv.fr Wed Apr 2 14:33:57 2025 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 2 Apr 2025 16:33:57 +0200 (CEST) Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: Message-ID: <1257824626.204034517.1743604437041.JavaMail.zimbra@univ-eiffel.fr> Hi Per, last week, at JChateau, we had a one-hour session about stable values; I built the JDK with this PR so we could discuss it. To present the API, I start from double-checked locking and rewrite it to use the StableValue API. The main remark was that methods like orElseSet() or isSet() are hard to use correctly. In my opinion, the current API is a mix of a high-level API and a low-level API, but it's too easy to misuse the low-level API.
high level: - methods supplier(), list() and map(); those are easy to use. low level: - methods of(), of(value), orElseSet(), setOrThrow(), etc.; those are hard to use properly. I think, though not necessarily in this PR, that the current API should be split into two classes: one in java.lang with the high-level API (the static methods other than of()), and one in java.util.concurrent with the low-level API, where you have to know what you are doing (as with any class in java.util.concurrent). regards, Rémi ----- Original Message ----- > From: "Per Minborg" > To: "compiler-dev" , "core-libs-dev" , "hotspot-dev" > , "security-dev" > Sent: Thursday, March 13, 2025 12:20:10 PM > Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) > Implement JEP 502. > > The PR passes tier1-tier3 tests. > > ------------- > > Commit messages: > - Use acquire semantics for reading rather than volatile semantics > - Add missing null check > - Simplify handling of sentinel, wrap, and unwrap > - Fix JavaDoc issues > - Fix members in StableEnumFunction > - Address some comments in the PR > - Merge branch 'master' into implement-jep502 > - Revert change > - Fix copyright issues > - Update JEP number > - ...
and 231 more: https://git.openjdk.org/jdk/compare/4cf63160...09ca44e6 > > Changes: https://git.openjdk.org/jdk/pull/23972/files > Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=00 > Issue: https://bugs.openjdk.org/browse/JDK-8351565 > Stats: 3980 lines in 30 files changed: 3949 ins; 18 del; 13 mod > Patch: https://git.openjdk.org/jdk/pull/23972.diff > Fetch: git fetch https://git.openjdk.org/jdk.git pull/23972/head:pull/23972 > > PR: https://git.openjdk.org/jdk/pull/23972 From pchilanomate at openjdk.org Wed Apr 2 14:35:06 2025 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 2 Apr 2025 14:35:06 GMT Subject: RFR: 8353117: Crash: assert(id >= ThreadIdentifier::initial() && id < ThreadIdentifier::current()) failed: must be reasonable) In-Reply-To: References: <6ludi2j0fKqL5_MirvViyefGITPvMKzAIX8EJIhfbFE=.3935e8bc-e9a3-4ed1-8da6-e943c97714b6@github.com> Message-ID: On Tue, 1 Apr 2025 07:29:38 GMT, David Holmes wrote: >> Please review the following fix. For the attaching thread case we are incorrectly setting the `_monitor_owner_id` after `Threads::add()` is called, i.e., after the attaching thread becomes visible through a ThreadsListHandle. So if another thread calls `Threads::owning_thread_from_monitor()` in between these events and iterates through all JavaThreads looking for the owner of a given monitor, we might find this attaching thread still with a `_monitor_owner_id` of 0. >> I corrected the ordering and improved verification checks. Tested in mach5 tiers1-5. >> >> Thanks, >> Patricio > > That seems fine to me. Thanks for fixing. Thanks for the reviews @dholmes-ora and @fbredberg!
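The ordering bug described above is an instance of publishing an object before it is fully initialized. Below is a minimal model with hypothetical names (`ThreadRecord`, `ThreadRegistry`, `attach`); it is not HotSpot's `Threads` API. The id is assigned before the record is added to the shared list. `add()` asserts that the id is already valid, in the spirit of the improved verification checks mentioned in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

struct ThreadRecord {
  int64_t monitor_owner_id = 0;  // 0 means "not yet assigned"
};

class ThreadRegistry {
  std::mutex _lock;
  std::vector<ThreadRecord*> _threads;
  int64_t _next_id = 1;
 public:
  int64_t next_id() {
    std::lock_guard<std::mutex> g(_lock);
    return _next_id++;
  }
  // Publishes `t`: from here on, other threads can find it.
  void add(ThreadRecord* t) {
    assert(t->monitor_owner_id != 0 && "id must be set before publication");
    std::lock_guard<std::mutex> g(_lock);
    _threads.push_back(t);
  }
  // Scans all published threads, like an owner lookup by id.
  ThreadRecord* find_owner(int64_t id) {
    std::lock_guard<std::mutex> g(_lock);
    for (ThreadRecord* t : _threads) {
      if (t->monitor_owner_id == id) return t;
    }
    return nullptr;
  }
};

// Correct order for an attaching thread: initialize first, then publish.
void attach(ThreadRegistry& reg, ThreadRecord* t) {
  t->monitor_owner_id = reg.next_id();
  reg.add(t);
}
```

Swapping the two statements in `attach` would reproduce the bug. A concurrent `find_owner` scan could then see a published record whose id is still 0.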
------------- PR Comment: https://git.openjdk.org/jdk/pull/24336#issuecomment-2772754081 From pchilanomate at openjdk.org Wed Apr 2 14:35:07 2025 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 2 Apr 2025 14:35:07 GMT Subject: Integrated: 8353117: Crash: assert(id >= ThreadIdentifier::initial() && id < ThreadIdentifier::current()) failed: must be reasonable) In-Reply-To: <6ludi2j0fKqL5_MirvViyefGITPvMKzAIX8EJIhfbFE=.3935e8bc-e9a3-4ed1-8da6-e943c97714b6@github.com> References: <6ludi2j0fKqL5_MirvViyefGITPvMKzAIX8EJIhfbFE=.3935e8bc-e9a3-4ed1-8da6-e943c97714b6@github.com> Message-ID: On Mon, 31 Mar 2025 18:15:39 GMT, Patricio Chilano Mateo wrote: > Please review the following fix. For the attaching thread case we are incorrectly setting the `_monitor_owner_id` after `Threads::add()` is called, i.e after the attaching thread becomes visible through a ThreadsListHandle. So if another thread calls `Threads::owning_thread_from_monitor()` in between these events and iterates through all JavaThreads looking for the owner of a given monitor, we might find this attaching thread still with a `_monitor_owner_id` of 0. > I corrected the ordering and improved verification checks. Tested in mach5 tiers1-5. > > Thanks, > Patricio This pull request has now been integrated. 
Changeset: d32ff139 Author: Patricio Chilano Mateo URL: https://git.openjdk.org/jdk/commit/d32ff1392205ea0fd179478a7ddb3d5f63923461 Stats: 30 lines in 7 files changed: 20 ins; 6 del; 4 mod 8353117: Crash: assert(id >= ThreadIdentifier::initial() && id < ThreadIdentifier::current()) failed: must be reasonable) Reviewed-by: dholmes, fbredberg ------------- PR: https://git.openjdk.org/jdk/pull/24336 From gziemski at openjdk.org Wed Apr 2 14:44:07 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 2 Apr 2025 14:44:07 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v7] In-Reply-To: References: Message-ID: <7SqvyOWZc12cMJQfIoCAHBSXjYZzJS6Jsq_hgr9pxoY=.0da488e5-c319-40af-a28d-6d9fc0a37ebd@github.com> On Tue, 1 Apr 2025 20:53:06 GMT, Gerard Ziemski wrote: >> This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with a default value of mtNone, to force everyone to provide a mem tag, if known. >> >> I tried to fill in the tag when I was pretty certain that I had the right type. >> >> At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > remove default value parameter if it's false from os::reserve_memory > So, there were two problems: > > 1. Is the one I explained above. The failure mode is the build fails on some platforms. > 2. The assert you listed above. That one is caused by the test first reserving with mtTest and then committing with mtGC. I simply also changed mtGC to mtTest and then the test passes. > > Given the earlier issues with incorporating my feedback, I'll provide my updated feedback as a branch instead.
This is the diff: [pr/24282...stefank:jdk:pull_24282_stefank_feedback](https://github.com/openjdk/jdk/compare/pr/24282...stefank:jdk:pull_24282_stefank_feedback) > > And this is the branch: https://github.com/stefank/jdk/tree/pull_24282_stefank_feedback > > Then you can fetch my branch to your local machine by running the following command: > > ``` > git fetch https://github.com/stefank/jdk pull_24282_stefank_feedback:pull_24282_stefank_feedback > ``` > > And then you can test my branch if you want. When you are satisfied that it doesn't have any problems, then you can bring my changes over to your own review branch by calling the following command (while having your review branch as the active branch): > > ``` > git merge --ff pull_24282_stefank_feedback > ``` > > And then you can add more changes if there are more tweaks that need to be done. This makes incorporating your feedback super easy, thank you! Following and implementing each piece of feedback one at a time was tedious; sorry I missed some changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2772787797 From iklam at openjdk.org Wed Apr 2 16:02:02 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 2 Apr 2025 16:02:02 GMT Subject: RFR: 8353325: Rewrite appcds/methodHandles test cases to use CDSAppTester [v4] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 03:49:05 GMT, Ioi Lam wrote: >> These test cases are rewritten to use CDSAppTester, so that they can also be executed in the new JEP 483 workflow (with `-XX:AOTCache=xxx`, etc). This will increase coverage of current and upcoming AOT features (such as AOT linking of invokedynamic, and AOT method profiling). >> >> These test cases are generated by a bash script. This PR minimizes the generated part so that the main portions of the tests can be modified as a normal java source file.
> > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Fixed failure with ZGC + AOT test case I had to add the following to the `id=aot` part of the tests: `@requires vm.cds.supports.aot.class.linking` This excludes this part of the tests when ZGC is enabled by jtreg. Since ZGC doesn't support archived Java oops, all AOT method handle optimizations are disabled for ZGC. As a result, the following message will not be printed: `out.shouldMatch(".class.load. test.java.lang.invoke." + testClassName + "[$][$]Lambda.*/0x.*source:.*shared.*objects.*file");` So it doesn't make sense to run this part of the test with ZGC. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24340#issuecomment-2773051659 From kbarrett at openjdk.org Wed Apr 2 17:11:57 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 2 Apr 2025 17:11:57 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v6] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Fri, 28 Mar 2025 22:24:40 GMT, Doug Simon wrote: >> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> >> By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base.
I've also updated the style guide to clarify this sort-order. >> >> The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. >> >> I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. >> >> When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. For example: >> >> java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: >> >> java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci >> >> at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) >> Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: >> >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp >> /Users/dnsimo... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > convert Windows path to Unix path Marked as reviewed by kbarrett (Reviewer). 
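The ordering rule described in this PR (compare lowercased strings so that `_` sorts before letters, and drop duplicates by collecting into a `SortedSet`) can be sketched in a few lines of Java. This is a hypothetical illustration, not the actual `SortIncludes.java` source. `IncludeSortSketch`, `INCLUDE_ORDER` and `sortBlock` are invented names. The exact-string tie-break is an added assumption so that lines differing only in case are not merged.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch; not the actual SortIncludes.java source.
final class IncludeSortSketch {
    // Compare lowercased lines first, so '_' (0x5F) sorts before every
    // lowercase letter (0x61-0x7A). Break ties on the exact string so that
    // lines differing only in case are kept as distinct entries.
    static final Comparator<String> INCLUDE_ORDER =
        Comparator.comparing((String s) -> s.toLowerCase(Locale.ROOT))
                  .thenComparing(Comparator.naturalOrder());

    // Sorts one contiguous block of #include lines and drops exact duplicates,
    // because a SortedSet keeps only one copy of equal elements.
    static List<String> sortBlock(List<String> includeLines) {
        SortedSet<String> sorted = new TreeSet<>(INCLUDE_ORDER);
        sorted.addAll(includeLines);
        return List.copyOf(sorted);
    }
}
```

With plain `String.compareTo`, `compilerDirectives.hpp` would sort before `compiler_globals.hpp` because `'D'` (0x44) is smaller than `'_'`. Lowercasing first reverses that, which matches the convention the PR describes.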
------------- PR Review: https://git.openjdk.org/jdk/pull/24247#pullrequestreview-2737009362 From kbarrett at openjdk.org Wed Apr 2 17:11:58 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 2 Apr 2025 17:11:58 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v6] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Tue, 1 Apr 2025 15:35:45 GMT, Doug Simon wrote: >> test/hotspot/jtreg/TEST.groups line 142: >> >>> 140: >>> 141: tier1_common = \ >>> 142: sources \ >> >> I don't understand this change. How does this end up doing anything different than before? > > This makes `sources` be tested in GHA: https://github.com/openjdk/jdk/blob/a1ab1d8de411aace21decd133e7e74bb97f27897/.github/workflows/test.yml#L88 > > An alternative would be to add a separate GHA jobs just for `sources`: > > - test-name: 'hs/tier1 sources' > test-suite: 'test/hotspot/jtreg/:tier1_sources' > debug-suffix: -debug > > Given how small `sources` is ([currently only 1 test](https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/sources)), it felt like it should just be folded into common. Ah, the workflows definition is what I was having trouble finding. I understand now. In light of that, the proposed change to the groups looks fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24247#discussion_r2025256054 From mli at openjdk.org Wed Apr 2 17:15:55 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 2 Apr 2025 17:15:55 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Thu, 27 Mar 2025 11:22:48 GMT, Robbin Ehn wrote: >> Hi please consider. 
>> >> |RVWMO| Patched| >> | ---------- | ---------- | >> |fence iorw,iorw| fence iorw,ow| >> |sw t4,120(t2) | sw t4,120(t2) | >> |fence ow,ir | unnecessary_membar_volatile_rvwmo | >> | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile | >> |fence iorw,ow | fence iorw,ow| >> |sw t5,124(t2) |sw t5,124(t2) | >> >> |TSO | Patched| >> | ---------- | ---------- | >> | lw a4,120(t2) | lw a6,120(t2) | >> | sw a0,124(t2) | sw t6,124(t2) | >> | fence iorw,iorw | unnecessary_membar_volatile_tso | >> | sw t4,120(t2) | sw t4,120(t2) | >> | fence ow,ir | unnecessary_membar_volatile_tso | >> | sw t6,128(t2) | sw t5,128(t2) | >> | sw t5,124(t2) // Non-volatile| sw a1,124(t2) // Non-volatile | >> | fence iorw,iorw | unnecessary_membar_volatile_tso | >> |... | ... | >> | sw a3,120(t2) | sw a0,120(t2) | >> | fence ow,ir | fence ow,ir | >> | lw a7,124(t2) | lw a5,124(t2) | >> >> For the specific rvwmo volatile store + store + volatile store is around 30% faster on VF2. >> >> The patch do: >> - Separate ztso and rvwmo in ad by using UseZtso predicate. >> - Match all that requires the same membar. >> - Make fence/fencei protected as they shouldn't be using directly. >> - Increased cost of membars to VOLATILE_REF_COST. >> - Added a real_empty pipe. >> - Change to pipe_slow on TSO (as x86). >> >> Note that C2-rv64 is now superior to gcc/clang regrading fencing: >> https://godbolt.org/z/6E3YTP15j >> >> Testing jcstress, tier1 and manually reading the generated assembly. >> Doing additional testing, but RFR it now as it may need some consideration. >> >> /Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into tso-merge > - Merge branch 'master' into tso-merge > - format comment > - Merge branch 'master' into tso-merge > - Review comments > - Merge branch 'master' into tso-merge > - Review comments > - Fixed ws > - Revert NC > - Fixed comment > - ... and 1 more: https://git.openjdk.org/jdk/compare/931a1710...c2688a6a Thanks for the patience and discussion. Looks good to me, just some minor comments. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3558: > 3556: > 3557: void MacroAssembler::membar(uint32_t order_constraint) { > 3558: if (UseZtso && ((order_constraint & StoreLoad) != StoreLoad)) { An assert in Assembler::fence() could help to catch potential misuse in the future: `assert(!UseZtso || ((order_constraint & StoreLoad) == StoreLoad))` src/hotspot/cpu/riscv/riscv.ad line 7951: > 7949: %} > 7950: > 7951: instruct unnecessary_membar_volatile_rvtso() %{ This one could be merged with `unnecessary_membar_volatile_rvwmo`, and the `UseZtso` check removed from its predicate. ------------- PR Review: https://git.openjdk.org/jdk/pull/24035#pullrequestreview-2736244903 PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2024809780 PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2024811398 From kbarrett at openjdk.org Wed Apr 2 17:30:05 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 2 Apr 2025 17:30:05 GMT Subject: RFR: 8352565: Add native method implementation of Reference.get() [v3] In-Reply-To: References: Message-ID: > Please review this change which adds a native method providing the > implementation of Reference::get. Reference::get is an intrinsic candidate, so > this native method implementation is only used when the intrinsic is not. > > Currently there is intrinsic support by the interpreter, C1, C2, and graal, > which are always used.
With this change we can later remove all the > per-platform interpreter intrinsic implementations, and might also remove the > C1 intrinsic implementation. > > Testing: > (1) mach5 tier1-6 normal (so using all the existing intrinsics). > (2) mach5 tier1-6 with interpreter and C1 Reference::get intrinsics disabled. Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: add package decl to test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24315/files - new: https://git.openjdk.org/jdk/pull/24315/files/37dc9b74..36bb26a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24315&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24315&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24315/head:pull/24315 PR: https://git.openjdk.org/jdk/pull/24315 From coleenp at openjdk.org Wed Apr 2 17:33:28 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 2 Apr 2025 17:33:28 GMT Subject: RFR: 8349007: jtreg test ResolvedMethodTableHash takes excessive time Message-ID: This is mostly a test change. ResolvedMethodTableHash.java is run as `/manual` but still takes a long time and doesn't really verify that the hash code is any good. This adds logging, triggers concurrent work like other ConcurrentHashtables, and checks that a small table is not rehashed. Tested with tier1 (including the test).
------------- Commit messages: - 8349007: jtreg test ResolvedMethodTableHash takes excessive time Changes: https://git.openjdk.org/jdk/pull/24383/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24383&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349007 Stats: 97 lines in 2 files changed: 36 ins; 9 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/24383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24383/head:pull/24383 PR: https://git.openjdk.org/jdk/pull/24383 From shade at openjdk.org Wed Apr 2 17:55:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Apr 2025 17:55:22 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag Message-ID: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> Noticed this when doing [JDK-8353558](https://bugs.openjdk.org/browse/JDK-8353558). We only check for the CLWB feature flag on Intel platforms. But AMD APM (Rev. 3.36, March 2024) tells me there is a CLWB flag in CPUID Fn0000_0007_EBX_x0 leaf as well. It is in the same place as the flag for Intel.
Additional testing: - [x] Ad-hoc tests on Ryzen 5950X ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/24385/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24385&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353572 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24385.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24385/head:pull/24385 PR: https://git.openjdk.org/jdk/pull/24385 From shade at openjdk.org Wed Apr 2 17:55:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Apr 2025 17:55:22 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag In-Reply-To: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> Message-ID: <340fexFUneJFREco2T_ZMrJdAkntMq5YJp3yVFzQF5U=.105f3e52-71fb-4e40-b376-75d66fbf957b@github.com> On Wed, 2 Apr 2025 17:49:30 GMT, Aleksey Shipilev wrote: > Noticed this when doing [JDK-8353558](https://bugs.openjdk.org/browse/JDK-8353558). We only check for the CLWB feature flag on Intel platforms. But AMD APM (Rev. 3.36, March 2024) tells me there is a CLWB flag in CPUID Fn0000_0007_EBX_x0 leaf as well. It is in the same place as the flag for Intel. > > Additional testing: > - [x] Ad-hoc tests on Ryzen 5950X The easiest test is to verify what VMVersion was able to parse out of flags: $ build/linux-x86_64-server-release/images/jdk/bin/java -Xlog:os+cpu 2>&1 | grep clwb --color # Before: Only the raw CPU flags (from the system) contain `clwb` [0.002s][info][os,cpu] flags : ... clflushopt clwb ... # After: "CPU" line now also recognizes clwb is available [0.002s][info][os,cpu] CPU: ... clflushopt, clwb ... [0.002s][info][os,cpu] flags : ... clflushopt clwb ...
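The core of the fix is that CLWB occupies the same bit in the extended-features leaf on both vendors, so the test does not need to be gated on Intel. Below is a minimal Java sketch of that bit test. It assumes the raw EBX value of CPUID leaf 7, subleaf 0 is already available; Java cannot execute CPUID itself. `ClwbBitSketch` and its methods are hypothetical names, not HotSpot's `VM_Version` code. Bit 24 (CLWB) and bit 23 (CLFLUSHOPT) follow the Intel SDM and AMD APM definitions of that leaf.

```java
// Hypothetical sketch, not HotSpot's VM_Version code. It assumes the caller
// already holds EBX from CPUID executed with EAX=7, ECX=0.
final class ClwbBitSketch {
    static final int CLFLUSHOPT_BIT = 23; // CPUID.(EAX=7,ECX=0):EBX[23]
    static final int CLWB_BIT = 24;       // CPUID.(EAX=7,ECX=0):EBX[24]

    static boolean hasBit(int ebx, int bit) {
        return (ebx & (1 << bit)) != 0;
    }

    // The bit position is the same on Intel and AMD, so no vendor check is needed.
    static boolean supportsClwb(int leaf7Ebx) {
        return hasBit(leaf7Ebx, CLWB_BIT);
    }
}
```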
------------- PR Comment: https://git.openjdk.org/jdk/pull/24385#issuecomment-2773298027 From jbhateja at openjdk.org Wed Apr 2 18:24:55 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 18:24:55 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v13] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:38:34 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Reacting to comment by Sandhya. @ferakocz, I verified the new version of the patch on Linux and Windows and it works fine. Thanks for addressing my comments. ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23860#pullrequestreview-2737186292 From kbarrett at openjdk.org Wed Apr 2 18:33:16 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 2 Apr 2025 18:33:16 GMT Subject: RFR: 8352565: Add native method implementation of Reference.get() [v4] In-Reply-To: References: Message-ID: > Please review this change which adds a native method providing the > implementation of Reference::get. Reference::get is an intrinsic candidate, so > this native method implementation is only used when the intrinsic is not. > > Currently there is intrinsic support by the interpreter, C1, C2, and graal, > which are always used.
Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: - remove timeout by using waitForReferenceProcessing - make ill-timed gc in non-concurrent case less likely - fix test package use ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24315/files - new: https://git.openjdk.org/jdk/pull/24315/files/36bb26a1..234465f4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24315&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24315&range=02-03 Stats: 20 lines in 1 file changed: 14 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24315/head:pull/24315 PR: https://git.openjdk.org/jdk/pull/24315 From gziemski at openjdk.org Wed Apr 2 18:37:36 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 2 Apr 2025 18:37:36 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v8] In-Reply-To: References: Message-ID: <5b6QutsBRX4KGZY50opOtRztczGFdilIy3CwlisJ4s4=.7b8a3f1a-4f6f-469f-829b-1d693ad1355d@github.com> > This is a follow-up to #21843. Here we are focusing on removing the mem tag paremeter with default value of mtNone, to force everyone to provide mem tag, if known. > > I tried to fill in tag, when I was pretty certain that I had the right type. > > At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: The real feedback from StefanK ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24282/files - new: https://git.openjdk.org/jdk/pull/24282/files/5de1d560..3bd03cbe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=06-07 Stats: 40 lines in 16 files changed: 0 ins; 2 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/24282.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24282/head:pull/24282 PR: https://git.openjdk.org/jdk/pull/24282 From gziemski at openjdk.org Wed Apr 2 18:37:36 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 2 Apr 2025 18:37:36 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v7] In-Reply-To: References: Message-ID: <1uwNGuYl7MdwLmyxHc75pHVxuIlypTlUUbiuj4on738=.1503e429-dcac-46ac-9bfc-f6e0039762e8@github.com> On Wed, 2 Apr 2025 08:45:44 GMT, Stefan Karlsson wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> remove default value parameter if it's false from os::reserve_memory > > So, there were two problems: > 1) Is the one I explained above. The failure mode is the build fails on some platforms. > 2) The assert you listed above. That one is caused by the test first reserving with mtTest and then committing with mtGC. I simply also changed mtGC to mtTest and then the test passes. > > Given the earlier issues with incorporating my feedback I'll provide my updated feedback as a branch instead. 
This is the diff: > https://github.com/openjdk/jdk/compare/pr/24282...stefank:jdk:pull_24282_stefank_feedback > > And this is the branch: > https://github.com/stefank/jdk/tree/pull_24282_stefank_feedback > > Then you can fetch my branch to your local machine by running the following command: > > git fetch https://github.com/stefank/jdk pull_24282_stefank_feedback:pull_24282_stefank_feedback > > > And then you can test my branch if you want. When you are satisfied that it doesn't have any problems, then you can bring my changes over to your own review branch by calling the following command (while having your review branch as the active branch): > > git merge --ff pull_24282_stefank_feedback > > > And then you can add more changes if there are more tweaks that need to be done. @stefank I pushed your feedback. Thank you for providing your branch and making it so easy to incorporate your contribution! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2773390856 From kbarrett at openjdk.org Wed Apr 2 18:37:53 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 2 Apr 2025 18:37:53 GMT Subject: RFR: 8352565: Add native method implementation of Reference.get() [v2] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 22:01:55 GMT, Brent Christian wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> parameterized return type of native get0 > > test/hotspot/jtreg/gc/TestNativeReferenceGet.java line 162: > >> 160: System.out.println("Testing nonconcurrent GC"); >> 161: clearReferents(); >> 162: strengthenReferents(); > > Might the GC clear refs between `clearReferents()` and `strengthenReferents()`? Yeah, an ill-timed GC between those operations would result in test failure. I've added a GC immediately before those operations to make that pretty unlikely.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24315#discussion_r2025385681 From kbarrett at openjdk.org Wed Apr 2 18:40:57 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 2 Apr 2025 18:40:57 GMT Subject: RFR: 8352565: Add native method implementation of Reference.get() [v4] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 18:33:16 GMT, Kim Barrett wrote: >> Please review this change which adds a native method providing the >> implementation of Reference::get. Reference::get is an intrinsic candidate, so >> this native method implementation is only used when the intrinsic is not. >> >> Currently there is intrinsic support by the interpreter, C1, C2, and graal, >> which are always used. With this change we can later remove all the >> per-platform interpreter intrinsic implementations, and might also remove the >> C1 intrinsic implementation. >> >> Testing: >> (1) mach5 tier1-6 normal (so using all the existing intrinsics). >> (2) mach5 tier1-6 with interpreter and C1 Reference::get intrinsics disabled. > > Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: > > - remove timeout by using waitForReferenceProcessing > - make ill-timed gc in non-concurrent case less likely > - fix test package use test/hotspot/jtreg/gc/TestNativeReferenceGet.java line 137: > 135: } > 136: checkQueue(); // One last check after refproc complete. > 137: } catch (InterruptedException e) { Rather than using Reference.remove with a timeout, I've changed this to use waitForReferenceProcessing. That removes false passes (from reference processing being slow to deliver) and also removes the delay until timeout for the passing case.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24315#discussion_r2025390205 From rehn at openjdk.org Wed Apr 2 18:42:23 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 2 Apr 2025 18:42:23 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v9] In-Reply-To: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: <95VE0F2BRFgHVi0ejPYCsdq3ZpDFLe5Ya-toV5u0-BE=.19b436a4-8bfa-41e6-879f-23887d067c4b@github.com> > Hi please consider. > > |RVWMO| Patched| > | ---------- | ---------- | > |fence iorw,iorw| fence iorw,ow| > |sw t4,120(t2) | sw t4,120(t2) | > |fence ow,ir | unnecessary_membar_volatile_rvwmo | > | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile | > |fence iorw,ow | fence iorw,ow| > |sw t5,124(t2) |sw t5,124(t2) | > > |TSO | Patched| > | ---------- | ---------- | > | lw a4,120(t2) | lw a6,120(t2) | > | sw a0,124(t2) | sw t6,124(t2) | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > | sw t4,120(t2) | sw t4,120(t2) | > | fence ow,ir | unnecessary_membar_volatile_tso | > | sw t6,128(t2) | sw t5,128(t2) | > | sw t5,124(t2) // Non-volatile| sw a1,124(t2) // Non-volatile | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > |... | ... | > | sw a3,120(t2) | sw a0,120(t2) | > | fence ow,ir | fence ow,ir | > | lw a7,124(t2) | lw a5,124(t2) | > > For the specific rvwmo volatile store + store + volatile store is around 30% faster on VF2. > > The patch do: > - Separate ztso and rvwmo in ad by using UseZtso predicate. > - Match all that requires the same membar. > - Make fence/fencei protected as they shouldn't be using directly. > - Increased cost of membars to VOLATILE_REF_COST. > - Added a real_empty pipe. > - Change to pipe_slow on TSO (as x86). 
> > Note that C2-rv64 is now superior to gcc/clang regrading fencing: > https://godbolt.org/z/6E3YTP15j > > Testing jcstress, tier1 and manually reading the generated assembly. > Doing additional testing, but RFR it now as it may need some consideration. > > /Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'master' into tso-merge - Merge branch 'master' into tso-merge - Merge branch 'master' into tso-merge - format comment - Merge branch 'master' into tso-merge - Review comments - Merge branch 'master' into tso-merge - Review comments - Fixed ws - Revert NC - ... and 2 more: https://git.openjdk.org/jdk/compare/3a3bcdf3...f09ea739 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24035/files - new: https://git.openjdk.org/jdk/pull/24035/files/c2688a6a..f09ea739 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=07-08 Stats: 17485 lines in 369 files changed: 10145 ins; 6085 del; 1255 mod Patch: https://git.openjdk.org/jdk/pull/24035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24035/head:pull/24035 PR: https://git.openjdk.org/jdk/pull/24035 From shade at openjdk.org Wed Apr 2 19:10:47 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Apr 2025 19:10:47 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag In-Reply-To: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> Message-ID: On Wed, 2 Apr 2025 17:49:30 GMT, Aleksey Shipilev wrote: > Noticed this when doing [JDK-8353558](https://bugs.openjdk.org/browse/JDK-8353558). 
We only check for the CLWB feature flag on Intel platforms. But AMD APM (Rev. 3.36, March 2024) tells me there is a CLWB flag in CPUID Fn0000_0007_EBX_x0 leaf as well. It is in the same place as the flag for Intel. > > Additional testing: > - [x] Ad-hoc tests on Ryzen 5950X @adinn, this is your code for JEP 352, originally. Simple omission? Or maybe the AMD arch did not map this flag to any feature back in the day? I think this started with Family 17h or even Zen 2. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24385#issuecomment-2773461404 From sgehwolf at openjdk.org Wed Apr 2 19:28:02 2025 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 2 Apr 2025 19:28:02 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v5] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 21:33:30 GMT, Thomas Fitzsimmons wrote: >> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811.
>> >> I tested it with: >> >> >> java -Xlog:os+container=trace -version >> >> on: >> >> `Red Hat Enterprise Linux 8 (cgroups v1 only)`: >> _No change in behaviour_ >> >> `Fedora 41 (cgroups v2)`: >> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ >> >> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 >> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 >> @@ -1,7 +1,12 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> +[debug][os,container] v2 controller cpuset is enabled and relevant >> +[debug][os,container] v2 controller cpu is enabled and required >> +[debug][os,container] v2 controller io is enabled but not relevant >> +[debug][os,container] v2 controller memory is enabled and required >> +[debug][os,container] v2 controller hugetlb is enabled but not relevant >> +[debug][os,container] v2 controller pids is enabled and relevant >> +[debug][os,container] v2 controller rdma is enabled but not relevant >> +[debug][os,container] v2 controller misc is enabled but not relevant >> [debug][os,container] Detected cgroups v2 unified hierarchy >> [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope >> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max >> >> >> `Fedora 41 (custom kernel with cgroups v1 disabled)`: >> _Fixes `cgroups v2` detection:_ >> >> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 >> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 >> @@ -1,7 +1,63 @@ >> 
[trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> -[debug][os,container] controller memory is not enabled >> - ] >> -[debug][os,container] One or more required controllers disabled at kernel level. >> +[... > > Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 > - testCgroupv1SystemdOnly, testCgroupv1NoMounts: Use cgroupv1 fields > - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 > - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent > - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing > > Remove from cgroups v1 branch incorrect log messages about cpuset > controller being optional. Add test case for cgroups v1, cpuset > disabled. > - Improve !cgroups_v2_enabled branch comment > - Debug-log optional and disabled cgroups v2 controllers > > Do not log enabled controllers that are not relevant to the JDK. > - Move index declaration to scope in which it is used > - Remove empty string check during cgroup.controllers parsing > - Define ISSPACE_CHARS macro, use it in strsep call > - ... and 5 more: https://git.openjdk.org/jdk/compare/3745148c...b29d8694 Still good. ------------- Marked as reviewed by sgehwolf (Reviewer). 
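With this change, cgroup v2 detection reads `/sys/fs/cgroup/cgroup.controllers`, a single whitespace-separated line listing the available controllers, instead of `/proc/cgroups`. Below is a minimal Java sketch of the classification visible in the debug log quoted in this thread. Its split into required controllers (`cpu`, `memory`) and other relevant ones (`cpuset`, `pids`) is read off those log lines, not taken from the C++ source. `CgroupV2ControllersSketch` and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch mirroring the logged classification; not the HotSpot C++ code.
final class CgroupV2ControllersSketch {
    static final Set<String> REQUIRED = Set.of("cpu", "memory");
    static final Set<String> RELEVANT = Set.of("cpuset", "pids");

    // contents: the single line read from /sys/fs/cgroup/cgroup.controllers,
    // e.g. "cpuset cpu io memory hugetlb pids rdma misc".
    static Map<String, String> classify(String contents) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : contents.trim().split("\\s+")) {
            if (name.isEmpty()) {
                continue; // an empty file yields a single empty token
            }
            if (REQUIRED.contains(name)) {
                result.put(name, "required");
            } else if (RELEVANT.contains(name)) {
                result.put(name, "relevant");
            } else {
                result.put(name, "not relevant");
            }
        }
        return result;
    }

    // True only if every required controller is listed in the file.
    static boolean requiredControllersEnabled(String contents) {
        return classify(contents).keySet().containsAll(REQUIRED);
    }
}
```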
PR Review: https://git.openjdk.org/jdk/pull/23811#pullrequestreview-2737343220 From vlivanov at openjdk.org Wed Apr 2 19:42:57 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 2 Apr 2025 19:42:57 GMT Subject: RFR: 8353217: Build libsleef on macos-aarch64 [v3] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 22:45:09 GMT, Vladimir Ivanov wrote: >> Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. >> >> It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. >> >> PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. >> >> Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. >> >> Testing: hs-tier1 - hs-tier4, microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Adjust README.md Thanks for the feedback and reviews, Julian, Vladimir, Aleksey, Magnus, and Erik. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2773549083 From duke at openjdk.org Wed Apr 2 19:43:02 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Wed, 2 Apr 2025 19:43:02 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v4] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 13:39:53 GMT, Severin Gehwolf wrote: >> @tstuefe @ashu-mehra Could you please help with a second review? > >> @jerboaa @fitzsim Does the current mainline code handle mixed configurations in which some controllers are v1 and others v2? For example, the cpu controller is mounted as v1 while the memory controller is mounted as v2. If so, does this patch continue to support such configurations?
> > The current code does not allow mixed configuration for "relevant" controllers (essentially cpu and memory). That is, they ought to be v1 or v2. In the hybrid case (systemd running on unified), it's considered v1. I don't think this patch changes any of it. Thank you for re-reviewing, @jerboaa and @ashu-mehra. I have issued the `integrate` command. Can one of you please sponsor the change? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2773547508 From duke at openjdk.org Wed Apr 2 19:43:02 2025 From: duke at openjdk.org (duke) Date: Wed, 2 Apr 2025 19:43:02 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups [v5] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 21:33:30 GMT, Thomas Fitzsimmons wrote: >> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. >> >> I tested it with: >> >> >> java -Xlog:os+container=trace -version >> >> on: >> >> `Red Hat Enterprise Linux 8 (cgroups v1 only)`: >> _No change in behaviour_ >> >> `Fedora 41 (cgroups v2)`: >> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ >> >> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 >> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 >> @@ -1,7 +1,12 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> +[debug][os,container] v2 controller cpuset is enabled and relevant >> +[debug][os,container] v2 controller cpu is enabled and required >> +[debug][os,container] v2 controller io is enabled but not relevant >> +[debug][os,container] v2 controller memory is enabled and required >> +[debug][os,container] v2 controller hugetlb is enabled but not relevant >> +[debug][os,container] v2 controller pids is enabled and relevant >> 
+[debug][os,container] v2 controller rdma is enabled but not relevant >> +[debug][os,container] v2 controller misc is enabled but not relevant >> [debug][os,container] Detected cgroups v2 unified hierarchy >> [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope >> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max >> >> >> `Fedora 41 (custom kernel with cgroups v1 disabled)`: >> _Fixes `cgroups v2` detection:_ >> >> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 >> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 >> @@ -1,7 +1,63 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> -[debug][os,container] controller memory is not enabled >> - ] >> -[debug][os,container] One or more required controllers disabled at kernel level. >> +[... > > Thomas Fitzsimmons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 15 additional commits since the last revision: > > - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 > - testCgroupv1SystemdOnly, testCgroupv1NoMounts: Use cgroupv1 fields > - Merge branch 'master' into cgroups-v2-version-check-and-controllers-parsing-1 > - Replace literal tabs in procCgroupsCgroupsV1CpusetDisabledContent > - Detect cpuset-disabled condition during cgroups v1 /proc/cgroups parsing > > Remove from cgroups v1 branch incorrect log messages about cpuset > controller being optional. Add test case for cgroups v1, cpuset > disabled. > - Improve !cgroups_v2_enabled branch comment > - Debug-log optional and disabled cgroups v2 controllers > > Do not log enabled controllers that are not relevant to the JDK. > - Move index declaration to scope in which it is used > - Remove empty string check during cgroup.controllers parsing > - Define ISSPACE_CHARS macro, use it in strsep call > - ... and 5 more: https://git.openjdk.org/jdk/compare/13207d93...b29d8694 @fitzsim Your change (at version b29d869457ad578d7442959da9b1169f43d0fee0) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2773541695 From vlivanov at openjdk.org Wed Apr 2 19:45:59 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 2 Apr 2025 19:45:59 GMT Subject: Integrated: 8353217: Build libsleef on macos-aarch64 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov wrote: > Build and use SLEEF library as a backend implementation for Vector API trigonometric functions on macosx-aarch64 platform. > > It improves raw throughput and eliminates GC overhead of non-intrinsified Vector API operation. > > PR includes build changes and libsleef sources relocation from `src/jdk.incubator.vector/linux/native/` to `src/jdk.incubator.vector/share/native/`. 
> > Once libsleef library is present, existing code in `stubGenerator_aarch64.cpp` successfully links at JVM startup. > > Testing: hs-tier1 - hs-tier4, microbenchmarks This pull request has now been integrated. Changeset: 130b0cda Author: Vladimir Ivanov URL: https://git.openjdk.org/jdk/commit/130b0cdaa6604da47a893e5425547acf3d5253f4 Stats: 167 lines in 176 files changed: 73 ins; 77 del; 17 mod 8353217: Build libsleef on macos-aarch64 Co-authored-by: Magnus Ihse Bursie Reviewed-by: erikj, kvn, ihse ------------- PR: https://git.openjdk.org/jdk/pull/24306 From mdoerr at openjdk.org Wed Apr 2 20:40:00 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 2 Apr 2025 20:40:00 GMT Subject: RFR: JDK-8331859 : [PPC64] Remove support for Power7 and older In-Reply-To: References: Message-ID: On Fri, 19 Jul 2024 17:32:09 GMT, Suchismith Roy wrote: > JBS Issue: [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859) > Linux PPC64le has required Power8 from the beginning. > AIX requires Power8 with the new OpenXL-based build ([JDK-8307520](https://bugs.openjdk.org/browse/JDK-8307520)). The old build has been removed in JDK 23 ([JDK-8327701](https://bugs.openjdk.org/browse/JDK-8327701)). > Linux PPC64 Big Endian is no longer officially supported (only kept alive for development, debugging and testing purposes). > > The following checks for old processors are no longer needed: > 8: VM_Version::has_lqarx() > 7: VM_Version::has_popcntw() > 6: VM_Version::has_cmpb() > 5: VM_Version::has_popcntb() > These and several other checks for old instructions are no longer needed. All code that becomes unreachable once they are removed should be removed as well. > Checks like "PowerArchitecturePPC64 >= 8" (or older) can be removed. > > Atomic::PlatformCmpxchg<1>::operator() can be simplified by using sub-word instructions (lharx, lbarx). > > Temp registers can be removed from cmpxchgb and cmpxchgh.
> > Build flags "-mcpu=powerpc64 -mtune=power5" for Big Endian linux should get replaced by "-mcpu=power8 -mtune=power8" as already used for linux PPC64le. make/autoconf/flags-cflags.m4 line 718: > 716: # -mminimal-toc fixes `relocation truncated to fit' error for gcc 4.1. > 717: # Use ppc64 instructions, but schedule for power5 > 718: $1_CFLAGS_CPU="-mcpu=powerpc64 -mtune=power8" -mcpu=power8 is missing. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 1586: > 1584: // emulate the sub-word instructions by constructing a 4-byte value > 1585: // that leaves the other bytes unchanged. > 1586: const int instruction_type = size; I think it would be better to remove instruction_type and replace the remaining usages. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 1658: > 1656: void MacroAssembler::cmpxchg_loop_body(ConditionRegister flag, Register dest_current_value, > 1657: RegisterOrConstant compare_value, Register exchange_value, > 1658: Register addr_base, Label &retry, Label &failed, bool cmpxchgx_hint, int size) { Please restore indentation. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 1663: > 1661: // emulate the sub-word instructions by constructing a 4-byte value > 1662: // that leaves the other bytes unchanged. > 1663: const int instruction_type = size; Same here. src/hotspot/cpu/ppc/vm_version_ppc.cpp line 99: > 97: > 98: if (FLAG_IS_DEFAULT(UsePopCountInstruction)) { > 99: FLAG_SET_ERGO(UsePopCountInstruction, true); Indentation should be adapted. Or even better set the default in globals_ppc.hpp to true and simplify this logic. src/hotspot/cpu/ppc/vm_version_ppc.cpp line 107: > 105: > 106: if (FLAG_IS_DEFAULT(SuperwordUseVSX)) { > 107: FLAG_SET_ERGO(SuperwordUseVSX, true); Same here. 
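Several comments above concern code that emulates a sub-word compare-and-swap by operating on the containing 4-byte word and leaving the other bytes unchanged. With Power8 as the baseline, `lbarx`/`lharx` make that emulation unnecessary. The sketch below shows the emulation idea portably with `std::atomic<uint32_t>`; it is not the PPC assembler code. `cmpxchg_byte` is a hypothetical name. The byte index counts from the least significant byte of the word's value, so how it maps to a memory address depends on endianness (big-endian PPC64 versus PPC64le).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Emulates a 1-byte CAS on byte `index` (0..3, counted from the least
// significant byte of the value) of a 32-bit atomic word.
// Returns the byte value observed before the operation; the CAS succeeded
// if and only if the return value equals `expected`.
uint8_t cmpxchg_byte(std::atomic<uint32_t>& word, unsigned index,
                     uint8_t expected, uint8_t desired) {
  assert(index < 4);
  const unsigned shift = index * 8;
  const uint32_t mask = uint32_t{0xFF} << shift;
  uint32_t old_word = word.load(std::memory_order_relaxed);
  for (;;) {
    uint8_t current = static_cast<uint8_t>((old_word & mask) >> shift);
    if (current != expected) {
      return current;  // comparison failed; nothing is stored
    }
    // Splice in the new byte, leaving the other three bytes unchanged.
    uint32_t new_word = (old_word & ~mask) | (uint32_t{desired} << shift);
    // On failure, compare_exchange_weak reloads old_word, so a concurrent
    // change to a neighbouring byte (or a spurious failure) causes a retry.
    if (word.compare_exchange_weak(old_word, new_word,
                                   std::memory_order_seq_cst)) {
      return expected;
    }
  }
}
```

The retry loop is the cost that the native byte and halfword reservation instructions remove: the hardware tracks the sub-word reservation directly, so no mask-and-splice step is needed.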
src/hotspot/cpu/ppc/vm_version_ppc.hpp line 113: > 111: static bool has_fcfids() { return (_features & fcfids_m) != 0; } > 112: static bool has_vand() { return (_features & vand_m) != 0; } > 113: static bool has_lqarx() { return (_features & lqarx_m) != 0; } Why are the other Power7 and older instruction checks not removed? src/hotspot/os_cpu/linux_ppc/atomic_linux_ppc.hpp line 255: > 253: /* atomic loop */ > 254: "1: \n" > 255: " lbarx %[old_value], 0, %[dest] \n" Please adapt the indentation of the modified lines. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20262#discussion_r2025531994 PR Review Comment: https://git.openjdk.org/jdk/pull/20262#discussion_r2025542835 PR Review Comment: https://git.openjdk.org/jdk/pull/20262#discussion_r2025533312 PR Review Comment: https://git.openjdk.org/jdk/pull/20262#discussion_r2025545860 PR Review Comment: https://git.openjdk.org/jdk/pull/20262#discussion_r2025550882 PR Review Comment: https://git.openjdk.org/jdk/pull/20262#discussion_r2025554086 PR Review Comment: https://git.openjdk.org/jdk/pull/20262#discussion_r2025558380 PR Review Comment: https://git.openjdk.org/jdk/pull/20262#discussion_r2025555814 From ccheung at openjdk.org Wed Apr 2 20:50:49 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 2 Apr 2025 20:50:49 GMT Subject: RFR: 8353325: Rewrite appcds/methodHandles test cases to use CDSAppTester [v4] In-Reply-To: References: Message-ID: <6b83a-GMDNuvXOPq6Hs2SVTlB8hMoKNjGq0w-HE7XTQ=.5abaa98c-8a06-4b8a-8d43-6130a8ceae80@github.com> On Wed, 2 Apr 2025 03:49:05 GMT, Ioi Lam wrote: >> These test cases are rewritten to use CDSAppTester, so that they can also be executed in the new JEP 483 workflow (with `-XX:AOTCache=xxx`, etc). This will increase coverage of current and upcoming AOT features (such as AOT linking of invokedynamic, and AOT method profiling). >> >> These test cases are generated by a bash script. 
This PR minimizes the generated part so that the main portions of the tests can be modified as a normal java source file. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Fixed failure with ZGC + AOT test case Updates look good. ------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24340#pullrequestreview-2737545424 From gziemski at openjdk.org Wed Apr 2 22:16:51 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 2 Apr 2025 22:16:51 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v8] In-Reply-To: <5b6QutsBRX4KGZY50opOtRztczGFdilIy3CwlisJ4s4=.7b8a3f1a-4f6f-469f-829b-1d693ad1355d@github.com> References: <5b6QutsBRX4KGZY50opOtRztczGFdilIy3CwlisJ4s4=.7b8a3f1a-4f6f-469f-829b-1d693ad1355d@github.com> Message-ID: On Wed, 2 Apr 2025 18:37:36 GMT, Gerard Ziemski wrote: >> This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with a default value of mtNone, to force everyone to provide a mem tag, if known. >> >> I tried to fill in the tag when I was fairly certain that I had the right type. >> >> At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > The real feedback from StefanK I just ran Mach5 tier1-5 and, from what I can see, it passes (see the issue for more on that).
------------- PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2773850553 From dnsimon at openjdk.org Wed Apr 2 22:32:58 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 2 Apr 2025 22:32:58 GMT Subject: RFR: 8352645: Add tool support to check order of includes [v6] In-Reply-To: References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Fri, 28 Mar 2025 22:24:40 GMT, Doug Simon wrote: >> This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). >> >> By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. >> >> The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. >> >> I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. >> >> When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. 
For example: >> >> java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: >> >> java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci >> >> at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) >> at java.base/java.lang.Thread.run(Thread.java:1447) >> Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: >> >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp >> /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp >> /Users/dnsimo... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > convert Windows path to Unix path Thanks for all the discussion and reviews. 
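The ordering rule described for `SortIncludes.java` compares lowercased names so that `_` sorts before letters, and it drops duplicates as a sorted set does. That rule can be sketched in C++ as shown below. This is an illustrative re-implementation, not the Java tool itself. `sort_include_block` and `LowercaseLess` are hypothetical names, and the sketch handles a single block of include lines only. It does not split user and system includes or insert the blank line between them.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

static std::string to_lower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// Orders include lines by their lowercased text. In ASCII, '_' (0x5F) lies
// between 'Z' (0x5A) and 'a' (0x61), so lowercasing makes '_' sort before
// every letter.
struct LowercaseLess {
  bool operator()(const std::string& a, const std::string& b) const {
    return to_lower(a) < to_lower(b);
  }
};

// Returns the block sorted; the set also drops duplicate lines (including
// lines that differ only in case, which this comparator treats as equal).
std::vector<std::string> sort_include_block(const std::vector<std::string>& lines) {
  std::set<std::string, LowercaseLess> sorted(lines.begin(), lines.end());
  return std::vector<std::string>(sorted.begin(), sorted.end());
}
```

With a plain byte-wise comparison, `gcArguments.hpp` would sort before `gc_globals.hpp` because 'A' (0x41) is less than '_'. The lowercased comparison reverses that order.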
------------- PR Comment: https://git.openjdk.org/jdk/pull/24247#issuecomment-2773874524 From dnsimon at openjdk.org Wed Apr 2 22:32:59 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 2 Apr 2025 22:32:59 GMT Subject: Integrated: 8352645: Add tool support to check order of includes In-Reply-To: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> References: <2R1Lazv-rFiErR_ZtJjyT77Wm2XeaKQ8hA5HDg8o1v4=.084054ad-a46b-4206-bc1e-5e9d2bdbaaa2@github.com> Message-ID: On Wed, 26 Mar 2025 09:21:59 GMT, Doug Simon wrote: > This PR adds `test/hotspot/jtreg/sources/SortIncludes.java`, a tool to check that blocks of include statements in C++ files are sorted and that there's at least one blank line between user and sys includes (as per the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#source-files)). > > By virtue of using `SortedSet`, the tool also removes duplicate includes (e.g. `"compiler/compilerDirectives.hpp"` on line [37](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L37) and line [41](https://github.com/openjdk/jdk/blob/059f190f4b0c7836b89ca2070400529e8d33790b/src/hotspot/share/c1/c1_Compilation.cpp#L41)). Sorting uses lowercased strings so that `_` sorts before letters, preserving the prevailing convention in the code base. I've also updated the style guide to clarify this sort-order. > > The tool does nothing about re-ordering blocks of conditional includes vs unconditional includes. I briefly looked into that but it gets very complicated, very quickly. That kind of re-ordering will have to continue to be done manually for now. > > I have used the tool to fix the ordering of a subset of HotSpot sources and added a test to keep them sorted. That test can be expanded over time to keep includes sorted in other HotSpot directories. > > When `TestIncludesAreSorted.java` fails, it tries to provide actionable advice. 
For example: > > java.lang.RuntimeException: The unsorted includes listed below should be fixable by running: > > java /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/sources/SortIncludes.java --update /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1 /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/ci /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/compiler /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/jvmci > > at TestIncludesAreSorted.main(TestIncludesAreSorted.java:80) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) > at java.base/java.lang.Thread.run(Thread.java:1447) > Caused by: java.lang.RuntimeException: 36 files with unsorted headers found: > > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Compilation.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Runtime1.cpp > /Users/dnsimon/dev/jdk-jdk/open/src/hotspot/share/c1/c1_Optim... This pull request has now been integrated. 
Changeset: 814730ea Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/814730eae76d7b60a6082dc6f9e30618b7d8524b Stats: 486 lines in 53 files changed: 407 ins; 55 del; 24 mod 8352645: Add tool support to check order of includes Reviewed-by: stefank, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/24247 From iklam at openjdk.org Thu Apr 3 00:44:59 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 3 Apr 2025 00:44:59 GMT Subject: RFR: 8353325: Rewrite appcds/methodHandles test cases to use CDSAppTester [v4] In-Reply-To: <6b83a-GMDNuvXOPq6Hs2SVTlB8hMoKNjGq0w-HE7XTQ=.5abaa98c-8a06-4b8a-8d43-6130a8ceae80@github.com> References: <6b83a-GMDNuvXOPq6Hs2SVTlB8hMoKNjGq0w-HE7XTQ=.5abaa98c-8a06-4b8a-8d43-6130a8ceae80@github.com> Message-ID: On Wed, 2 Apr 2025 20:48:27 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed failure with ZGC + AOT test case > > Updates look good. Thanks @calvinccheung for the review. I've re-run up to tier 3 and found no regressions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24340#issuecomment-2774053529 From iklam at openjdk.org Thu Apr 3 00:45:00 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 3 Apr 2025 00:45:00 GMT Subject: Integrated: 8353325: Rewrite appcds/methodHandles test cases to use CDSAppTester In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 23:08:13 GMT, Ioi Lam wrote: > These test cases are rewritten to use CDSAppTester, so that they can also be executed in the new JEP 483 workflow (with `-XX:AOTCache=xxx`, etc). This will increase coverage of current and upcoming AOT features (such as AOT linking of invokedynamic, and AOT method profiling). > > These test cases are generated by a bash script. This PR minimizes the generated part so that the main portions of the tests can be modified as a normal java source file. This pull request has now been integrated. 
Changeset: b01026ab Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/b01026abaab0b65f9ec0920d66a8ff1fa868d351 Stats: 1639 lines in 18 files changed: 381 ins; 1214 del; 44 mod 8353325: Rewrite appcds/methodHandles test cases to use CDSAppTester Reviewed-by: ccheung ------------- PR: https://git.openjdk.org/jdk/pull/24340 From cslucas at openjdk.org Thu Apr 3 00:57:19 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 3 Apr 2025 00:57:19 GMT Subject: RFR: 8353593: MethodData "mileage_*" methods and fields aren't used and can be removed Message-ID: Please review this trivial patch to remove dead code from MethodData class. Tested on Linux x86_64 with JTREG_TIER1. ------------- Commit messages: - Mileage fields & methods aren't used anymore. Changes: https://git.openjdk.org/jdk/pull/24399/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24399&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353593 Stats: 15 lines in 2 files changed: 0 ins; 13 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24399.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24399/head:pull/24399 PR: https://git.openjdk.org/jdk/pull/24399 From lucy at openjdk.org Thu Apr 3 06:09:53 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 3 Apr 2025 06:09:53 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v11] In-Reply-To: References: Message-ID: <_6j8quJGSKhHMpItf4G2RFhkBEXhcuSJ_IzJEM1soqE=.a36655af-1fbc-4f5d-a7b8-bddca0d08071@github.com> On Wed, 19 Mar 2025 10:17:56 GMT, Amit Kumar wrote: >> s390x implementation for Class.isInstance intrinsic. >> >> Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. >> >> Benchmark results will be updated soon. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestion from Lutz Still looking good to me. 
------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23535#pullrequestreview-2738623699 From jwaters at openjdk.org Thu Apr 3 06:21:01 2025 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 3 Apr 2025 06:21:01 GMT Subject: RFR: 8345265: Minor improvements for LTO across all compilers [v2] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 14:54:03 GMT, Julian Waters wrote: >> This is a general cleanup and improvement of LTO, as well as a quick fix to remove a workaround in the Makefiles that disabled LTO for g1ParScanThreadState.cpp due to the old poisoning mechanism causing trouble. The -Wno-attribute-warning change here can be removed once Kim's new poisoning solution is integrated. >> >> - -fno-omit-frame-pointer is added to gcc to stop the linker from emitting code without the frame pointer >> - -flto is set to $(JOBS) instead of auto to better match what the user requested >> - -Gy is passed to the Microsoft compiler. This does not fully fix LTO under Microsoft, but prevents warnings about -LTCG:INCREMENTAL at least > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge branch 'openjdk:master' into patch-16 > - -fno-omit-frame-pointer in JvmFeatures.gmk > - Revert compilerWarnings_gcc.hpp > - General LTO fixes JvmFeatures.gmk > - Revert DISABLE_POISONING_STOPGAP compilerWarnings_gcc.hpp > - Merge branch 'openjdk:master' into patch-16 > - Revert os.cpp > - Fix memory leak in jvmciEnv.cpp > - Stopgap fix in os.cpp > - Declaration fix in compilerWarnings_gcc.hpp > - ... and 2 more: https://git.openjdk.org/jdk/compare/873ad932...9d05cb8e Argh, so it can't be replicated on Linux. Alright, I've troubled you enough by now, thank you so much for the help in testing Linux! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2774604658 From rehn at openjdk.org Thu Apr 3 06:21:57 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 3 Apr 2025 06:21:57 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Wed, 2 Apr 2025 13:16:05 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'master' into tso-merge >> - Merge branch 'master' into tso-merge >> - format comment >> - Merge branch 'master' into tso-merge >> - Review comments >> - Merge branch 'master' into tso-merge >> - Review comments >> - Fixed ws >> - Revert NC >> - Fixed comment >> - ... and 1 more: https://git.openjdk.org/jdk/compare/18d961e2...c2688a6a > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3558: > >> 3556: >> 3557: void MacroAssembler::membar(uint32_t order_constraint) { >> 3558: if (UseZtso && ((order_constraint & StoreLoad) != StoreLoad)) { > > An assert in Assembler::fence() could help to catch potential misuse in the future: > `assert(!UseZtso || ((order_constraint & StoreLoad) == StoreLoad)` Sorry, I now understand, you mean after doing all these checks if we can elide. Yes, that seems good. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2026281974 From lucy at openjdk.org Thu Apr 3 06:23:52 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 3 Apr 2025 06:23:52 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames [v2] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 13:18:38 GMT, Amit Kumar wrote: >> Port for [JDK-8299795](https://bugs.openjdk.org/browse/JDK-8299795) Relativize Z_locals in interpreter frame for s390x. >> >> Tier1 tests with fastdebug VM are clean. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > use Z_R0 as helper Changes requested by lucy (Reviewer). src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp line 1145: > 1143: __ z_stg(Z_R1, _z_ijava_state_neg(locals), fp); > 1144: > 1145: __ z_lgr(Z_R1, Z_R0); // restore R1 Why don't you use Z_R0 for the relativization calculations and leave Z_R1 untouched? That will avoid the save/restore overhead. ------------- PR Review: https://git.openjdk.org/jdk/pull/23660#pullrequestreview-2738650289 PR Review Comment: https://git.openjdk.org/jdk/pull/23660#discussion_r2026285061 From rehn at openjdk.org Thu Apr 3 06:28:52 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 3 Apr 2025 06:28:52 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Wed, 2 Apr 2025 13:17:02 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'master' into tso-merge >> - Merge branch 'master' into tso-merge >> - format comment >> - Merge branch 'master' into tso-merge >> - Review comments >> - Merge branch 'master' into tso-merge >> - Review comments >> - Fixed ws >> - Revert NC >> - Fixed comment >> - ... and 1 more: https://git.openjdk.org/jdk/compare/d83e8386...c2688a6a > > src/hotspot/cpu/riscv/riscv.ad line 7951: > >> 7949: %} >> 7950: >> 7951: instruct unnecessary_membar_volatile_rvtso() %{ > > This one could be merged with `unnecessary_membar_volatile_rvwmo`, removing the `UseZtso` check from the predicate. There are several more that could be merged, such as membar_volatile_rvXX. But I preferred having those for rvtso in one section and those for rvwmo in another section. Is that not a good approach? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2026291366 From rehn at openjdk.org Thu Apr 3 06:33:02 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 3 Apr 2025 06:33:02 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Thu, 3 Apr 2025 06:18:24 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3558: >> >>> 3556: >>> 3557: void MacroAssembler::membar(uint32_t order_constraint) { >>> 3558: if (UseZtso && ((order_constraint & StoreLoad) != StoreLoad)) { >> >> An assert in Assembler::fence() could help to catch potential misuse in the future: >> `assert(!UseZtso || ((order_constraint & StoreLoad) == StoreLoad)` > > Sorry, I now understand, you mean after doing all these checks if we can elide. > > Yes, that seems good. It's not possible as rv pause is encoded as "fence w, 0".
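The `MacroAssembler::membar` condition quoted above relies on Ztso (total store order). Under Ztso the hardware already preserves LoadLoad, LoadStore and StoreStore ordering, so only a constraint that includes StoreLoad still needs a `fence`. The following is a minimal sketch of that decision. The one-bit-per-ordering encoding and the `needs_fence` name are hypothetical stand-ins, not the RISC-V port's actual encoding. The sketch also assumes that, under RVWMO, every non-empty constraint needs a fence.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-bit-per-ordering encoding, for illustration only.
enum : uint32_t {
  LoadLoad   = 1u << 0,
  StoreLoad  = 1u << 1,
  LoadStore  = 1u << 2,
  StoreStore = 1u << 3,
  AnyAny     = LoadLoad | StoreLoad | LoadStore | StoreStore
};

// Under TSO the only reordering other harts can observe is a later load
// passing an earlier store, so a fence is needed only for StoreLoad.
// Under RVWMO (use_ztso == false) any non-empty constraint needs a fence.
bool needs_fence(uint32_t order_constraint, bool use_ztso) {
  if (use_ztso && ((order_constraint & StoreLoad) != StoreLoad)) {
    return false;  // elided: TSO already provides the requested ordering
  }
  return order_constraint != 0;
}
```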
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2026295481 From rehn at openjdk.org Thu Apr 3 06:36:10 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 3 Apr 2025 06:36:10 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v10] In-Reply-To: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: > Hi, please consider. > > |RVWMO| Patched| > | ---------- | ---------- | > |fence iorw,iorw| fence iorw,ow| > |sw t4,120(t2) | sw t4,120(t2) | > |fence ow,ir | unnecessary_membar_volatile_rvwmo | > | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile | > |fence iorw,ow | fence iorw,ow| > |sw t5,124(t2) |sw t5,124(t2) | > > |TSO | Patched| > | ---------- | ---------- | > | lw a4,120(t2) | lw a6,120(t2) | > | sw a0,124(t2) | sw t6,124(t2) | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > | sw t4,120(t2) | sw t4,120(t2) | > | fence ow,ir | unnecessary_membar_volatile_tso | > | sw t6,128(t2) | sw t5,128(t2) | > | sw t5,124(t2) // Non-volatile| sw a1,124(t2) // Non-volatile | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > |... | ... | > | sw a3,120(t2) | sw a0,120(t2) | > | fence ow,ir | fence ow,ir | > | lw a7,124(t2) | lw a5,124(t2) | > > The specific rvwmo sequence volatile store + store + volatile store is around 30% faster on VF2. > > The patch does: > - Separate ztso and rvwmo in ad by using UseZtso predicate. > - Match all that requires the same membar. > - Make fence/fencei protected as they shouldn't be used directly. > - Increased cost of membars to VOLATILE_REF_COST. > - Added a real_empty pipe. > - Change to pipe_slow on TSO (as x86). > > Note that C2-rv64 is now superior to gcc/clang regarding fencing: > https://godbolt.org/z/6E3YTP15j > > Testing jcstress, tier1 and manually reading the generated assembly.
> Doing additional testing, but RFR it now as it may need some consideration.
>
> /Robbin

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Merge branch 'master' into tso-merge - Merge branch 'master' into tso-merge - Merge branch 'master' into tso-merge - Merge branch 'master' into tso-merge - format comment - Merge branch 'master' into tso-merge - Review comments - Merge branch 'master' into tso-merge - Review comments - Fixed ws - ... and 3 more: https://git.openjdk.org/jdk/compare/652f256a...2044cf5f ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24035/files - new: https://git.openjdk.org/jdk/pull/24035/files/f09ea739..2044cf5f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24035&range=08-09 Stats: 5163 lines in 312 files changed: 2801 ins; 1884 del; 478 mod Patch: https://git.openjdk.org/jdk/pull/24035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24035/head:pull/24035 PR: https://git.openjdk.org/jdk/pull/24035 From dholmes at openjdk.org Thu Apr 3 07:33:49 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 3 Apr 2025 07:33:49 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:56:31 GMT, Matthias Baesken wrote:
> Do you have a good example of such a one-off periodic task?

Not existing in mainline.
The last one-off PeriodicTask was a BiasedLocking task:

    // One-shot PeriodicTask subclass for enabling biased locking
    class EnableBiasedLockingTask : public PeriodicTask {
     public:
      EnableBiasedLockingTask(size_t interval_time) : PeriodicTask(interval_time) {}

      virtual void task() {
        VM_EnableBiasedLocking op;
        VMThread::execute(&op);

        // Reclaim our storage and disenroll ourself
        delete this;
      }
    };
    ...
    EnableBiasedLockingTask* task = new EnableBiasedLockingTask(BiasedLockingStartupDelay);
    task->enroll();

In this case we'd use a very short `interval_time` as we are not really trying to delay, and we would do this somewhere early in `create_vm` - I think you can do this before the WatcherThread is created, but not sure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2774738247 From dholmes at openjdk.org Thu Apr 3 07:55:55 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 3 Apr 2025 07:55:55 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 10:59:16 GMT, Kevin Walls wrote:
> We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell.
>
> next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell.

But if anyone is using this for a sequence of commands that they expect to be in the same shell, they will now be broken.
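Here is a hypothetical sketch of the splitting rule proposed above. It assumes, as the remark about recognising newlines suggests, that repeated `-XX:OnError=` values reach the splitter joined by newlines. Each `;`- or newline-separated piece becomes a separate command, which the VM would then run in its own shell. The name `split_on_error_commands` and the blank trimming are illustrative choices, not the HotSpot implementation of `next_OnError_command()`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split an OnError value into separate commands,
// treating both ';' and '\n' as separators. Leading and trailing blanks are
// trimmed, and empty commands are dropped.
std::vector<std::string> split_on_error_commands(const std::string& value) {
  std::vector<std::string> commands;
  std::string current;
  auto flush = [&commands, &current]() {
    const std::string::size_type first = current.find_first_not_of(" \t");
    if (first != std::string::npos) {
      const std::string::size_type last = current.find_last_not_of(" \t");
      commands.push_back(current.substr(first, last - first + 1));
    }
    current.clear();
  };
  for (char c : value) {
    if (c == ';' || c == '\n') {
      flush();  // a separator ends the current command
    } else {
      current += c;
    }
  }
  flush();  // the last command has no trailing separator
  return commands;
}
```

For example, `"gcore %p; echo done\npmap %p"` yields three commands. Under the proposed behaviour, each of them would get its own `sh -c`.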
------------- PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2774787552 From mbaesken at openjdk.org Thu Apr 3 08:24:08 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 3 Apr 2025 08:24:08 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 12:21:42 GMT, David Holmes wrote: > I'm not convinced the startup hit is justified - some filesystems are relatively very slow. I thought about it a little more; you read the little release file from the JDK image itself. From the same file system location you also read e.g. libjvm.so (and other files that are mandatory when starting up the JVM). libjvm.so is (depending on OS and build options used) 7,000 or 10,000 times larger than the release file; on my example build (Linux x86_64), e.g. 4K vs 28M. So the load should not really add up much. If we really care that much about those few bytes, we should invest WAY more into size reduction of the JDK image (especially the early loaded files). (Not saying that I don't like the PeriodicTask idea; I'll try this out too.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2774859937 From duke at openjdk.org Thu Apr 3 08:42:04 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Thu, 3 Apr 2025 08:42:04 GMT Subject: Integrated: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 21:03:58 GMT, Thomas Fitzsimmons wrote: > This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811.
>
> I tested it with:
>
>
> java -Xlog:os+container=trace -version
>
> on:
>
> `Red Hat Enterprise Linux 8 (cgroups v1 only)`:
> _No change in behaviour_
>
> `Fedora 41 (cgroups v2)`:
> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_
>
> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500
> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500
> @@ -1,7 +1,12 @@
> [trace][os,container] OSContainer::init: Initializing Container Support
> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups
> -[debug][os,container] controller cpuset is not enabled
> - ]
> +[debug][os,container] v2 controller cpuset is enabled and relevant
> +[debug][os,container] v2 controller cpu is enabled and required
> +[debug][os,container] v2 controller io is enabled but not relevant
> +[debug][os,container] v2 controller memory is enabled and required
> +[debug][os,container] v2 controller hugetlb is enabled but not relevant
> +[debug][os,container] v2 controller pids is enabled and relevant
> +[debug][os,container] v2 controller rdma is enabled but not relevant
> +[debug][os,container] v2 controller misc is enabled but not relevant
> [debug][os,container] Detected cgroups v2 unified hierarchy
> [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user@4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope
> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user@4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max
>
>
> `Fedora 41 (custom kernel with cgroups v1 disabled)`:
> _Fixes `cgroups v2` detection:_
>
> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500
> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500
> @@ -1,7 +1,63 @@
> [trace][os,container] OSContainer::init: Initializing Container Support
> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups
> -[debug][os,container] controller cpuset is not enabled
> - ]
> -[debug][os,container] controller memory is not enabled
> - ]
> -[debug][os,container] One or more required controllers disabled at kernel level.
> +[debug][os,container] v2 controller cpuset is enabled and relevant
> +[debug][os,container] v2 contro...

This pull request has now been integrated. Changeset: 9c5ed23e Author: Thomas Fitzsimmons Committer: Severin Gehwolf URL: https://git.openjdk.org/jdk/commit/9c5ed23eac7470f56d498e9c4d3c51c2f80fd571 Stats: 385 lines in 6 files changed: 291 ins; 23 del; 71 mod 8349988: Change cgroup version detection logic to not depend on /proc/cgroups 8347811: Container detection code for cgroups v2 should use cgroup.controllers Co-authored-by: Severin Gehwolf Reviewed-by: sgehwolf, asmehra ------------- PR: https://git.openjdk.org/jdk/pull/23811 From kevinw at openjdk.org Thu Apr 3 08:53:53 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 3 Apr 2025 08:53:53 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:53:38 GMT, David Holmes wrote:
> But if anyone is using this for a sequence of commands that they expect to be in the same shell, they will now be broken.

I haven't worked out an actual example of a dependency... Our sh -c call doesn't seem to accept setting an env var which the next line from an OnError option can use. It might be the same as:

$ sh -c "FOO=123; echo $FOO"

$ sh -c 'FOO=123; echo $FOO'
123

...where, in the first, $FOO is expanded by the calling shell before sh -c runs, so it sees nothing. We don't want to change anything to try and make that work. We document the ; separator, but:

-XX:OnError="export FOO=%p; echo $FOO"

...gets separate shells already, so they can't depend on each other. I don't see us actually documenting that you can use multiple -XX:OnError= commands.
It has been mentioned in at least one earlier JBS issue that we can do this, and that they accumulate (possibly your comment). Creating a file in one command and doing something to it in a second command won't be affected. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2774934684 From ihse at openjdk.org Thu Apr 3 09:09:49 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 3 Apr 2025 09:09:49 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 08:20:56 GMT, Matthias Baesken wrote: > So the load should not really add up much. You are most likely absolutely right. On an old-style physical hard drive, there might be some additional delay, assuming that the files themselves are contiguous, but not next to each other. I doubt there is any way whatsoever that you could show any performance impact by doing this; it will completely drown in the noise. Normally, I live and die by the Rules of Optimization: 1) Don't optimize. 2) (Only for experts) Don't optimize yet. That is, trying to guess if a piece of code would be bad for performance, without measuring, and writing worse code as a result of this guess, is a really bad engineering principle. But I also understand David's gut reaction. Hotspot (and the entire JDK) is a complex piece of software that has a high level of performance requirements, and there are a lot of "small cuts" that can overall lead to worse performance, even if every one of them is hard to measure.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2774987280 From rvansa at openjdk.org Thu Apr 3 09:14:53 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 3 Apr 2025 09:14:53 GMT Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 12:02:58 GMT, Radim Vansa wrote:
>> On the reproducer https://bugs.openjdk.org/secure/attachment/113985/CCC.java my local testing shows these numbers:
>>
>> ### JDK-17
>>
>> $ hyperfine -w 5 -r 10 '/path/to/jdk-17/bin/java -cp /tmp/ CCC'
>> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp/ CCC
>> Time (mean ± σ): 32.5 ms ± 0.9 ms [User: 27.5 ms, System: 10.6 ms]
>> Range (min … max): 31.1 ms … 33.7 ms 10 runs
>>
>> ### JDK-25 before the change applied
>>
>> $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC'
>> Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC
>> Time (mean ± σ): 101.6 ms ± 1.5 ms [User: 96.8 ms, System: 14.6 ms]
>> Range (min … max): 99.0 ms … 104.5 ms 10 runs
>>
>> ### JDK-25 with this patch
>>
>> $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC'
>> Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC
>> Time (mean ± σ): 75.8 ms ± 1.2 ms [User: 69.8 ms, System: 16.0 ms]
>> Range (min … max): 73.8 ms … 78.2 ms 10 runs
>
> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix compilation error in assertion

@shipilev Since you were already involved in the investigation of https://bugs.openjdk.org/browse/JDK-8352075, may I ask for your review?
------------- PR Comment: https://git.openjdk.org/jdk/pull/24290#issuecomment-2774998582 From mli at openjdk.org Thu Apr 3 09:27:06 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 3 Apr 2025 09:27:06 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Thu, 3 Apr 2025 06:26:34 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/riscv.ad line 7951: >> >>> 7949: %} >>> 7950: >>> 7951: instruct unnecessary_membar_volatile_rvtso() %{ >> >> This one could be merged with `unnecessary_membar_volatile_rvwmo`, and remove the `UseZtso` in predicate. > > There are several more which can be merged, such as membar_volatile_rvXX. > But I preferred having those for rvtso in one section and those for rvwmo in another section. > Is that not a good approach?

Seems to me it's better to merge these instructs. E.g. when reading the code, one needs to check what the difference between these instructs is, only to find out they're exactly the same. Maybe we could have 3 sections?

// RVTSO
...
// shared between RVTSO and RVWMO
...
// RVWMO
...

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2026573726 From mli at openjdk.org Thu Apr 3 09:27:03 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 3 Apr 2025 09:27:03 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: <4Ls-4wY2muJfT3veXQFJsLkrZgR9iKbZ6wAPd4SJ1Hc=.7212fb6a-f160-409a-b40e-7fc0358a4e68@github.com> On Thu, 3 Apr 2025 06:30:18 GMT, Robbin Ehn wrote: >> Sorry, I now understand, you mean after doing all these checks if we can elide. >> >> Yes, that seems good. > > It's not possible as rv pause is encoded as "fence w, 0".

I see, thanks!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2026572204 From rehn at openjdk.org Thu Apr 3 09:57:49 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 3 Apr 2025 09:57:49 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Thu, 3 Apr 2025 09:24:46 GMT, Hamlin Li wrote:
>> There are several more which can be merged, such as membar_volatile_rvXX.
>> But I preferred having those for rvtso in one section and those for rvwmo in another section.
>> Is that not a good approach?
>
> Seems to me it's better to merge these instructs. E.g. when reading the code, one needs to check what the difference between these instructs is, only to find out they're exactly the same.
> Maybe we could have 3 sections?
>
> // RVTSO
> ...
> // shared between RVTSO and RVWMO
> ...
> // RVWMO
> ...

@feilongjiang and @RealFYang what do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2026630990 From jkern at openjdk.org Thu Apr 3 10:23:55 2025 From: jkern at openjdk.org (Joachim Kern) Date: Thu, 3 Apr 2025 10:23:55 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 14:02:25 GMT, Robert Toyonaga wrote:
>> ### Summary:
>> This PR makes memory operations atomic with NMT accounting.
>>
>> ### The problem:
>> In memory-related functions like `os::commit_memory` and `os::reserve_memory` the OS memory operations are currently done before acquiring the NMT mutex, and the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races.
>>
>> 1.1 Thread_1 releases range_A.
>> 1.2 Thread_1 tells NMT "range_A has been released".
>>
>> 2.1 Thread_2 reserves (the now free) range_A.
>> 2.2 Thread_2 tells NMT "range_A is reserved".
>>
>> Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS.
>>
>> ### Solution:
>> Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself.
>>
>> ### Other notes:
>> I also simplified this pattern found in many places:
>>
>>     if (MemTracker::enabled()) {
>>       MemTracker::NmtVirtualMemoryLocker nvml;
>>       result = pd_some_operation(addr, bytes);
>>       if (result != nullptr) {
>>         MemTracker::record_some_operation(addr, bytes);
>>       }
>>     } else {
>>       result = pd_some_operation(addr, bytes);
>>     }
>>
>> To:
>>
>>     MemTracker::NmtVirtualMemoryLocker nvml;
>>     result = pd_some_operation(addr, bytes);
>>     MemTracker::record_some_operation(addr, bytes);
>>
>> This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`.
>>
>> I considered moving the locking and NMT accounting down into platform-specific code, e.g. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to-the-point, so doing this wasn't reducing the lock scope very much. Instead it just makes the code messier by having to maintain the locking and NMT accounting in each platform specific i...
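The race and the fix can be modelled in a few lines. The hedged C++ sketch below uses hypothetical stand-ins (`FakeOS`, `Tracker`, `g_nmt_lock`, `tracked_reserve`, `tracked_release`) for the `os::pd_*` functions, `MemTracker` and `NmtVirtualMemoryLocker`. It only illustrates one idea: holding a single lock across both the OS operation and its accounting, so that the interleaving (1.1) (2.1) (2.2) (1.2) cannot occur.

```cpp
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical stand-in for the OS: which address ranges are currently mapped.
struct FakeOS {
  std::map<std::size_t, std::size_t> mapped;  // start -> size
  bool reserve(std::size_t start, std::size_t size) {
    return mapped.emplace(start, size).second;  // fails if already mapped
  }
  bool release(std::size_t start) { return mapped.erase(start) == 1; }
};

// Hypothetical stand-in for NMT's view of reserved memory.
struct Tracker {
  std::map<std::size_t, std::size_t> reserved;
  void record_reserve(std::size_t start, std::size_t size) { reserved[start] = size; }
  void record_release(std::size_t start) { reserved.erase(start); }
};

FakeOS g_os;
Tracker g_tracker;
std::mutex g_nmt_lock;  // plays the role of NmtVirtualMemoryLocker

// The OS operation and its accounting run under the same lock, so another
// thread cannot reserve a just-released range before the release is recorded.
bool tracked_reserve(std::size_t start, std::size_t size) {
  std::lock_guard<std::mutex> guard(g_nmt_lock);
  const bool ok = g_os.reserve(start, size);
  if (ok) {
    g_tracker.record_reserve(start, size);
  }
  return ok;
}

bool tracked_release(std::size_t start) {
  std::lock_guard<std::mutex> guard(g_nmt_lock);
  const bool ok = g_os.release(start);
  if (ok) {
    g_tracker.record_release(start);
  }
  return ok;
}

// True when the tracker agrees with the OS about every reserved range.
bool in_sync() {
  std::lock_guard<std::mutex> guard(g_nmt_lock);
  return g_os.mapped == g_tracker.reserved;
}
```

With the lock scoped this way, a second thread calling `tracked_reserve` on range_A blocks until the first thread's release has been both performed and recorded.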
>
> Robert Toyonaga has updated the pull request incrementally with two additional commits since the last revision:
>
> - tests and comments
> - Revert "make memory op and NMT accounting atomic"
>
> This reverts commit 86423d0b7e8e2b0b313a686a64c803028a5f2420.

Tonight we tested this PR on AIX and it failed in the gtest with

Internal Error (os_aix.cpp:1917), pid=26476938, tid=258
Error: guarantee((vmi)) failed

This will happen if `os::pd_commit_memory()`, `os::pd_release_memory()` or `os::pd_uncommit_memory()` is called on memory not allocated with `os::pd_reserve_memory()`, `os::pd_attempt_map_memory_to_file_at()` or `os::pd_attempt_reserve_memory_at()`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2775248289 From fyang at openjdk.org Thu Apr 3 10:24:51 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 3 Apr 2025 10:24:51 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: <7WDJBUhO6sTMzygEH4bazzh9xEpggcmX4PS4cSb9dpc=.870f2995-6324-461a-a1a1-c374b5331ed9@github.com> On Thu, 3 Apr 2025 09:55:26 GMT, Robbin Ehn wrote:
> @feilongjiang and @RealFYang what do you think?

Interesting :-) Supposing that they are subject to change (maybe possible further optimizations for a specific memory model) in the future, I personally prefer a clean separation of the two types. And I guess it's more likely that people will only look at one of them at a time.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2026679105 From mli at openjdk.org Thu Apr 3 10:47:03 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 3 Apr 2025 10:47:03 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8] In-Reply-To: <7WDJBUhO6sTMzygEH4bazzh9xEpggcmX4PS4cSb9dpc=.870f2995-6324-461a-a1a1-c374b5331ed9@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> <7WDJBUhO6sTMzygEH4bazzh9xEpggcmX4PS4cSb9dpc=.870f2995-6324-461a-a1a1-c374b5331ed9@github.com> Message-ID: On Thu, 3 Apr 2025 10:19:22 GMT, Fei Yang wrote:
>> @feilongjiang and @RealFYang what do you think?
>
>> @feilongjiang and @RealFYang what do you think?
>
> Interesting :-) Supposing that they are subject to change (maybe possible further optimizations for a specific memory model) in the future, I personally prefer a clean separation of the two types. And I guess it's more likely that people will only look at one of them at a time.

Seems we should only consider the current code base, and not consider possible optimizations in the future? : ) I'm not objecting to the current change; it's fine with me too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2026728468 From fyang at openjdk.org Thu Apr 3 11:08:05 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 3 Apr 2025 11:08:05 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> <7WDJBUhO6sTMzygEH4bazzh9xEpggcmX4PS4cSb9dpc=.870f2995-6324-461a-a1a1-c374b5331ed9@github.com> Message-ID: On Thu, 3 Apr 2025 10:44:07 GMT, Hamlin Li wrote:
>>> @feilongjiang and @RealFYang what do you think?
>>
>> Interesting :-) Supposing that they are subject to change (maybe possible further optimizations for a specific memory model) in the future, I personally prefer a clean separation of the two types. And I guess it's more likely that people will only look at one of them at a time.
>
> Seems we should only consider the current code base, and not consider possible optimizations in the future? : )
> I'm not objecting to the current change; it's fine with me too.

Another concern from my side is that the RISC-V ISA is still evolving. People are working on Load-Acquire & Store-Release (check the RISC-V Zalasr extension [[1]](https://github.com/riscv/riscv-zalasr)). From my knowledge of aarch64's support for ldar/stlr instructions, I am expecting quite a few changes in the WMO part when we have such a similar extension. But I don't think that the TSO part will be affected much. So it's better to decouple the two for now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2026759091 From stefank at openjdk.org Thu Apr 3 11:22:12 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 3 Apr 2025 11:22:12 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v8] In-Reply-To: <5b6QutsBRX4KGZY50opOtRztczGFdilIy3CwlisJ4s4=.7b8a3f1a-4f6f-469f-829b-1d693ad1355d@github.com> References: <5b6QutsBRX4KGZY50opOtRztczGFdilIy3CwlisJ4s4=.7b8a3f1a-4f6f-469f-829b-1d693ad1355d@github.com> Message-ID: On Wed, 2 Apr 2025 18:37:36 GMT, Gerard Ziemski wrote:
>> This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with default value of mtNone, to force everyone to provide a mem tag, if known.
>>
>> I tried to fill in the tag when I was pretty certain that I had the right type.
>>
>> At least one more follow-up will be needed after this, to change the remaining mtNone to valid values.
>
> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision:
>
> The real feedback from StefanK

Looks mostly good; there are a few things that I'd like to get cleaned up, but we can take care of that in separate RFEs.

src/hotspot/share/runtime/os.hpp line 527:

> 525:
> 526: static char* map_memory(int fd, const char* file_name, size_t file_offset,
> 527: char *addr, size_t bytes, MemTag mem_tag, bool read_only = false,

AFAICT, there's no need to have a default value for read_only. I think we should remove this default value and move the MemTag parameter so that it comes after read_only and before allow_exec. This would make the parameter order more consistent with the other functions that accept a mem_tag and an executable. Given that you have tested the current patch, I'm fine with doing this as a small follow-up patch.

test/hotspot/gtest/gc/z/test_zForwarding.cpp line 58:

> 56:
> 57: for (uintptr_t start = 0; start + ZGranuleSize <= ZAddressOffsetMax; start += increment) {
> 58: char* const reserved = os::attempt_reserve_memory_at((char*)ZAddressHeapBase + start, ZGranuleSize, mtNone);

Suggestion:

char* const reserved = os::attempt_reserve_memory_at((char*)ZAddressHeapBase + start, ZGranuleSize, mtTest);

This snuck in with one of your latest changes.

test/hotspot/gtest/runtime/test_os.cpp line 733:

> 731: // Reserve a small range and fill it with a marker string, should show up
> 732: // on implementations displaying range snippets
> 733: char* p = os::reserve_memory(1 * M, mtInternal);

Suggestion:

char* p = os::reserve_memory(1 * M, mtTest);

------------- Marked as reviewed by stefank (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24282#pullrequestreview-2739478381 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2026779415 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2026781507 PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2026782706 From rehn at openjdk.org Thu Apr 3 11:39:55 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 3 Apr 2025 11:39:55 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v8] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> <7WDJBUhO6sTMzygEH4bazzh9xEpggcmX4PS4cSb9dpc=.870f2995-6324-461a-a1a1-c374b5331ed9@github.com> Message-ID: On Thu, 3 Apr 2025 11:02:23 GMT, Fei Yang wrote:
>> Seems we should only consider the current code base, and not consider possible optimizations in the future? : )
>> I'm not objecting to the current change; it's fine with me too.
>
> Another concern from my side is that the RISC-V ISA is still evolving. People are working on Load-Acquire & Store-Release (check the RISC-V Zalasr extension [[1]](https://github.com/riscv/riscv-zalasr)). From my knowledge of aarch64's support for ldar/stlr instructions, I am expecting quite a few changes in the WMO part when we have such a similar extension. But I don't think that the TSO part will be affected much. So it's better to decouple the two for now.

Ok, I'll ship as is. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24035#discussion_r2026816043 From aph at openjdk.org Thu Apr 3 11:54:56 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 3 Apr 2025 11:54:56 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v11] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 10:17:56 GMT, Amit Kumar wrote:
>> s390x implementation for Class.isInstance intrinsic.
>>
>> Tier1 tests on release & fastdebug VMs are clean with flags: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`.
>>
>> Benchmark results will be updated soon.
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
> suggestion from Lutz

OK! ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23535#pullrequestreview-2739577689 From rkennke at openjdk.org Thu Apr 3 12:05:17 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 3 Apr 2025 12:05:17 GMT Subject: RFR: 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable Message-ID:
Like #24098, but clears the BasicLock cache before calling inflate_and_enter().

diff --git a/src/hotspot/share/runtime/deoptimization.cpp b/src/hotspot/share/runtime/deoptimization.cpp
index f7e0844639b..f17c46fea38 100644
--- a/src/hotspot/share/runtime/deoptimization.cpp
+++ b/src/hotspot/share/runtime/deoptimization.cpp
@@ -1667,6 +1667,9 @@ bool Deoptimization::relock_objects(JavaThread* thread, GrowableArray
           // was fast_locked to restore the valid lock stack.
           ObjectSynchronizer::enter_for(obj, lock, deoptee_thread);
           if (deoptee_thread->lock_stack().contains(obj())) {
+            if (UseObjectMonitorTable) {
+              lock->clear_object_monitor_cache();
+            }
             LightweightSynchronizer::inflate_fast_locked_object(obj(), ObjectSynchronizer::InflateCause::inflate_cause_vm_internal,
                                                                  deoptee_thread, thread);
           }

------------- Commit messages: - 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable Changes: https://git.openjdk.org/jdk/pull/24413/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24413&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353588 Stats: 66 lines in 9 files changed: 40 ins; 6 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/24413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24413/head:pull/24413 PR: https://git.openjdk.org/jdk/pull/24413 From jsjolen at openjdk.org Thu Apr 3 12:14:54 2025 From: jsjolen at openjdk.org (Johan
Sjölen) Date: Thu, 3 Apr 2025 12:14:54 GMT Subject: Re: RFR: 8353273: Reduce number of oop map entries in instances [v2] In-Reply-To: <_m74U7bVGssSQnbrkP-4KvS5nga3bg4Bh4g5BlU07Kw=.f951a8cf-a4a6-452f-936e-284decfd9df8@github.com> References: <_m74U7bVGssSQnbrkP-4KvS5nga3bg4Bh4g5BlU07Kw=.f951a8cf-a4a6-452f-936e-284decfd9df8@github.com> Message-ID: <1QxEpKiI5pmXlFFnDDSEn2mKrQSPqey0o9sgYoKTExQ=.d980f3cc-db54-4232-95da-450338e9c2e8@github.com> On Wed, 2 Apr 2025 09:42:05 GMT, Thomas Stuefe wrote:
>> In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries.
>>
>> For details, please see JBS issue text.
>>
>> -----------------------
>>
>> Patch results:
>>
>> The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. Here is the oop map size distribution over all JDK classes in the JDK image:
>>
>> Before:
>>
>> 5395 - non-static oop maps (0 entries)
>> 9330 - non-static oop maps (1 entries)
>> 1449 - non-static oop maps (2 entries)
>> 274 - non-static oop maps (3 entries)
>> 218 - non-static oop maps (4 entries)
>> 75 - non-static oop maps (5 entries)
>> 7 - non-static oop maps (6 entries)
>> 4 - non-static oop maps (7 entries)
>>
>>
>> Now:
>>
>> 5395 - non-static oop maps (0 entries)
>> 10178 - non-static oop maps (1 entries)
>> 933 - non-static oop maps (2 entries)
>> 229 - non-static oop maps (3 entries)
>> 16 - non-static oop maps (4 entries)
>> 1 - non-static oop maps (5 entries)
>>
>>
>> For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot:
>>
>> Before:
>>
>> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0}
>> - ---- non-static fields (9 words):
>> - final 'hash' 'I' @12
>> - final 'key' 'Ljava/lang/Object;' @16
>> - volatile 'val' 'Ljava/lang/Object;' @20
>> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class
>> - 'red' 'Z' @28 << derived class starts here, non-oops lead
>> - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32
>> - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36
>> - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40
>> - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44
>> - non-static oop maps (2 entries): 16-24 32-44
>>
>> Now:
>>
>> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450}
>> - ---- non-static fields (9 words):
>> - final 'hash' 'I' @12
>> - final 'key' 'Ljava/lang/Object;' @16
>> - volatile 'val' 'Ljava/lang/Object;' @20
>> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class
>> - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oop...
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
> - add regression test
> - Reworked to use prior super klass layout reconstruction pass
> - Merge branch 'master' into JDK-8353273-Reduce-average-number-of-oop-map-entries-in-instance-objects
> - alternate-order
> - print

Hi,

Neat change. I have read up on how field layouts work, and this seems correct to me. Let me run this in our CI for a bit and see what comes out before I approve it, thank you.
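The entry counts in the listings above follow from grouping reference fields that sit at consecutive offsets into one block. Below is a hedged C++ sketch of that grouping. It assumes 4-byte compressed oops and a field list sorted by offset. `Field`, `OopRange` and `oop_map_ranges` are illustrative names, not HotSpot's actual oop map code. Ranges are reported as first and last oop offset, matching the `16-24 32-44` notation above.

```cpp
#include <vector>

// Hypothetical field description: byte offset plus whether it holds a reference.
struct Field {
  int offset;
  bool is_oop;
};

// One oop map entry, reported as first and last oop offset (e.g. 16-24).
struct OopRange {
  int first;
  int last;
};

// Group oop fields at consecutive offsets into ranges. Assumes `fields` is
// sorted by offset and that each oop occupies `oop_size` bytes (4 with
// compressed oops). A non-oop field between two oops leaves a gap in the
// offsets and therefore starts a new range.
std::vector<OopRange> oop_map_ranges(const std::vector<Field>& fields, int oop_size = 4) {
  std::vector<OopRange> ranges;
  for (const Field& f : fields) {
    if (!f.is_oop) {
      continue;
    }
    if (!ranges.empty() && ranges.back().last + oop_size == f.offset) {
      ranges.back().last = f.offset;  // extend the current run
    } else {
      ranges.push_back({f.offset, f.offset});  // start a new run
    }
  }
  return ranges;
}
```

Fed the "Before" `TreeNode` layout (oops at 16-24, `red` at 28, oops at 32-44), this sketch yields the two ranges `16-24` and `32-44`. When the derived class's oops start right after the base class's last oop, as with `parent` moving to @28 in the "Now" listing, the runs can merge into a single entry.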
------------- PR Review: https://git.openjdk.org/jdk/pull/24330#pullrequestreview-2739627906 From coleenp at openjdk.org Thu Apr 3 12:37:17 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Apr 2025 12:37:17 GMT Subject: RFR: 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 11:59:58 GMT, Roman Kennke wrote: > Like #24098, but clears the BasicLock cache before calling inflate_and_enter(). > > > diff --git a/src/hotspot/share/runtime/deoptimization.cpp b/src/hotspot/share/runtime/deoptimization.cpp > index f7e0844639b..f17c46fea38 100644 > --- a/src/hotspot/share/runtime/deoptimization.cpp > +++ b/src/hotspot/share/runtime/deoptimization.cpp > @@ -1667,6 +1667,9 @@ bool Deoptimization::relock_objects(JavaThread* thread, GrowableArray // was fast_locked to restore the valid lock stack. > ObjectSynchronizer::enter_for(obj, lock, deoptee_thread); > if (deoptee_thread->lock_stack().contains(obj())) { > + if (UseObjectMonitorTable) { > + lock->clear_object_monitor_cache(); > + } > LightweightSynchronizer::inflate_fast_locked_object(obj(), ObjectSynchronizer::InflateCause::inflate_cause_vm_internal, > deoptee_thread, thread); > } @xmas92 had two choices in the CR but I like this one better because a few lines above it also clears the object_monitor_cache for UseObjectMonitorTable, and I don't like testing null pointers. Changes requested by coleenp (Reviewer). src/hotspot/share/runtime/deoptimization.cpp line 1667: > 1665: // We have lost information about the correct state of the lock stack. > 1666: // Entering may create an invalid lock stack. Inflate the lock if it > 1667: // was fast_locked to restore the valid lock stack. Don't we need to clear the lock before calling enter_for() ? ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24413#pullrequestreview-2739680757 PR Review: https://git.openjdk.org/jdk/pull/24413#pullrequestreview-2739687932 PR Review Comment: https://git.openjdk.org/jdk/pull/24413#discussion_r2026903699 From coleenp at openjdk.org Thu Apr 3 12:40:16 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Apr 2025 12:40:16 GMT Subject: RFR: 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 11:59:58 GMT, Roman Kennke wrote: > Like #24098, but clears the BasicLock cache before calling inflate_and_enter(). > > > diff --git a/src/hotspot/share/runtime/deoptimization.cpp b/src/hotspot/share/runtime/deoptimization.cpp > index f7e0844639b..f17c46fea38 100644 > --- a/src/hotspot/share/runtime/deoptimization.cpp > +++ b/src/hotspot/share/runtime/deoptimization.cpp > @@ -1667,6 +1667,9 @@ bool Deoptimization::relock_objects(JavaThread* thread, GrowableArray // was fast_locked to restore the valid lock stack. > ObjectSynchronizer::enter_for(obj, lock, deoptee_thread); > if (deoptee_thread->lock_stack().contains(obj())) { > + if (UseObjectMonitorTable) { > + lock->clear_object_monitor_cache(); > + } > LightweightSynchronizer::inflate_fast_locked_object(obj(), ObjectSynchronizer::InflateCause::inflate_cause_vm_internal, > deoptee_thread, thread); > } Also, I'll run this through our tier1-4 testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24413#issuecomment-2775654163 From amitkumar at openjdk.org Thu Apr 3 12:43:26 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 3 Apr 2025 12:43:26 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames [v3] In-Reply-To: References: Message-ID: > Port for [JDK-8299795](https://bugs.openjdk.org/browse/JDK-8299795) Relativize Z_locals in interpreter frame for s390x. > > Tier1 test with fastdebug vm are clean. 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: use Z_R0 as temp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23660/files - new: https://git.openjdk.org/jdk/pull/23660/files/46d6ae1c..e36c4982 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23660&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23660&range=01-02 Stats: 7 lines in 1 file changed: 0 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23660.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23660/head:pull/23660 PR: https://git.openjdk.org/jdk/pull/23660 From amitkumar at openjdk.org Thu Apr 3 12:43:26 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 3 Apr 2025 12:43:26 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 06:20:47 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> use Z_R0 as helper > > src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp line 1145: > >> 1143: __ z_stg(Z_R1, _z_ijava_state_neg(locals), fp); >> 1144: >> 1145: __ z_lgr(Z_R1, Z_R0); // restore R1 > > Why don't you use Z_R0 for the relativation calculations and leave Z_R1 untouched? That will avoid the save/restore overhead. updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23660#discussion_r2026912409 From coleenp at openjdk.org Thu Apr 3 12:46:43 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Apr 2025 12:46:43 GMT Subject: RFR: 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 12:43:55 GMT, Roman Kennke wrote: >> Like #24098, but clears the BasicLock cache before calling inflate_and_enter(). 
>> >> >> diff --git a/src/hotspot/share/runtime/deoptimization.cpp b/src/hotspot/share/runtime/deoptimization.cpp >> index f7e0844639b..f17c46fea38 100644 >> --- a/src/hotspot/share/runtime/deoptimization.cpp >> +++ b/src/hotspot/share/runtime/deoptimization.cpp >> @@ -1667,6 +1667,9 @@ bool Deoptimization::relock_objects(JavaThread* thread, GrowableArray> // was fast_locked to restore the valid lock stack. >> ObjectSynchronizer::enter_for(obj, lock, deoptee_thread); >> if (deoptee_thread->lock_stack().contains(obj())) { >> + if (UseObjectMonitorTable) { >> + lock->clear_object_monitor_cache(); >> + } >> LightweightSynchronizer::inflate_fast_locked_object(obj(), ObjectSynchronizer::InflateCause::inflate_cause_vm_internal, >> deoptee_thread, thread); >> } > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Clear cache before enter_for() Yes, I'll repost when testing is finished. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24413#pullrequestreview-2739710224 From rkennke at openjdk.org Thu Apr 3 12:46:42 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 3 Apr 2025 12:46:42 GMT Subject: RFR: 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable [v2] In-Reply-To: References: Message-ID: > Like #24098, but clears the BasicLock cache before calling inflate_and_enter(). > > > diff --git a/src/hotspot/share/runtime/deoptimization.cpp b/src/hotspot/share/runtime/deoptimization.cpp > index f7e0844639b..f17c46fea38 100644 > --- a/src/hotspot/share/runtime/deoptimization.cpp > +++ b/src/hotspot/share/runtime/deoptimization.cpp > @@ -1667,6 +1667,9 @@ bool Deoptimization::relock_objects(JavaThread* thread, GrowableArray // was fast_locked to restore the valid lock stack. 
> ObjectSynchronizer::enter_for(obj, lock, deoptee_thread); > if (deoptee_thread->lock_stack().contains(obj())) { > + if (UseObjectMonitorTable) { > + lock->clear_object_monitor_cache(); > + } > LightweightSynchronizer::inflate_fast_locked_object(obj(), ObjectSynchronizer::InflateCause::inflate_cause_vm_internal, > deoptee_thread, thread); > } Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Clear cache before enter_for() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24413/files - new: https://git.openjdk.org/jdk/pull/24413/files/a83a451e..9073a18c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24413&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24413&range=00-01 Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24413/head:pull/24413 PR: https://git.openjdk.org/jdk/pull/24413 From rkennke at openjdk.org Thu Apr 3 12:46:44 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 3 Apr 2025 12:46:44 GMT Subject: RFR: 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 12:34:36 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Clear cache before enter_for() > > src/hotspot/share/runtime/deoptimization.cpp line 1667: > >> 1665: // We have lost information about the correct state of the lock stack. >> 1666: // Entering may create an invalid lock stack. Inflate the lock if it >> 1667: // was fast_locked to restore the valid lock stack. > > Don't we need to clear the lock before calling enter_for() ? Oh yes, of course. 
(I should probably not 'fix it real quick at 5am before heading out to the airport' ;-) ) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24413#discussion_r2026917696 From coleenp at openjdk.org Thu Apr 3 13:13:51 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Apr 2025 13:13:51 GMT Subject: RFR: 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 12:42:55 GMT, Roman Kennke wrote: >> src/hotspot/share/runtime/deoptimization.cpp line 1667: >> >>> 1665: // We have lost information about the correct state of the lock stack. >>> 1666: // Entering may create an invalid lock stack. Inflate the lock if it >>> 1667: // was fast_locked to restore the valid lock stack. >> >> Don't we need to clear the lock before calling enter_for() ? > > Oh yes, of course. > (I should probably not 'fix it real quick at 5am before heading out to the airport' ;-) ) I guessed that's what you were doing :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24413#discussion_r2026967979 From rrich at openjdk.org Thu Apr 3 13:33:48 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 3 Apr 2025 13:33:48 GMT Subject: RFR: 8353274: [PPC64] Bug related to -XX:+UseCompactObjectHeaders -XX:-UseSIGTRAP in JDK-8305895 In-Reply-To: <6GVJHW26mRaRkydXx2pCmSsjdaKsm4ttjKeQ2vnvHG4=.bbf7fff9-fa27-41ef-a4ac-42daba4b890f@github.com> References: <6GVJHW26mRaRkydXx2pCmSsjdaKsm4ttjKeQ2vnvHG4=.bbf7fff9-fa27-41ef-a4ac-42daba4b890f@github.com> Message-ID: <2g166YAXhScM5zH-ANlPDnETVqj2ncxgzYHhcd5c5zE=.6e2142da-e6f9-4dd5-8706-fa1e6806e70f@github.com> On Mon, 31 Mar 2025 14:25:09 GMT, Martin Doerr wrote: > `MacroAssembler::ic_check` compares the `Klass*` in the compact format (no decode). However, a right shift is needed in case of `UseCompactObjectHeaders` (see `load_narrow_klass_compact`). This was missing in the slower version which doesn't use SIGTRAP. Looks good. 
Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24331#pullrequestreview-2739859590 From rrich at openjdk.org Thu Apr 3 13:38:49 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 3 Apr 2025 13:38:49 GMT Subject: RFR: 8353274: [PPC64] Bug related to -XX:+UseCompactObjectHeaders -XX:-UseSIGTRAP in JDK-8305895 In-Reply-To: <6GVJHW26mRaRkydXx2pCmSsjdaKsm4ttjKeQ2vnvHG4=.bbf7fff9-fa27-41ef-a4ac-42daba4b890f@github.com> References: <6GVJHW26mRaRkydXx2pCmSsjdaKsm4ttjKeQ2vnvHG4=.bbf7fff9-fa27-41ef-a4ac-42daba4b890f@github.com> Message-ID: On Mon, 31 Mar 2025 14:25:09 GMT, Martin Doerr wrote: > `MacroAssembler::ic_check` compares the `Klass*` in the compact format (no decode). However, a right shift is needed in case of `UseCompactObjectHeaders` (see `load_narrow_klass_compact`). This was missing in the slower version which doesn't use SIGTRAP. I've been looking for more references to `oopDesc::klass_offset_in_bytes()` and found https://github.com/openjdk/jdk/blob/296d9d6f7a734cc2bab21c58f21a941150b4cf2a/src/hotspot/cpu/ppc/c2_MacroAssembler_ppc.cpp#L54 To me it looks like neither on aarch64 nor on x86 `oopDesc::klass_offset_in_bytes()` needs to be deducted from the address. Why is it done on ppc64? 
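For readers unfamiliar with the compact-header case, the following is a simplified C++ model, not the PPC64 assembly, of why the slow `ic_check` path needs a right shift. The narrow klass id sits in the upper bits of the 64-bit header, so a 32-bit word loaded from that part of the header still holds it shifted, and comparing that word directly against the expected narrow klass fails. The bit position (`kKlassShift = 42`, i.e. a 22-bit klass id at the top of the header) and every function name here are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION for illustration: the narrow klass id occupies the top 22 bits
// of the 64-bit header word, i.e. it starts at bit 42. Names are hypothetical.
constexpr int kKlassShift = 42;

// Builds a header word with `narrow_klass` in the top bits and arbitrary low
// bits (standing in for hash, age and lock tag).
constexpr std::uint64_t make_header(std::uint32_t narrow_klass,
                                    std::uint64_t low_bits) {
  return (static_cast<std::uint64_t>(narrow_klass) << kKlassShift) |
         (low_bits & ((std::uint64_t{1} << kKlassShift) - 1));
}

// Models a 32-bit load of the header half that contains the klass bits.
constexpr std::uint32_t load_upper_word(std::uint64_t header) {
  return static_cast<std::uint32_t>(header >> 32);
}

// Compares the loaded word directly: wrong in this model, because the klass
// id is still shifted left by (kKlassShift - 32) bits inside that word.
bool ic_hit_without_shift(std::uint64_t header, std::uint32_t expected_klass) {
  return load_upper_word(header) == expected_klass;
}

// Shifts the klass id down before comparing.
bool ic_hit_with_shift(std::uint64_t header, std::uint32_t expected_klass) {
  return (load_upper_word(header) >> (kKlassShift - 32)) == expected_klass;
}
```

In this model only the shifted comparison recognizes a matching receiver; the unshifted one reports a miss even when the klass is correct.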
------------- PR Comment: https://git.openjdk.org/jdk/pull/24331#issuecomment-2775818837 From sroy at openjdk.org Thu Apr 3 13:43:02 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Thu, 3 Apr 2025 13:43:02 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> <7rVbCbWDqrib9Jyj7_hkD-r9rkaAOIXuwOGAqImrxoY=.a55e9572-b4e6-4cc2-aa0e-c23deb9961ce@github.com> <1wMuCBIwYPaPM-bbsnFHi8hnkq-IL5Q_kCmaa1AdDpM=.1240fd83-db6d-489a-bbb3-48891daac064@github.com> <0DSwCsm5yp2be9s-cgkZP4HCo4ppGD_SkDq4KyjfMEw=.0d74a4c8-e155-4186-884f-2575924f9d03@github.com> Message-ID: On Wed, 19 Mar 2025 08:59:32 GMT, Suchismith Roy wrote: >> @TheRealMDoerr >> https://www.researchgate.net/publication/285612706_Implementing_GCM_on_ARMv8 >> >> I think the same algorithm used for polynomial reduction -Section 4.3 > > Hi @theRealAph Do you see a scope to reduce these swaps in the algorithm , for the above mentioned instructions. > I feel there is a similar set of instructions used to perform reduction in > https://www.researchgate.net/publication/285612706_Implementing_GCM_on_ARMv8 Hi @theRealAph Let me know if you need any additional context or if there's anything I can do to help with the review. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r2027025476 From amitkumar at openjdk.org Thu Apr 3 14:51:09 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 3 Apr 2025 14:51:09 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v11] In-Reply-To: References: Message-ID: <7TTSM_-CodiX2Cqo420HxZTKreSZ4KKNQX_dapXQHa8=.5748656f-4123-4782-b45d-54c2939753e8@github.com> On Wed, 19 Mar 2025 10:17:56 GMT, Amit Kumar wrote: >> s390x implementation for Class.isInstance intrinsic.
>> >> Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. >> >> Benchmark results will be updated soon. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestion from Lutz Thanks for the approval Lutz, Andrew :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23535#issuecomment-2776037052 From amitkumar at openjdk.org Thu Apr 3 14:51:09 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 3 Apr 2025 14:51:09 GMT Subject: Integrated: 8349686: [s390x] C1: Improve Class.isInstance intrinsic In-Reply-To: References: Message-ID: <7wzU0kY7nN1CpvzIOSeCgtKIMhkpZmBUQmzFWDwcjvo=.335ccab8-017f-477f-a8d0-4c41abd3c878@github.com> On Mon, 10 Feb 2025 02:29:03 GMT, Amit Kumar wrote: > s390x implementation for Class.isInstance intrinsic. > > Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. > > Benchmark results will be updated soon. This pull request has now been integrated. 
Changeset: b428cda3 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/b428cda3c6a445ffa638c6f4e86225d86a1876d4 Stats: 107 lines in 4 files changed: 96 ins; 3 del; 8 mod 8349686: [s390x] C1: Improve Class.isInstance intrinsic Reviewed-by: lucy, aph ------------- PR: https://git.openjdk.org/jdk/pull/23535 From mdoerr at openjdk.org Thu Apr 3 14:57:22 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 3 Apr 2025 14:57:22 GMT Subject: RFR: 8353274: [PPC64] Bug related to -XX:+UseCompactObjectHeaders -XX:-UseSIGTRAP in JDK-8305895 In-Reply-To: References: <6GVJHW26mRaRkydXx2pCmSsjdaKsm4ttjKeQ2vnvHG4=.bbf7fff9-fa27-41ef-a4ac-42daba4b890f@github.com> Message-ID: On Thu, 3 Apr 2025 13:36:26 GMT, Richard Reingruber wrote: > I've been looking for more references to `oopDesc::klass_offset_in_bytes()` and found > > https://github.com/openjdk/jdk/blob/296d9d6f7a734cc2bab21c58f21a941150b4cf2a/src/hotspot/cpu/ppc/c2_MacroAssembler_ppc.cpp#L54 > > > To me it looks like neither on aarch64 nor on x86 `oopDesc::klass_offset_in_bytes()` needs to be deducted from the address. Why is it done on ppc64? The other platforms had the same implementation in an earlier version. It was simplified in a way which doesn't work on Big Endian platforms. That's why PPC64 still has the old implementation. See https://github.com/openjdk/jdk/pull/22078#issuecomment-2479943053 Thanks for the review! I think this PR is simple enough to go in with only 1 review. It only touches PPC64 code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24331#issuecomment-2776067206 From duke at openjdk.org Thu Apr 3 15:27:15 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Thu, 3 Apr 2025 15:27:15 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock [v3] In-Reply-To: References: Message-ID: > ### Summary: > This PR makes memory operations atomic with NMT accounting. 
> > ### The problem: > In memory related functions like `os::commit_memory` and `os::reserve_memory` the OS memory operations are currently done before acquiring the NMT mutex. And the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races. > > 1.1 Thread_1 releases range_A. > 1.2 Thread_1 tells NMT "range_A has been released". > > 2.1 Thread_2 reserves (the now free) range_A. > 2.2 Thread_2 tells NMT "range_A is reserved". > > Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS. > > ### Solution: > Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself. > > ### Other notes: > I also simplified this pattern found in many places: > > if (MemTracker::enabled()) { > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > if (result != nullptr) { > MemTracker::record_some_operation(addr, bytes); > } > } else { > result = pd_some_operation(addr, bytes); > } > ``` > To: > > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > MemTracker::record_some_operation(addr, bytes); > ``` > This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`. > > I considered moving the locking and NMT accounting down into platform specific code: Ex.
lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to-the-point, so doing this wasn't reducing the lock scope very much. Instead it just makes the code more messy by having to maintain the locking and NMT accounting in each platform specific implementation. > > In many places I've done minor refactoring by relocating call... Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision: exclude file mapping tests on AIX. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24084/files - new: https://git.openjdk.org/jdk/pull/24084/files/74f31202..5c23a76a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24084&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24084&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24084.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24084/head:pull/24084 PR: https://git.openjdk.org/jdk/pull/24084 From duke at openjdk.org Thu Apr 3 15:27:16 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Thu, 3 Apr 2025 15:27:16 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 10:21:29 GMT, Joachim Kern wrote: > Internal Error (os_aix.cpp:1917), pid=26476938, tid=258 Error: guarantee((vmi)) failed > > This will happen if a `os::pd_commit_memory()` or `os::pd_release_memory()` or `os::pd_uncommit_memory()` is called on memory not allocated with `os::pd_reserve_memory()` or `os::pd_attempt_map_memory_to_file_at()` or `os::pd_attempt_reserve_memory_at()` Thank you for running the tests on AIX. I've excluded the file mapping tests that don't meet that criteria on AIX. 
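The locking pattern described in this PR can be sketched in plain C++ with `std::mutex`. This is a minimal sketch under stated assumptions: `FakeOs`, `Tracker`, `TrackerLocker`, `reserve_range` and `release_range` are hypothetical stand-ins, not the `os::` or `MemTracker` APIs. The point it illustrates is that the OS-level change and its accounting happen inside one critical section, so the interleaving (1.1) (2.1) (2.2) (1.2) from the problem statement cannot occur, and that a locker which checks "enabled" itself removes the need for the explicit `if (MemTracker::enabled())` branch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical stand-ins; none of these are HotSpot (os::, MemTracker) APIs.
// FakeOs models reserve/release of address ranges. Unlike mmap/munmap it is
// not thread-safe by itself; it is only protected while tracking is enabled.
struct FakeOs {
  std::map<std::size_t, std::size_t> mapped;  // base -> size
  bool reserve(std::size_t base, std::size_t size) {
    return mapped.emplace(base, size).second;  // fails if base already mapped
  }
  bool release(std::size_t base) { return mapped.erase(base) == 1; }
};

// Accounting that is a no-op when tracking is disabled.
struct Tracker {
  bool enabled = true;
  std::map<std::size_t, std::size_t> reserved;
  void record_reserve(std::size_t base, std::size_t size) {
    if (enabled) reserved[base] = size;
  }
  void record_release(std::size_t base) {
    if (enabled) reserved.erase(base);
  }
};

std::mutex g_tracker_lock;
FakeOs g_os;
Tracker g_tracker;

// RAII locker that takes the lock only when tracking is enabled, so callers
// need no explicit "if (enabled)" branch around the operation.
class TrackerLocker {
 public:
  TrackerLocker() : lock_(g_tracker_lock, std::defer_lock) {
    if (g_tracker.enabled) lock_.lock();
  }

 private:
  std::unique_lock<std::mutex> lock_;
};

// The OS operation and its accounting run inside one critical section, so
// another thread cannot slip in between them.
bool reserve_range(std::size_t base, std::size_t size) {
  TrackerLocker locker;
  const bool ok = g_os.reserve(base, size);
  if (ok) g_tracker.record_reserve(base, size);
  return ok;
}

bool release_range(std::size_t base) {
  TrackerLocker locker;
  const bool ok = g_os.release(base);
  if (ok) g_tracker.record_release(base);
  return ok;
}
```

The design trade-off is the one the PR discusses: the critical section now covers the system call itself, which is longer than covering only the bookkeeping, in exchange for the tracker never disagreeing with the OS.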
------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2776162180 From mbaesken at openjdk.org Thu Apr 3 15:47:27 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 3 Apr 2025 15:47:27 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v3] In-Reply-To: References: Message-ID: > The release file of the JDK image contains useful info, for example the SOURCE used to build this image e.g. > SOURCE=".:git:21af8c7e7405" > Also the MODULES list is probably useful to have. > Add this info (or the complete content of the release file) to the hs_err files. Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: use one time PeriodicTask ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24244/files - new: https://git.openjdk.org/jdk/pull/24244/files/bd95acf9..c33e11b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=01-02 Stats: 19 lines in 2 files changed: 16 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24244/head:pull/24244 PR: https://git.openjdk.org/jdk/pull/24244 From coleenp at openjdk.org Thu Apr 3 15:49:08 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Apr 2025 15:49:08 GMT Subject: RFR: 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 12:46:42 GMT, Roman Kennke wrote: >> Like #24098, but clears the BasicLock cache before calling inflate_and_enter().
>> >> >> diff --git a/src/hotspot/share/runtime/deoptimization.cpp b/src/hotspot/share/runtime/deoptimization.cpp >> index f7e0844639b..f17c46fea38 100644 >> --- a/src/hotspot/share/runtime/deoptimization.cpp >> +++ b/src/hotspot/share/runtime/deoptimization.cpp >> @@ -1667,6 +1667,9 @@ bool Deoptimization::relock_objects(JavaThread* thread, GrowableArray> // was fast_locked to restore the valid lock stack. >> ObjectSynchronizer::enter_for(obj, lock, deoptee_thread); >> if (deoptee_thread->lock_stack().contains(obj())) { >> + if (UseObjectMonitorTable) { >> + lock->clear_object_monitor_cache(); >> + } >> LightweightSynchronizer::inflate_fast_locked_object(obj(), ObjectSynchronizer::InflateCause::inflate_cause_vm_internal, >> deoptee_thread, thread); >> } > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Clear cache before enter_for() Our internal tier1-4 tests passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24413#issuecomment-2776233828 From aboldtch at openjdk.org Thu Apr 3 16:07:13 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 3 Apr 2025 16:07:13 GMT Subject: RFR: 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 12:46:42 GMT, Roman Kennke wrote: >> Like #24098, but clears the BasicLock cache before calling inflate_and_enter(). >> >> >> diff --git a/src/hotspot/share/runtime/deoptimization.cpp b/src/hotspot/share/runtime/deoptimization.cpp >> index f7e0844639b..f17c46fea38 100644 >> --- a/src/hotspot/share/runtime/deoptimization.cpp >> +++ b/src/hotspot/share/runtime/deoptimization.cpp >> @@ -1667,6 +1667,9 @@ bool Deoptimization::relock_objects(JavaThread* thread, GrowableArray> // was fast_locked to restore the valid lock stack. 
>> ObjectSynchronizer::enter_for(obj, lock, deoptee_thread); >> if (deoptee_thread->lock_stack().contains(obj())) { >> + if (UseObjectMonitorTable) { >> + lock->clear_object_monitor_cache(); >> + } >> LightweightSynchronizer::inflate_fast_locked_object(obj(), ObjectSynchronizer::InflateCause::inflate_cause_vm_internal, >> deoptee_thread, thread); >> } > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Clear cache before enter_for() Alright. I'll hope we can improve this mechanism to be less fragile and more obvious in the future. The stack cache was always written to be for C2 and the CacheSetter was there to cover all the other places we construct these BasicLocks in case they appear in a C2 frame. Now we are using it bidirectionally in both enter and exit. But this should work, and I know you are working on future improvements here @coleenp Thanks for the fix @rkennke ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24413#pullrequestreview-2740439741 From jsjolen at openjdk.org Thu Apr 3 16:12:58 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 3 Apr 2025 16:12:58 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances [v2] In-Reply-To: <_m74U7bVGssSQnbrkP-4KvS5nga3bg4Bh4g5BlU07Kw=.f951a8cf-a4a6-452f-936e-284decfd9df8@github.com> References: <_m74U7bVGssSQnbrkP-4KvS5nga3bg4Bh4g5BlU07Kw=.f951a8cf-a4a6-452f-936e-284decfd9df8@github.com> Message-ID: <8yzNsDXgCQX14OmxmjPw53xDjy5tv2FJeuXqtvI6dR8=.794575f4-48d7-4a69-9e6e-0581980a8198@github.com> On Wed, 2 Apr 2025 09:42:05 GMT, Thomas Stuefe wrote: >> In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. >> >> For details, please see JBS issue text.
>> >> ----------------------- >> >> Patch results: >> >> The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. Here the oop map size distribution over all JDK classes in the JDK image: >> >> Before: >> >> 5395 - non-static oop maps (0 entries) >> 9330 - non-static oop maps (1 entries) >> 1449 - non-static oop maps (2 entries) >> 274 - non-static oop maps (3 entries) >> 218 - non-static oop maps (4 entries) >> 75 - non-static oop maps (5 entries) >> 7 - non-static oop maps (6 entries) >> 4 - non-static oop maps (7 entries) >> >> >> Now: >> >> 5395 - non-static oop maps (0 entries) >> 10178 - non-static oop maps (1 entries) >> 933 - non-static oop maps (2 entries) >> 229 - non-static oop maps (3 entries) >> 16 - non-static oop maps (4 entries) >> 1 - non-static oop maps (5 entries) >> >> >> For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: >> >> Before: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'red' 'Z' @28 << derived class starts here, non-oops lead >> - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 >> - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 >> - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 >> - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 >> - non-static oop maps (2 entries): 16-24 32-44 >> >> Now: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - 
volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oop... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - add regression test > - Reworked to use prior super klass layout reconstruction pass > - Merge branch 'master' into JDK-8353273-Reduce-average-number-of-oop-map-entries-in-instance-objects > - alternate-order > - print Passes tests, approved. ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24330#pullrequestreview-2740455686 From stuefe at openjdk.org Thu Apr 3 16:15:00 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 3 Apr 2025 16:15:00 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 15:47:27 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g. >> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use one time PeriodicTask Coming to this discussion late. IMHO this is overengineered for just a printout to the hs-err file during error dumping. We already read from proc fs. Proc can be worse (depending on what you read, a lot) than reading sequentially from a flat file. Remember that we already run the JVM binaries from the same file system. 
We read debug information from those binaries during error dumping, and that causes a ton of IO; a sequential read of a tiny file is a drop in the bucket. Also remember that we have safety fuses: Step timeouts and Step signal handling - so if this read ever turns out to be a problem, e.g. by hanging, the Step would be cancelled and error reporting would continue with the next step. I would, however, attempt to avoid malloc. Not super important, but if it's easy to do I would do it. Best by using a small fixed-sized stack-allocated buffer, and just printing the file line by line. Just my 5 cent. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2776304991 From jnimeh at openjdk.org Thu Apr 3 16:45:25 2025 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Thu, 3 Apr 2025 16:45:25 GMT Subject: RFR: 8350126: Regression ~3% on Crypto-ChaCha20Poly1305.encrypt for MacOSX aarch64 Message-ID: This fix addresses a performance regression found on some aarch64 processors, namely the Apple M1, when we moved to a quarter round parallel implementation in JDK-8349106. After making some improvements in the ordering of the instructions in the 20-round loop we found that going back to a block-parallel implementation was faster, but it definitely needed the ordering changes for that to be the case. More importantly, the block parallel implementation with the interleaving turns out to be faster on even those processors that showed improvements when moving to the quarter round parallel implementation. There is a spreadsheet attached to the JBS bug that shows 3 different implementations relative to the current (QR-parallel with no interleaving) implementation on 3 different ARM64 processors. Comparative benchmarks can also be found below.
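The two ChaCha20 vectorization strategies compared in this thread can be illustrated with a small data-layout sketch. In a quarter-round-parallel layout, one block's four state rows sit in four vector registers, so the four column quarter rounds run lane-wise at once, but the lanes must be rotated to line up the diagonals before the diagonal round and rotated back afterwards. In a block-parallel layout, each register holds the same state word from several independent blocks, so column and diagonal rounds only pick different registers and no lane shuffles are needed, at the cost of more live registers and a final transpose to serialize the keystream. The scalar C++ below models only the block-parallel arrangement, with a `kLanes`-wide array standing in for a SIMD register; `kLanes`, `Vec` and the function names are illustrative assumptions, not the JDK's aarch64 stub. The quarter round follows RFC 8439, section 2.1.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 4;                       // hypothetical: 4 blocks side by side
using Vec = std::array<std::uint32_t, kLanes>;  // stand-in for one SIMD register

inline std::uint32_t rotl32(std::uint32_t v, int n) {
  return (v << n) | (v >> (32 - n));  // n is always 7, 8, 12 or 16 here
}

// Scalar ChaCha20 quarter round (RFC 8439, section 2.1).
inline void quarter_round(std::uint32_t& a, std::uint32_t& b,
                          std::uint32_t& c, std::uint32_t& d) {
  a += b; d ^= a; d = rotl32(d, 16);
  c += d; b ^= c; b = rotl32(b, 12);
  a += b; d ^= a; d = rotl32(d, 8);
  c += d; b ^= c; b = rotl32(b, 7);
}

// Block-parallel quarter round: lane l of every register belongs to block l,
// so each lane is an independent scalar quarter round.
inline void quarter_round_lanes(Vec& a, Vec& b, Vec& c, Vec& d) {
  for (int l = 0; l < kLanes; ++l) quarter_round(a[l], b[l], c[l], d[l]);
}

// One column round plus one diagonal round over kLanes blocks, where s[i]
// holds state word i of every block. The diagonal round just combines
// different registers; no lane shuffling is required.
void double_round_lanes(std::array<Vec, 16>& s) {
  quarter_round_lanes(s[0], s[4], s[8],  s[12]);
  quarter_round_lanes(s[1], s[5], s[9],  s[13]);
  quarter_round_lanes(s[2], s[6], s[10], s[14]);
  quarter_round_lanes(s[3], s[7], s[11], s[15]);
  quarter_round_lanes(s[0], s[5], s[10], s[15]);
  quarter_round_lanes(s[1], s[6], s[11], s[12]);
  quarter_round_lanes(s[2], s[7], s[8],  s[13]);
  quarter_round_lanes(s[3], s[4], s[9],  s[14]);
}
```

A scalar model like this reproduces the RFC 8439 section 2.1.1 quarter-round test vector in every lane, but it cannot show the instruction-interleaving effects the benchmarks measure; it only illustrates the data layout.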
-------------

Commit messages:
 - 8350126: Regression ~3% on Crypto-ChaCha20Poly1305.encrypt for MacOSX aarch64

Changes: https://git.openjdk.org/jdk/pull/24420/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24420&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8350126
  Stats: 488 lines in 3 files changed: 238 ins; 214 del; 36 mod
  Patch: https://git.openjdk.org/jdk/pull/24420.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24420/head:pull/24420

PR: https://git.openjdk.org/jdk/pull/24420

From jnimeh at openjdk.org Thu Apr 3 16:45:25 2025
From: jnimeh at openjdk.org (Jamil Nimeh)
Date: Thu, 3 Apr 2025 16:45:25 GMT
Subject: RFR: 8350126: Regression ~3% on Crypto-ChaCha20Poly1305.encrypt for MacOSX aarch64

On Thu, 3 Apr 2025 16:31:39 GMT, Jamil Nimeh wrote:

> This fix addresses a performance regression found on some aarch64 processors, namely the Apple M1, when we moved to a quarter-round-parallel implementation in JDK-8349106. After improving the ordering of the instructions in the 20-round loop, we found that going back to a block-parallel implementation was faster, but only with the ordering changes in place. More importantly, the block-parallel implementation with interleaving turns out to be faster even on those processors that showed improvements when moving to the quarter-round-parallel implementation.
>
> There is a spreadsheet attached to the JBS bug that shows 3 different implementations relative to the current (QR-parallel with no interleaving) implementation on 3 different ARM64 processors. Comparative benchmarks can also be found below.
Benchmarks for Apple M1:
MacOS Sonoma 14.5, 8x Apple M1

All rows in the tables below share these JMH parameters: (keyLength) 256, (mode) None, (padding) NoPadding, (provider) empty, Mode thrpt, Cnt 40, units ops/s. (permutation) is ChaCha20 for the ChaCha20.* benchmarks and ChaCha20-Poly1305 for the ChaCha20Poly1305.* benchmarks.

Quarter Round Parallel, No Interleaving
---------------------------------------
| Benchmark | (dataSize) | Score ± Error (ops/s) |
| ---------- | ---------- | ---------- |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 3837175.980 ± 14108.076 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 1150065.857 ± 2238.499 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 299444.203 ± 1914.377 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 76149.432 ± 81.343 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 3457825.749 ± 95284.525 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 1100458.180 ± 9856.390 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 296393.225 ± 1176.583 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 75271.693 ± 848.788 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 995936.643 ± 8252.270 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 518474.192 ± 2541.371 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 178582.085 ± 337.094 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 50037.769 ± 60.497 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 1189366.955 ± 3437.169 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 568044.693 ± 6057.314 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 181517.405 ± 248.283 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 49339.073 ± 298.549 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 50024.452 ± 53.838 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 49459.758 ± 63.090 |

Quarter Round Parallel, With Interleaving
-----------------------------------------
| Benchmark | (dataSize) | Score ± Error (ops/s) |
| ---------- | ---------- | ---------- |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 3880433.294 ± 9904.562 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 1157285.625 ± 2415.082 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 301986.767 ± 339.147 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 75990.670 ± 194.671 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 3486874.086 ± 93507.311 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 1111966.942 ± 9602.005 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 297633.816 ± 1455.184 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 74817.230 ± 1737.888 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 998384.311 ± 7491.076 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 517031.021 ± 1756.181 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 179139.212 ± 401.008 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 49796.519 ± 609.335 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 1207581.459 ± 13757.759 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 576596.806 ± 4205.682 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 184108.182 ± 229.014 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 50120.498 ± 300.391 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 50053.528 ± 181.415 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 50232.767 ± 62.234 |

Block Parallel, No Interleaving
-------------------------------
| Benchmark | (dataSize) | Score ± Error (ops/s) |
| ---------- | ---------- | ---------- |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 4107524.407 ± 9337.726 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 1210532.736 ± 1111.846 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 315178.899 ± 375.858 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 78782.555 ± 856.939 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 3601509.841 ± 103375.315 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 1156918.875 ± 9666.447 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 312270.458 ± 1726.717 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 79394.369 ± 513.291 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 1029546.842 ± 2317.072 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 532504.493 ± 2836.934 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 183874.028 ± 332.438 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 51739.678 ± 122.138 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 1263370.572 ± 15424.473 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 588853.049 ± 3419.509 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 188899.111 ± 160.103 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 51516.978 ± 147.720 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 51758.247 ± 39.852 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 51441.519 ± 278.059 |

Block Parallel, With Interleaving
---------------------------------
| Benchmark | (dataSize) | Score ± Error (ops/s) |
| ---------- | ---------- | ---------- |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 4154482.236 ± 8208.082 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 1221710.558 ± 5967.515 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 319918.165 ± 327.235 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 80602.283 ± 193.687 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 3710733.896 ± 88631.462 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 1168824.003 ± 10465.340 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 315040.718 ± 1389.500 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 80365.126 ± 586.286 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 1007279.441 ± 8794.990 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 536758.995 ± 3346.320 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 184600.058 ± 362.456 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 52079.247 ± 38.558 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 1233639.918 ± 7503.063 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 593298.939 ± 3886.323 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 190535.858 ± 215.443 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 51953.765 ± 226.078 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 52073.085 ± 46.961 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 51815.757 ± 331.563 |

Benchmarks for Neoverse-N1:
System: 2x Neoverse-N1, 2 cores, 1 socket, 1 thread/core (var 0x3, part 0xD0C)

Quarter-Round Parallel Intrinsics Implementation
------------------------------------------------
| Benchmark | (dataSize) | Score ± Error (ops/s) |
| ---------- | ---------- | ---------- |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 2219198.137 ± 13314.344 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 684200.661 ± 3601.031 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 181048.566 ± 942.201 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 46150.219 ± 118.031 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 2049320.671 ± 9549.691 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 663456.090 ± 2722.964 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 179921.834 ± 573.613 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 45885.159 ± 102.974 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 476694.433 ± 4118.055 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 251749.129 ± 1535.415 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 87052.901 ± 436.111 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 24099.749 ± 136.009 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 601333.942 ± 5414.186 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 280884.583 ± 2332.119 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 90250.320 ± 604.948 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 24346.217 ± 101.557 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 23950.145 ± 119.081 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 24405.675 ± 93.554 |

Quarter-Round Parallel Intrinsics with Interleaving Implementation
------------------------------------------------------------------
| Benchmark | (dataSize) | Score ± Error (ops/s) |
| ---------- | ---------- | ---------- |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 2344673.121 ± 14885.986 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 724626.059 ± 3078.617 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 192723.841 ± 744.860 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 49050.992 ± 118.087 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 2136919.832 ± 7229.740 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 703672.009 ± 2520.798 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 191748.973 ± 421.704 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 48939.791 ± 194.749 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 497137.864 ± 2915.527 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 262127.552 ± 1302.946 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 90018.698 ± 425.574 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 24987.421 ± 119.936 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 634980.497 ± 4191.567 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 293529.897 ± 1496.703 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 93230.690 ± 480.282 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 24936.479 ± 112.139 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 24897.542 ± 76.891 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 25128.075 ± 120.033 |

Block-Parallel Intrinsics Implementation
----------------------------------------
| Benchmark | (dataSize) | Score ± Error (ops/s) |
| ---------- | ---------- | ---------- |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 2164945.312 ± 8845.473 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 659831.098 ± 1968.217 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 175252.222 ± 512.910 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 44329.489 ± 126.564 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 1975016.045 ± 11695.931 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 640856.881 ± 1830.533 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 173305.072 ± 366.240 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 44208.373 ± 107.018 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 466351.469 ± 3278.807 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 247662.489 ± 1165.507 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 85367.721 ± 404.796 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 23492.360 ± 92.043 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 589645.973 ± 4262.663 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 278130.465 ± 1394.179 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 88081.739 ± 443.476 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 23853.430 ± 104.346 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 23620.475 ± 75.932 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 23750.134 ± 118.572 |

Block-Parallel with Interleaving Intrinsics Implementation
----------------------------------------------------------
| Benchmark | (dataSize) | Score ± Error (ops/s) |
| ---------- | ---------- | ---------- |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 2358246.820 ± 14256.312 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 734318.183 ± 2447.434 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 196243.937 ± 517.431 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 50008.245 ± 85.350 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 2156054.908 ± 5432.249 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 713847.200 ± 1962.784 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 194383.466 ± 464.389 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 49652.092 ± 166.716 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 497410.798 ± 3632.927 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 261587.126 ± 1336.591 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 90453.673 ± 429.630 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 24963.118 ± 103.795 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 623876.407 ± 4655.637 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 292279.929 ± 1345.033 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 93352.350 ± 429.286 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 25190.232 ± 121.961 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 25128.018 ± 84.863 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 25371.698 ± 129.837 |

Benchmarks for Cortex-A72:
4-processor Cortex-A72, 1 cluster, 4 cores/cluster, 1 thread/core (var 0x0, part 0xD08)

Quarter Round Parallel Implementation, No Interleaving
------------------------------------------------------
| Benchmark | (dataSize) | Score ± Error (ops/s) |
| ---------- | ---------- | ---------- |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 602983.483 ± 6556.879 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 186189.843 ± 628.835 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 49499.230 ± 139.811 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 12487.617 ± 69.484 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 592209.356 ± 3927.984 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 185091.856 ± 366.779 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 49491.296 ± 117.179 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 12512.907 ± 71.587 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 96212.313 ± 2482.928 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 65131.604 ± 1504.555 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 27746.783 ± 229.856 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 8381.946 ± 32.122 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 129453.321 ± 3224.106 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 77091.625 ± 1470.684 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 29334.590 ± 303.107 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 8460.356 ± 8.524 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 8386.624 ± 34.163 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 8471.573 ± 8.635 |

Quarter Round Parallel Implementation, With Interleaving
--------------------------------------------------------
| Benchmark | (dataSize) | Score ± Error (ops/s) |
| ---------- | ---------- | ---------- |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 767143.826 ± 9195.715 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 254386.139 ± 1378.080 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 69152.606 ± 176.940 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 17609.457 ± 71.086 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 746643.194 ± 9077.375 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 251953.223 ± 959.588 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 69064.757 ± 197.231 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 17563.052 ± 97.678 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 105520.550 ± 2805.637 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 72902.046 ± 1738.503 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 33446.843 ± 377.742 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 10437.913 ± 31.702 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 141153.205 ± 3693.280 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 89657.996 ± 1635.631 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 35926.981 ± 244.574 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 10555.879 ± 18.698 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 10440.037 ± 33.023 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 10542.745 ± 45.282 |

Block Parallel Implementation, No Interleaving
----------------------------------------------
| Benchmark | (dataSize) | Score ± Error (ops/s) |
| ---------- | ---------- | ---------- |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 587100.753 ± 5754.708 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 178737.840 ± 730.445 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 47340.182 ± 121.627 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 11947.269 ± 66.887 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 574123.343 ± 3838.477 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 177870.311 ± 420.125 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 47409.796 ± 109.224 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 11967.672 ± 65.803 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 95867.086 ± 2228.000 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 63376.433 ± 1301.826 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 26988.391 ± 231.289 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 8139.090 ± 20.871 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 127770.261 ± 3262.540 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 76019.408 ± 1226.583 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 28652.283 ± 214.896 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 8208.186 ± 11.455 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 8131.508 ± 27.548 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 8207.550 ± 13.086 |

Block Parallel Implementation, With Interleaving
------------------------------------------------
| Benchmark | (dataSize) | Score ± Error (ops/s) |
| ---------- | ---------- | ---------- |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 256 | 826086.130 ± 9933.137 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 1024 | 276583.128 ± 1434.611 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 4096 | 75688.367 ± 228.277 |
| o.o.b.j.c.full.CipherBench.ChaCha20.decrypt | 16384 | 19348.013 ± 77.810 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 256 | 800978.386 ± 10445.822 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 1024 | 274107.264 ± 1606.978 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 4096 | 75446.852 ± 209.379 |
| o.o.b.j.c.full.CipherBench.ChaCha20.encrypt | 16384 | 19270.292 ± 105.573 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 256 | 105988.778 ± 3001.220 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 1024 | 76162.169 ± 1692.042 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 4096 | 34978.996 ± 468.786 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 11040.040 ± 31.844 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 256 | 146046.188 ± 3471.952 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 1024 | 94041.417 ± 1834.558 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 4096 | 37770.658 ± 311.519 |
| o.o.b.j.c.full.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 11183.053 ± 11.204 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.decrypt | 16384 | 11037.956 ± 39.522 |
| o.o.b.j.c.small.CipherBench.ChaCha20Poly1305.encrypt | 16384 | 11196.095 ± 33.796 |

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776357177
PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776369079
PR Comment: https://git.openjdk.org/jdk/pull/24420#issuecomment-2776371619

From mli at openjdk.org Thu Apr 3 17:06:55 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 3 Apr 2025 17:06:55 GMT
Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v10]
References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com>

On Thu, 3 Apr 2025 06:36:10 GMT, Robbin Ehn wrote:

>> Hi, please consider.
>>
>> | RVWMO | Patched |
>> | ---------- | ---------- |
>> | fence iorw,iorw | fence iorw,ow |
>> | sw t4,120(t2) | sw t4,120(t2) |
>> | fence ow,ir | unnecessary_membar_volatile_rvwmo |
>> | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile |
>> | fence iorw,ow | fence iorw,ow |
>> | sw t5,124(t2) | sw t5,124(t2) |
>>
>> | TSO | Patched |
>> | ---------- | ---------- |
>> | lw a4,120(t2) | lw a6,120(t2) |
>> | sw a0,124(t2) | sw t6,124(t2) |
>> | fence iorw,iorw | unnecessary_membar_volatile_tso |
>> | sw t4,120(t2) | sw t4,120(t2) |
>> | fence ow,ir | unnecessary_membar_volatile_tso |
>> | sw t6,128(t2) | sw t5,128(t2) |
>> | sw t5,124(t2) // Non-volatile | sw a1,124(t2) // Non-volatile |
>> | fence iorw,iorw | unnecessary_membar_volatile_tso |
>> | ... | ... |
>> | sw a3,120(t2) | sw a0,120(t2) |
>> | fence ow,ir | fence ow,ir |
>> | lw a7,124(t2) | lw a5,124(t2) |
>>
>> The specific RVWMO sequence volatile store + store + volatile store is around 30% faster on VF2.
>>
>> The patch does:
>> - Separate ztso and rvwmo in ad by using UseZtso predicate.
>> - Match all that requires the same membar.
>> - Make fence/fencei protected as they shouldn't be used directly.
>> - Increased cost of membars to VOLATILE_REF_COST. >> - Added a real_empty pipe. >> - Change to pipe_slow on TSO (as x86). >> >> Note that C2-rv64 is now superior to gcc/clang regarding fencing: >> https://godbolt.org/z/6E3YTP15j >> >> Testing jcstress, tier1 and manually reading the generated assembly. >> Doing additional testing, but RFR it now as it may need some consideration. >> >> /Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Merge branch 'master' into tso-merge > - Merge branch 'master' into tso-merge > - Merge branch 'master' into tso-merge > - Merge branch 'master' into tso-merge > - format comment > - Merge branch 'master' into tso-merge > - Review comments > - Merge branch 'master' into tso-merge > - Review comments > - Fixed ws > - ... and 3 more: https://git.openjdk.org/jdk/compare/3ff35d27...2044cf5f Marked as reviewed by mli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24035#pullrequestreview-2740606233 From rkennke at openjdk.org Thu Apr 3 17:14:59 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 3 Apr 2025 17:14:59 GMT Subject: RFR: 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 15:44:52 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Clear cache before enter_for() > > Our internal tier1-4 tests passed. Thanks @coleenp and @xmas92!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24413#issuecomment-2776447176 From rkennke at openjdk.org Thu Apr 3 17:15:00 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 3 Apr 2025 17:15:00 GMT Subject: Integrated: 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 11:59:58 GMT, Roman Kennke wrote: > Like #24098, but clears the BasicLock cache before calling inflate_and_enter(). > > > diff --git a/src/hotspot/share/runtime/deoptimization.cpp b/src/hotspot/share/runtime/deoptimization.cpp > index f7e0844639b..f17c46fea38 100644 > --- a/src/hotspot/share/runtime/deoptimization.cpp > +++ b/src/hotspot/share/runtime/deoptimization.cpp > @@ -1667,6 +1667,9 @@ bool Deoptimization::relock_objects(JavaThread* thread, GrowableArray // was fast_locked to restore the valid lock stack. > ObjectSynchronizer::enter_for(obj, lock, deoptee_thread); > if (deoptee_thread->lock_stack().contains(obj())) { > + if (UseObjectMonitorTable) { > + lock->clear_object_monitor_cache(); > + } > LightweightSynchronizer::inflate_fast_locked_object(obj(), ObjectSynchronizer::InflateCause::inflate_cause_vm_internal, > deoptee_thread, thread); > } This pull request has now been integrated. 
Changeset: d894b781 Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/d894b781b8f245ce8a5d28401c0abb5abb420bc8 Stats: 66 lines in 9 files changed: 40 ins; 6 del; 20 mod 8353588: [REDO] DaCapo xalan performance with -XX:+UseObjectMonitorTable Reviewed-by: coleenp, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/24413 From vpaprotski at openjdk.org Thu Apr 3 18:52:12 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Thu, 3 Apr 2025 18:52:12 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8] In-Reply-To: <5WNrv1s7Bp7hLwSVGqoPw9ycCSHK0Zyka65DpAjnB2s=.31243a29-4fbb-4c21-b671-45470d043335@github.com> References: <5WNrv1s7Bp7hLwSVGqoPw9ycCSHK0Zyka65DpAjnB2s=.31243a29-4fbb-4c21-b671-45470d043335@github.com> Message-ID: On Mon, 31 Mar 2025 19:57:59 GMT, Sean Mullan wrote: > > > I think it would also be useful to write a release note describing the approximate performance improvement gains for the crypto algorithms as displayed in your chart. Thanks. > > > > > > @seanjmullan I think I only done that once, cant find the 'instructions'.. I think Jamil had helped me, but.. (https://bugs.openjdk.org/browse/JDK-8297970) "Create subtask with 'release-note' label?" > > See the [Release Notes section](https://openjdk.org/guide/#release-notes) of the OpenJDK Developer's Guide for the process. > > For this release note, something similar to https://jdk.java.net/24/release-notes#JDK-8333867 would be nice, a couple of sentences explaining the approximate improvement, on what architectures, and for what APIs and algorithms one would see the improvement. 
Done I think: https://bugs.openjdk.org/browse/JDK-8297970 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23719#issuecomment-2776649392 From vpaprotski at openjdk.org Thu Apr 3 18:52:12 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Thu, 3 Apr 2025 18:52:12 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8] In-Reply-To: <9TJyGXccPFnDI60b2Wg3ZIuQH2nd6LC-pFgEs6p8x1c=.6308a314-dd48-4cb3-9986-8e6eb754d4c2@github.com> References: <9TJyGXccPFnDI60b2Wg3ZIuQH2nd6LC-pFgEs6p8x1c=.6308a314-dd48-4cb3-9986-8e6eb754d4c2@github.com> Message-ID: On Fri, 28 Mar 2025 20:10:42 GMT, Volodymyr Paprotski wrote: >> src/java.base/share/classes/sun/security/util/math/intpoly/MontgomeryIntegerPolynomialP256.java line 164: >> >>> 162: protected void mult(long[] a, long[] b, long[] r) { >>> 163: multImpl(a, b, r); >>> 164: reducePositive(r); >> >> `reducePositive` is now seems unused > > oh.. hmm.. I had a second PR that I decided wasnt worth it that was going to reuse this code.. > > Will create a second JBS and remove https://github.com/openjdk/jdk/pull/24423 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r2027562079 From vlivanov at openjdk.org Thu Apr 3 20:02:55 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 3 Apr 2025 20:02:55 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag In-Reply-To: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> Message-ID: On Wed, 2 Apr 2025 17:49:30 GMT, Aleksey Shipilev wrote: > Noticed this when doing [JDK-8353558](https://bugs.openjdk.org/browse/JDK-8353558). We only check for CLWB feature flag for Intel platforms. But AMD APM (Rev. 3.36, March 2024) tells me there is a CLWB flag in CPUID Fn0000_0007_EBX_x0 leaf as well.
It is in the same place as the flag for Intel. > > Additional testing: > - [x] Ad-hoc tests on Ryzen 5950X src/hotspot/cpu/x86/vm_version_x86.cpp line 3100: > 3098: if (ext_cpuid1_ecx.bits.sse4a != 0) > 3099: result |= CPU_SSE4A; > 3100: if (sef_cpuid7_ebx.bits.clwb != 0) I'm curious what's the rule here when it comes to vendor-specific features? From what I'm seeing in the sources, both AMD and ZX enumerate only `ext_cpuid1` features while for Intel it's a mix of `sef_cpuid7` and `ext_cpuid1`. So, I'm curious whether the code should be moved up and shared for all CPUs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24385#discussion_r2027652958 From lmesnik at openjdk.org Thu Apr 3 20:04:50 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 3 Apr 2025 20:04:50 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising In-Reply-To: References: Message-ID: <2tf83vKKccOEHQ8lw6nZ0D8MHX0lzIAu8ZhE0IRVQIM=.630949fd-e71e-4179-a369-87568d10e36f@github.com> On Tue, 1 Apr 2025 10:59:16 GMT, Kevin Walls wrote: > We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. > > next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell. Can you please add new regression test or update runtime/ErrorHandling/TestOnError.java to test few arguments and your fix. ------------- Changes requested by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24354#pullrequestreview-2741006399 From coleenp at openjdk.org Thu Apr 3 20:38:56 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Apr 2025 20:38:56 GMT Subject: RFR: 8349007: The jtreg test ResolvedMethodTableHash takes excessive time In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 19:44:46 GMT, Leonid Mesnik wrote: >> This is mostly test change.
ResolvedMethodTableHash.java is run /manual but still takes a long time and doesn't really verify that the hash code is any good. This add logging, triggers concurrent work like other ConcurrentHashtables, and checks that a small table is not rehashed. >> Tested with tier1 (including test). > > test/hotspot/jtreg/runtime/MemberName/ResolvedMethodTableHash.java line 52: > >> 50: public static class ResolvedMethodTableHashTest extends ClassLoader { >> 51: // Generate a MethodHandle for ClassName.m() >> 52: private MethodHandle generate(String className) throws ReflectiveOperationException { > > The indentation is wrong in line 53, 60, 70, 71, 96, 102, might be other places also. My indent script was a bit too aggressive. > test/hotspot/jtreg/runtime/MemberName/ResolvedMethodTableHash.java line 99: > >> 97: List handles = new ArrayList<>(); >> 98: >> 99: int count = args.length > 0 ? Integer.parseInt(args[0]) : 200000; > > no need to check args and have default 20000, it is not executed, actually. > might be just hardcode 1000 here? I fixed this to hardcode 1001 (want 1001 iterations to print the message). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24383#discussion_r2027705925 PR Review Comment: https://git.openjdk.org/jdk/pull/24383#discussion_r2027705447 From coleenp at openjdk.org Thu Apr 3 20:45:51 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Apr 2025 20:45:51 GMT Subject: RFR: 8349007: The jtreg test ResolvedMethodTableHash takes excessive time In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 17:28:12 GMT, Coleen Phillimore wrote: > This is mostly test change. ResolvedMethodTableHash.java is run /manual but still takes a long time and doesn't really verify that the hash code is any good. This add logging, triggers concurrent work like other ConcurrentHashtables, and checks that a small table is not rehashed. > Tested with tier1 (including test).
This test was never meant to be a stress test, it was just checking that the hashcode didn't result in collisions for methods with the same name and signature. With my change it takes 32 seconds to run make test TEST=runtime/MemberName/ResolvedMethodTableHash.java which is probably about 31 seconds for jtreg to run. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24383#issuecomment-2776870921 From coleenp at openjdk.org Thu Apr 3 20:45:50 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Apr 2025 20:45:50 GMT Subject: RFR: 8349007: The jtreg test ResolvedMethodTableHash takes excessive time [v2] In-Reply-To: References: Message-ID: > This is mostly test change. ResolvedMethodTableHash.java is run /manual but still takes a long time and doesn't really verify that the hash code is any good. This add logging, triggers concurrent work like other ConcurrentHashtables, and checks that a small table is not rehashed. > Tested with tier1 (including test). Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix indent and hardcode 1001 loops. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24383/files - new: https://git.openjdk.org/jdk/pull/24383/files/dedbf9f9..346e7a72 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24383&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24383&range=00-01 Stats: 50 lines in 1 file changed: 4 ins; 4 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/24383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24383/head:pull/24383 PR: https://git.openjdk.org/jdk/pull/24383 From coleenp at openjdk.org Thu Apr 3 20:45:52 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Apr 2025 20:45:52 GMT Subject: RFR: 8349007: The jtreg test ResolvedMethodTableHash takes excessive time [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 20:35:28 GMT, Coleen Phillimore wrote: >> test/hotspot/jtreg/runtime/MemberName/ResolvedMethodTableHash.java line 99: >> >>> 97: List handles = new ArrayList<>(); >>> 98: >>> 99: int count = args.length > 0 ? Integer.parseInt(args[0]) : 200000; >> >> no need to check args and have default 20000, it is not executed, actually. >> might be just hardcode 1000 here? > > I fixed this to hardcode 1001 (want 1001 iterations to print the message). This was a good suggestion, thank you. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24383#discussion_r2027712900 From lmesnik at openjdk.org Thu Apr 3 22:32:49 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 3 Apr 2025 22:32:49 GMT Subject: RFR: 8352565: Add native method implementation of Reference.get() [v4] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 18:33:16 GMT, Kim Barrett wrote: >> Please review this change which adds a native method providing the >> implementation of Reference::get. Reference::get is an intrinsic candidate, so >> this native method implementation is only used when the intrinsic is not.
>> >> Currently there is intrinsic support by the interpreter, C1, C2, and graal, >> which are always used. With this change we can later remove all the >> per-platform interpreter intrinsic implementations, and might also remove the >> C1 intrinsic implementation. >> >> Testing: >> (1) mach5 tier1-6 normal (so using all the existing intrinsics). >> (2) mach5 tier1-6 with interpreter and C1 Reference::get intrinsics disabled. > > Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: > > - remove timeout by using waitForReferenceProcessing > - make ill-timed gc in non-concurrent case less likely > - fix test package use Test changes looks, good. Please get another review before pushing. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24315#pullrequestreview-2741345881 From lmesnik at openjdk.org Thu Apr 3 22:55:53 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 3 Apr 2025 22:55:53 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances [v2] In-Reply-To: <_m74U7bVGssSQnbrkP-4KvS5nga3bg4Bh4g5BlU07Kw=.f951a8cf-a4a6-452f-936e-284decfd9df8@github.com> References: <_m74U7bVGssSQnbrkP-4KvS5nga3bg4Bh4g5BlU07Kw=.f951a8cf-a4a6-452f-936e-284decfd9df8@github.com> Message-ID: <1n0KWxyuHafZFtkM1ByFFpUqWTkeAOVWcRuBv21AU5g=.f4eb6906-8d3b-46e2-b973-cda8a9d7e110@github.com> On Wed, 2 Apr 2025 09:42:05 GMT, Thomas Stuefe wrote: >> In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. >> >> For details, please see JBS issue text. >> >> ----------------------- >> >> Patch results: >> >> The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. 
Here the oop map size distribution over all JDK classes in the JDK image: >> >> Before: >> >> 5395 - non-static oop maps (0 entries) >> 9330 - non-static oop maps (1 entries) >> 1449 - non-static oop maps (2 entries) >> 274 - non-static oop maps (3 entries) >> 218 - non-static oop maps (4 entries) >> 75 - non-static oop maps (5 entries) >> 7 - non-static oop maps (6 entries) >> 4 - non-static oop maps (7 entries) >> >> >> Now: >> >> 5395 - non-static oop maps (0 entries) >> 10178 - non-static oop maps (1 entries) >> 933 - non-static oop maps (2 entries) >> 229 - non-static oop maps (3 entries) >> 16 - non-static oop maps (4 entries) >> 1 - non-static oop maps (5 entries) >> >> >> For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: >> >> Before: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'red' 'Z' @28 << derived class starts here, non-oops lead >> - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 >> - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 >> - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 >> - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 >> - non-static oop maps (2 entries): 16-24 32-44 >> >> Now: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'parent' 
'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oop... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - add regression test > - Reworked to use prior super klass layout reconstruction pass > - Merge branch 'master' into JDK-8353273-Reduce-average-number-of-oop-map-entries-in-instance-objects > - alternate-order > - print Changes requested by lmesnik (Reviewer). test/hotspot/jtreg/runtime/FieldLayout/TestOopMapSizeMinimal.java line 34: > 32: /* > 33: * @test id=no_coops_no_ccptr_no_coh > 34: * @library /test/lib / Using "/" as testlibrary is not a good pattern. "/" is not a lib. Why is it needed here? test/hotspot/jtreg/runtime/FieldLayout/TestOopMapSizeMinimal.java line 94: > 92: static { > 93: WhiteBox WB = WhiteBox.getWhiteBox(); > 94: boolean is_64_bit = System.getProperty("sun.arch.data.model").equals("64"); I am a little bit confused with this check and `*` @requires vm.bits == "64"` Shouldn't "sun.arch.data.model" be always 64? 
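To make the oop map entry counts above concrete, here is a minimal C++ sketch of how adjacent reference fields collapse into oop map blocks. It is not HotSpot's field layout code: the `Field` and `OopMapBlock` structs and `build_oop_map` are hypothetical names, and it assumes 4-byte compressed oops and fields already sorted by offset.

```cpp
#include <vector>

// Hypothetical, simplified field descriptor (HotSpot's real FieldInfo differs).
struct Field {
  int offset;   // byte offset of the field within the instance
  bool is_oop;  // true for reference fields
};

// Hypothetical oop map block: a run of consecutive oop slots.
struct OopMapBlock {
  int first_offset;  // offset of the first oop in the run
  int count;         // number of consecutive oop slots
};

// Builds oop map blocks from fields sorted by offset. An oop field extends the
// previous block only if it starts exactly where that block ends; any gap
// (for example a non-oop field in between) starts a new block.
std::vector<OopMapBlock> build_oop_map(const std::vector<Field>& fields, int oop_size) {
  std::vector<OopMapBlock> blocks;
  for (const Field& f : fields) {
    if (!f.is_oop) {
      continue;
    }
    if (!blocks.empty()) {
      OopMapBlock& last = blocks.back();
      if (last.first_offset + last.count * oop_size == f.offset) {
        last.count++;
        continue;
      }
    }
    blocks.push_back({f.offset, 1});
  }
  return blocks;
}
```

With the "Before" offsets of `ConcurrentHashMap$TreeNode` from the PR description (oops at 16-24, `red` at 28, oops at 32-44), `build_oop_map(fields, 4)` returns two blocks. If the subclass oops start at 28, directly after the superclass oops, as in the "Now" layout where `parent` sits at @28, the runs merge and only one block remains.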
------------- PR Review: https://git.openjdk.org/jdk/pull/24330#pullrequestreview-2741350318 PR Review Comment: https://git.openjdk.org/jdk/pull/24330#discussion_r2027836141 PR Review Comment: https://git.openjdk.org/jdk/pull/24330#discussion_r2027845359 From lmesnik at openjdk.org Thu Apr 3 23:11:49 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 3 Apr 2025 23:11:49 GMT Subject: RFR: 8349007: The jtreg test ResolvedMethodTableHash takes excessive time In-Reply-To: References: Message-ID: <2MUslrY2z5R4fNlJYNJSCG51IdBWkazru51yz-TPO-Y=.5114f24a-44dc-40f8-9124-2bef16e6a427@github.com> On Thu, 3 Apr 2025 20:40:48 GMT, Coleen Phillimore wrote: > This test was never meant to be a stress test, it was just checking that the hashcode didn't result in collisions for methods with the same name and signature. Thanks for explanation. > With my change it takes 32 seconds to run make test TEST=runtime/MemberName/ResolvedMethodTableHash.java which is probably about 31 seconds for jtreg to run. That's fast enough to remove manual tag. I think it is up to you if you want to run it in tier1 or move into tier2/3. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24383#issuecomment-2777183052 From dholmes at openjdk.org Fri Apr 4 01:20:48 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 4 Apr 2025 01:20:48 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 10:59:16 GMT, Kevin Walls wrote: > We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. > > next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell. I am having trouble understanding how the current behaviour can actually work. If I have java -XX:OnError="gcore %p" ... -XX:OnError="ps -fe" ...
then we will get a combined `OnError` value of "gcore %p\nps -fe" - which as a single command is nonsense. ??? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2777325939 From dholmes at openjdk.org Fri Apr 4 02:03:51 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 4 Apr 2025 02:03:51 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v3] In-Reply-To: References: Message-ID: <9KQtTCyBbC24n4R_Oz-XO4_5ZZKXJU2hBYenYfg35xU=.263c3419-d326-4115-a128-999df4da632d@github.com> On Thu, 3 Apr 2025 15:47:27 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g. >> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use one time PeriodicTask A number of minor suggestions but this looks good to me. 
Thanks src/hotspot/share/runtime/os.cpp line 1542: > 1540: > 1541: if (_image_release_file_content == nullptr) { > 1542: FILE *file = fopen(release_file, "rb"); Suggestion: FILE* file = fopen(release_file, "rb"); src/hotspot/share/runtime/os.cpp line 1543: > 1541: if (_image_release_file_content == nullptr) { > 1542: FILE *file = fopen(release_file, "rb"); > 1543: if (!file) { Suggestion: if (file != nullptr) { Style: no implicit booleans src/hotspot/share/runtime/os.cpp line 1563: > 1561: > 1562: size_t elements_read = fread(_image_release_file_content, 1, sz, file); > 1563: if (elements_read < (size_t)sz) _image_release_file_content[elements_read] = '\0'; Suggestion: if (elements_read < (size_t)sz) { _image_release_file_content[elements_read] = '\0'; } src/hotspot/share/runtime/os.cpp line 1564: > 1562: size_t elements_read = fread(_image_release_file_content, 1, sz, file); > 1563: if (elements_read < (size_t)sz) _image_release_file_content[elements_read] = '\0'; > 1564: _image_release_file_content[sz] = '\0'; Shouldn't this be in an else? src/hotspot/share/runtime/os.cpp line 1566: > 1564: _image_release_file_content[sz] = '\0'; > 1565: // issues with \r in line endings on Windows, so better replace those > 1566: for (size_t i=0; i < elements_read; i++) { Suggestion: for (size_t i = 0; i < elements_read; i++) { src/hotspot/share/runtime/os.cpp line 1567: > 1565: // issues with \r in line endings on Windows, so better replace those > 1566: for (size_t i=0; i < elements_read; i++) { > 1567: if (_image_release_file_content[i] == '\r') { _image_release_file_content[i] = ' '; } Suggestion: if (_image_release_file_content[i] == '\r') { _image_release_file_content[i] = ' '; } src/hotspot/share/runtime/threads.cpp line 428: > 426: } > 427: > 428: // One-shot PeriodicTask subclass for reading release file Suggestion: // One-shot PeriodicTask subclass for reading the release file. // The "period" of 100 is just an arbitrary initial delay. 
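For reference, the file-reading logic under review can be condensed into one self-contained C++ sketch. It is not the code in the PR: `read_text_file` is a hypothetical name, and it uses plain stdio and `malloc` instead of HotSpot's `os::` wrappers and C-heap macros. Terminating the buffer at the number of bytes actually read covers both the short-read and the full-read case with a single store, which is one way to sidestep the "Shouldn't this be in an else?" question.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: reads a whole file into a NUL-terminated heap buffer and
// replaces '\r' with ' ' so Windows line endings print cleanly.
// Returns nullptr on any failure; the caller releases the buffer with free().
char* read_text_file(const char* path) {
  FILE* file = std::fopen(path, "rb");
  if (file == nullptr) {
    return nullptr;
  }
  // Determine the file size by seeking to the end.
  if (std::fseek(file, 0, SEEK_END) != 0) {
    std::fclose(file);
    return nullptr;
  }
  long sz = std::ftell(file);
  if (sz < 0 || std::fseek(file, 0, SEEK_SET) != 0) {
    std::fclose(file);
    return nullptr;
  }
  // One extra byte for the terminating NUL.
  char* buf = static_cast<char*>(std::malloc(static_cast<size_t>(sz) + 1));
  if (buf == nullptr) {
    std::fclose(file);
    return nullptr;
  }
  size_t n = std::fread(buf, 1, static_cast<size_t>(sz), file);
  std::fclose(file);
  buf[n] = '\0';  // n <= sz, so this handles short and full reads alike
  for (size_t i = 0; i < n; i++) {
    if (buf[i] == '\r') {
      buf[i] = ' ';
    }
  }
  return buf;
}
```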
src/hotspot/share/runtime/threads.cpp line 431: > 429: class ReadReleaseFileTask : public PeriodicTask { > 430: public: > 431: ReadReleaseFileTask(size_t interval_time) : PeriodicTask(interval_time) {} Suggestion: ReadReleaseFileTask() : PeriodicTask(100) {} The "delay" can just be hard-wired here. src/hotspot/share/runtime/threads.cpp line 436: > 434: os::read_image_release_file(); > 435: > 436: // Reclaim our storage and disenroll ourself Suggestion: // Reclaim our storage and disenroll ourself. src/hotspot/share/runtime/threads.cpp line 596: > 594: } > 595: > 596: ReadReleaseFileTask* read_task = new ReadReleaseFileTask(100); Suggestion: // Have the WatcherThread read the release file in the background. ReadReleaseFileTask* read_task = new ReadReleaseFileTask(); src/hotspot/share/utilities/vmError.cpp line 1436: > 1434: st->cr(); > 1435: > 1436: // printing release file content Suggestion: // STEP("printing release file content") ------------- PR Review: https://git.openjdk.org/jdk/pull/24244#pullrequestreview-2741592534 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2027969664 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2027969939 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2027970357 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2027970644 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2027970966 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2027971281 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2027973768 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2027973211 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2027973903 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2027974257 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2027975466 From dholmes at openjdk.org Fri Apr 4 02:03:52 2025 From: dholmes at 
openjdk.org (David Holmes) Date: Fri, 4 Apr 2025 02:03:52 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:12:38 GMT, Thomas Stuefe wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> use one time PeriodicTask > > Coming to this discussion late. > > IMHO this is overengineered for just a printout to the hs-err file during error dumping. We already read from proc fs. Proc can be worse (depending on what you read, a lot) than reading sequentially from a flat file. > > Remember that we already run the JVM binaries from the same file system. We read debug information from those binaries during error dumping, and that causes a ton of IO; a sequential read of a tiny file is a drop in the bucket. > > Also remember that we have safety fuses: Step timeouts and Step signal handling - so if this read ever turns out to be a problem, e.g by hanging, the Step would be cancelled and error reporting would continue with the next step. > > I would, however, attempt to avoid malloc. Not super important, but if its easy to do I would do it. Best by using a small fixed-sized stack-allocated buffer, and just printing the file line by line. > > Just my 5 cent. @tstuefe perhap this is over engineered but there are a number of issues to consider: - reading from the physical filesystem is not the same as reading from /proc and IMO is far more likely to be problematic if done in a signal handling context during error reporting - hence we need to read the file ahead-of-time - the content of the file is available and known at build time, but there is reluctance to try and handle this via the build system, so we need to read it at runtime - reading from a small file during VM startup may indeed be lost in the noise but it is "death by a thousand cuts" - can we avoid any impact on startup? Yes we can. 
Also no need to avoid malloc when just reading via normal code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2777371168 From amitkumar at openjdk.org Fri Apr 4 03:59:48 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 4 Apr 2025 03:59:48 GMT Subject: RFR: 8353274: [PPC64] Bug related to -XX:+UseCompactObjectHeaders -XX:-UseSIGTRAP in JDK-8305895 In-Reply-To: <6GVJHW26mRaRkydXx2pCmSsjdaKsm4ttjKeQ2vnvHG4=.bbf7fff9-fa27-41ef-a4ac-42daba4b890f@github.com> References: <6GVJHW26mRaRkydXx2pCmSsjdaKsm4ttjKeQ2vnvHG4=.bbf7fff9-fa27-41ef-a4ac-42daba4b890f@github.com> Message-ID: On Mon, 31 Mar 2025 14:25:09 GMT, Martin Doerr wrote: > `MacroAssembler::ic_check` compares the `Klass*` in the compact format (no decode). However, a right shift is needed in case of `UseCompactObjectHeaders` (see `load_narrow_klass_compact`). This was missing in the slower version which doesn't use SIGTRAP. LGTM ------------- Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/24331#pullrequestreview-2741711501 From dholmes at openjdk.org Fri Apr 4 05:52:57 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 4 Apr 2025 05:52:57 GMT Subject: RFR: 8353365: TOUCH_ASSERT_POISON clears GetLastError() Message-ID: This is a very simple fix to save/restore the "last error" value on Windows, so that the TOUCH_ASSERT_POISON mechanism used in assert/guarantee/fatal, does not clear it. Testing - new Windows-only gtest added to vmErrors test group - tiers 103 sanity Thanks. 
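The save/restore idea can also be packaged as a small RAII guard. This is a hedged C++ sketch, not the patch under review: `ErrnoSaver` is a hypothetical name, and it preserves POSIX `errno` (the case raised later in this thread) rather than the Windows last-error value. A Windows variant would presumably capture `GetLastError()` in the constructor and call `SetLastError()` in the destructor.

```cpp
#include <cerrno>

// Hypothetical RAII guard: captures errno on construction and restores it on
// destruction, so work done in between cannot clobber the value the caller
// is about to report.
class ErrnoSaver {
 public:
  ErrnoSaver() : _saved(errno) {}
  ~ErrnoSaver() { errno = _saved; }
  ErrnoSaver(const ErrnoSaver&) = delete;
  ErrnoSaver& operator=(const ErrnoSaver&) = delete;

 private:
  int _saved;
};

// Stand-in for a diagnostic side effect (such as touching a poison page) that
// may overwrite errno; the guard undoes that when the function returns.
void touch_with_errno_preserved() {
  ErrnoSaver saver;
  errno = 0;  // simulate a call that overwrites errno
}
```

The guard approach keeps the restore on every exit path, including early returns, without repeating the restore code at each one.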
------------- Commit messages: - gtest - was easier to create than I had expected - 8353365: TOUCH_ASSERT_POISON clears GetLastError() Changes: https://git.openjdk.org/jdk/pull/24435/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24435&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353365 Stats: 12 lines in 2 files changed: 12 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24435.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24435/head:pull/24435 PR: https://git.openjdk.org/jdk/pull/24435 From stuefe at openjdk.org Fri Apr 4 06:03:53 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 4 Apr 2025 06:03:53 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:12:38 GMT, Thomas Stuefe wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> use one time PeriodicTask > > Coming to this discussion late. > > IMHO this is overengineered for just a printout to the hs-err file during error dumping. We already read from proc fs. Proc can be worse (depending on what you read, a lot) than reading sequentially from a flat file. > > Remember that we already run the JVM binaries from the same file system. We read debug information from those binaries during error dumping, and that causes a ton of IO; a sequential read of a tiny file is a drop in the bucket. > > Also remember that we have safety fuses: Step timeouts and Step signal handling - so if this read ever turns out to be a problem, e.g by hanging, the Step would be cancelled and error reporting would continue with the next step. > > I would, however, attempt to avoid malloc. Not super important, but if its easy to do I would do it. Best by using a small fixed-sized stack-allocated buffer, and just printing the file line by line. > > Just my 5 cent. 
> @tstuefe perhap this is over engineered but there are a number of issues to consider: > > * reading from the physical filesystem is not the same as reading from /proc and IMO is far more likely to be problematic if done in a signal handling context during error reporting - hence we need to read the file ahead-of-time But we already do exactly that. We read the Elf- and Dwarf-files to print out the symbol and stack information. From the same filesystem. And these are way larger than the release file, and we don't read sequentially, but seek around. This generates a ton of IO. Reading a small file sequentially is fine in comparison. I also don't follow the arguments: the argument for this is that the file system could be slow or IO could be broken in some form. If that is true, it is likely true for the the whole file system. In that case, VM startup would take either a very long time or the VM would not even come up, and I argue that is a configuration error. If only reading the release file is slow while reading binaries is fine (and why would that ever happen?), then doing this for every startup of the JVM is a bad idea since it blocks the task queue, concurrent execution or not. I think if we expect problems reading that file, we should rather read it when we need it, not unconditionally at every VM startup. Plus, in error handling we have safeguards against hanging reads - we cancel hanging error reporting steps. We don't have those safeguards in the normal JVM. 
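Thomas's alternative of printing the file line by line from a small fixed-size stack buffer, with no malloc, might look roughly like the following hedged C++ sketch. `print_file_lines` is a hypothetical name; it writes to a `FILE*` only to stay self-contained, whereas real error-reporting code would write to an `outputStream`.

```cpp
#include <cstdio>

// Hypothetical sketch: copies a text file to `out` using a fixed-size stack
// buffer and no heap allocation. Returns false if the file cannot be opened.
bool print_file_lines(const char* path, FILE* out) {
  FILE* in = std::fopen(path, "r");
  if (in == nullptr) {
    return false;
  }
  char buf[256];  // lines longer than the buffer are copied in several pieces
  while (std::fgets(buf, sizeof(buf), in) != nullptr) {
    std::fputs(buf, out);
  }
  std::fclose(in);
  return true;
}
```

Because `fgets` stops at the buffer size and the pieces are written back in order, long lines are still reproduced intact; the only cost of the small buffer is a few extra loop iterations.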
------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2777632094 From stuefe at openjdk.org Fri Apr 4 06:22:48 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 4 Apr 2025 06:22:48 GMT Subject: RFR: 8353365: TOUCH_ASSERT_POISON clears GetLastError() In-Reply-To: <-3XDmIdMsJlbxqCv_UoAbkZbcm2da9K0nt-pPEPfHCw=.ecdacfa2-afe6-42be-809a-1255ea17853c@github.com> On Fri, 4 Apr 2025 05:43:36 GMT, David Holmes wrote: > This is a very simple fix to save/restore the "last error" value on Windows, so that the TOUCH_ASSERT_POISON mechanism used in assert/guarantee/fatal does not clear it. > > Testing > - new Windows-only gtest added to vmErrors test group > - tiers 103 sanity > > Thanks. Oh, sorry for that. Thanks @stefank @dholmes-ora for finding and fixing this. The patch is fine. I wonder whether we have the same problem on Posix with errno, but I assume we don't call any functions there that modify errno. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24435#pullrequestreview-2741882984 From sspitsyn at openjdk.org Fri Apr 4 06:23:52 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 4 Apr 2025 06:23:52 GMT Subject: RFR: 8316682: serviceability/jvmti/vthread/SelfSuspendDisablerTest timed out [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 17:58:30 GMT, Leonid Mesnik wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> some cleanup > > src/hotspot/share/prims/jvmtiEnv.cpp line 1078: > >> 1076: JvmtiEnv::ResumeThread(jthread thread) { >> 1077: // resume thread with handshake >> 1078: ResumeThreadClosure op(/* single_resume */ true); > > Could you please explain how the thread is protected from races between mounting<->unmounting operations and resume_thread operations? > It may be unlikely to happen for suspended threads, but for alive threads the results are not defined.
Thank you for the question. `JvmtiHandshake::execute()` has a `JvmtiVTMSTransitionDisabler` installed:

JvmtiHandshake::execute(JvmtiUnitedHandshakeClosure* hs_cl, jthread target) {
  JavaThread* current = JavaThread::current();
  HandleMark hm(current);
  JvmtiVTMSTransitionDisabler disabler(target); <= !!!!!!!
  . . .

> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1759: > >> 1757: Handle thread_h(current, thread_oop); >> 1758: bool is_virtual = java_lang_VirtualThread::is_instance(thread_h()); >> 1759: bool is_thread_carrying = is_thread_carrying_vthread(java_thread, thread_h()); > > I think that somewhere in this place there should be an explanation of suspend<->resume synchronization. As I understand it, the handshake can't be executed and clear the suspend state while suspend_thread is being done for the same thread. How is it guaranteed that the suspend_thread flag can't be updated? > It is not obvious and also puts some restrictions on the suspend_thread implementation to keep this behaviour. Thank you for reviewing and for this suggestion. Yes, you are right. I'll try to find a good place to add such a comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24269#discussion_r2028158825 PR Review Comment: https://git.openjdk.org/jdk/pull/24269#discussion_r2028161088 From dholmes at openjdk.org Fri Apr 4 06:38:48 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 4 Apr 2025 06:38:48 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v3] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 06:00:52 GMT, Thomas Stuefe wrote: > > @tstuefe perhaps this is over-engineered but there are a number of issues to consider: > > ``` > > * reading from the physical filesystem is not the same as reading from /proc and IMO is far more likely to be problematic if done in a signal handling context during error reporting - hence we need to read the file ahead-of-time > > ``` > > But we already do exactly that.
We read the Elf- and Dwarf-files to print out the symbol and stack information. From the same filesystem. And these are way larger than the release file, and we don't read sequentially, but seek around. This generates a ton of IO. Reading a small file sequentially is fine in comparison. Yes, we do read those, or attempt to, and doing so is risky and may not work. So do we just keep adding more and more risky things to error reporting and keep hoping it will all "just work"? If we can avoid such a risk, without undue cost/effort, shouldn't we do so? > I also don't follow the arguments: the argument for this is that the file system could be slow or IO could be broken in some form. Two different arguments. Reading the file from a signal handling context may not work. Reading the file from disk during startup adds to the startup overhead. Again, why not avoid these issues when there is a simple way to do so? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2777680663 From stuefe at openjdk.org Fri Apr 4 07:32:55 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 4 Apr 2025 07:32:55 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v3] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 06:00:52 GMT, Thomas Stuefe wrote: >> Coming to this discussion late. >> >> IMHO this is overengineered for just a printout to the hs-err file during error dumping. We already read from proc fs. Proc can be worse (depending on what you read, a lot) than reading sequentially from a flat file. >> >> Remember that we already run the JVM binaries from the same file system. We read debug information from those binaries during error dumping, and that causes a ton of IO; a sequential read of a tiny file is a drop in the bucket.
>> >> Also remember that we have safety fuses: Step timeouts and Step signal handling - so if this read ever turns out to be a problem, e.g. by hanging, the Step would be cancelled and error reporting would continue with the next step. >> >> I would, however, attempt to avoid malloc. Not super important, but if it's easy to do I would do it. Best done by using a small, fixed-size, stack-allocated buffer and just printing the file line by line. >> >> Just my 5 cents. > >> @tstuefe perhaps this is over-engineered but there are a number of issues to consider: >> >> * reading from the physical filesystem is not the same as reading from /proc and IMO is far more likely to be problematic if done in a signal handling context during error reporting - hence we need to read the file ahead-of-time > > But we already do exactly that. We read the Elf- and Dwarf-files to print out the symbol and stack information. From the same filesystem. And these are way larger than the release file, and we don't read sequentially, but seek around. This generates a ton of IO. Reading a small file sequentially is fine in comparison. > > I also don't follow the arguments: the argument for this is that the file system could be slow or IO could be broken in some form. > > If that is true, it is likely true for the whole file system. In that case, VM startup would take either a very long time or the VM would not even come up, and I argue that is a configuration error. > > If only reading the release file is slow while reading binaries is fine (and why would that ever happen?), then doing this for every startup of the JVM is a bad idea since it blocks the task queue, concurrent execution or not. I think if we expect problems reading that file, we should rather read it when we need it, not unconditionally at every VM startup. Plus, in error handling we have safeguards against hanging reads - we cancel hanging error reporting steps. We don't have those safeguards in the normal JVM.
> > > @tstuefe perhaps this is over-engineered but there are a number of issues to consider: > > > ``` > > > * reading from the physical filesystem is not the same as reading from /proc and IMO is far more likely to be problematic if done in a signal handling context during error reporting - hence we need to read the file ahead-of-time > > > ``` > > > > > > But we already do exactly that. We read the Elf- and Dwarf-files to print out the symbol and stack information. From the same filesystem. And these are way larger than the release file, and we don't read sequentially, but seek around. This generates a ton of IO. Reading a small file sequentially is fine in comparison. > > Yes, we do read those, or attempt to, and doing so is risky and may not work. So do we just keep adding more and more risky things to error reporting and keep hoping it will all "just work"? If we can avoid such a risk, without undue cost/effort, shouldn't we do so? > > > I also don't follow the arguments: the argument for this is that the file system could be slow or IO could be broken in some form. > > Two different arguments. Reading the file from a signal handling context may not work. Reading the file from disk during startup adds to the startup overhead. > > Again, why not avoid these issues when there is a simple way to do so? @dholmes-ora We probably won't convince each other. I don't want to hold up the PR, so I reviewed it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2777795925 From stuefe at openjdk.org Fri Apr 4 07:32:57 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 4 Apr 2025 07:32:57 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 15:47:27 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to build this image, e.g.
>> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use one time PeriodicTask src/hotspot/share/runtime/os.cpp line 1539: > 1537: char* release_file = (char*) os::malloc(rfile_len, mtInternal); > 1538: if (release_file) { > 1539: os::snprintf(release_file, rfile_len, "%s/release", home); Instead of the manual malloc, just do this:

stringStream ss;
ss.print("%s/release", home);

then you can use `ss.base()` for the assembled path. src/hotspot/share/runtime/os.cpp line 1553: > 1551: return; > 1552: } > 1553: fseek(file, 0, SEEK_SET); There is no need to seek to the end to get the file size. Use fstat instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2028264049 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2028267066 From jwaters at openjdk.org Fri Apr 4 07:36:54 2025 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 4 Apr 2025 07:36:54 GMT Subject: RFR: 8353365: TOUCH_ASSERT_POISON clears GetLastError() In-Reply-To: <-TZRm_07hfVkTEu8cbPjhFQwbiORryh4b4NfeBl-1uk=.27cf1669-5c92-42fe-8e47-0900f3c3b1d7@github.com> References: Message-ID: <-TZRm_07hfVkTEu8cbPjhFQwbiORryh4b4NfeBl-1uk=.27cf1669-5c92-42fe-8e47-0900f3c3b1d7@github.com> On Fri, 4 Apr 2025 05:43:36 GMT, David Holmes wrote: > This is a very simple fix to save/restore the "last error" value on Windows, so that the TOUCH_ASSERT_POISON mechanism used in assert/guarantee/fatal does not clear it. > > Testing > - new Windows-only gtest added to vmErrors test group > - tiers 103 sanity > > Thanks. Simple enough for me to review, so I'll give it a +1, just one trivial question ------------- Marked as reviewed by jwaters (Committer).
PR Review: https://git.openjdk.org/jdk/pull/24435#pullrequestreview-2742091778 From jwaters at openjdk.org Fri Apr 4 07:41:54 2025 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 4 Apr 2025 07:41:54 GMT Subject: RFR: 8353365: TOUCH_ASSERT_POISON clears GetLastError() In-Reply-To: References: Message-ID: <2V4fZgBmLPg-nf5SdJUyhrKgU25H1vEMfJLmpvELCTM=.d2c4a209-25db-46dd-93fb-c8567b6738b3@github.com> On Fri, 4 Apr 2025 05:43:36 GMT, David Holmes wrote: > This is a very simple fix to save/restore the "last error" value on Windows, so that the TOUCH_ASSERT_POISON mechanism used in assert/guarantee/fatal, does not clear it. > > Testing > - new Windows-only gtest added to vmErrors test group > - tiers 103 sanity > > Thanks. test/hotspot/gtest/utilities/test_vmerror.cpp line 38: > 36: "fatal error: GetLastError should be 6 - actually: 6") { > 37: SetLastError(6); > 38: fatal("GetLastError should be 6 - actually: %d", (int)GetLastError()); I wonder if HotSpot has a preference for the more specific C++ casts. Also, wouldn't it be better to check the value of GetLastError after fatal is called rather than comparing the strings (I assume that's what the TEST_VM_ASSERT_MSG is doing)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24435#discussion_r2028282409 From kbarrett at openjdk.org Fri Apr 4 07:41:53 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 4 Apr 2025 07:41:53 GMT Subject: RFR: 8353365: TOUCH_ASSERT_POISON clears GetLastError() In-Reply-To: <-3XDmIdMsJlbxqCv_UoAbkZbcm2da9K0nt-pPEPfHCw=.ecdacfa2-afe6-42be-809a-1255ea17853c@github.com> References: <-3XDmIdMsJlbxqCv_UoAbkZbcm2da9K0nt-pPEPfHCw=.ecdacfa2-afe6-42be-809a-1255ea17853c@github.com> Message-ID: On Fri, 4 Apr 2025 06:19:58 GMT, Thomas Stuefe wrote: > Oh, sorry for that. Thanks @stefank @dholmes-ora for finding and fixing this. The patch is fine. 
> > I wonder whether we have the same problem on Posix with errno, but I assume we don't call any functions there that modify errno. Same problem exists for posix, but we already preserve errno: https://github.com/openjdk/jdk/blob/41d4a0d7bdda2a96af1e7f549c05d99d68c040dc/src/hotspot/os/posix/signals_posix.cpp#L564-L566 That behavior goes back to https://bugs.openjdk.org/browse/JDK-6749267, fixed in jdk8. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24435#issuecomment-2777827385 From kbarrett at openjdk.org Fri Apr 4 07:51:48 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 4 Apr 2025 07:51:48 GMT Subject: RFR: 8353365: TOUCH_ASSERT_POISON clears GetLastError() In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 05:43:36 GMT, David Holmes wrote: > This is a very simple fix to save/restore the "last error" value on Windows, so that the TOUCH_ASSERT_POISON mechanism used in assert/guarantee/fatal, does not clear it. > > Testing > - new Windows-only gtest added to vmErrors test group > - tiers 103 sanity > > Thanks. Changes requested by kbarrett (Reviewer). test/hotspot/gtest/utilities/test_vmerror.cpp line 38: > 36: "fatal error: GetLastError should be 6 - actually: 6") { > 37: SetLastError(6); > 38: fatal("GetLastError should be 6 - actually: %d", (int)GetLastError()); Why is this casting the value of GetLastError, rather than using "%u" in the format string? 
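The ordering problem behind this fix can be modelled portably. The sketch below uses errno on POSIX in place of GetLastError()/SetLastError() on Windows, a hypothetical RAII guard `ErrnoPreserver`, and two hypothetical macros standing in for a fatal()-style macro whose poison-page touch runs before the message arguments are evaluated. It only illustrates the save/restore pattern; it is not the code in this PR.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical RAII guard: remembers errno on construction and restores it
// on destruction. On Windows the analogous guard would call GetLastError()
// and SetLastError(); errno is used here so the sketch runs on POSIX.
class ErrnoPreserver {
 public:
  ErrnoPreserver() : saved_(errno) {}
  ~ErrnoPreserver() { errno = saved_; }
  ErrnoPreserver(const ErrnoPreserver&) = delete;
  ErrnoPreserver& operator=(const ErrnoPreserver&) = delete;

 private:
  int saved_;
};

// Stand-ins for touching the assert poison page: both clobber errno the way
// a failing system call would, but only the guarded one hides that from callers.
void touch_poison_unguarded() {
  errno = EINVAL;
}

void touch_poison_guarded() {
  ErrnoPreserver preserve;
  errno = EINVAL;  // restored when `preserve` goes out of scope
}

// Models a fatal()-style macro: the poison touch runs first, and only then is
// the message argument (`expr`) evaluated and stored into `out`.
#define REPORT_UNGUARDED(out, expr) \
  do { touch_poison_unguarded(); (out) = (expr); } while (0)
#define REPORT_GUARDED(out, expr) \
  do { touch_poison_guarded(); (out) = (expr); } while (0)
```

With the unguarded variant, an argument such as `errno` (or `GetLastError()` on Windows) observes the clobbered value, which is exactly what the new gtest checks for.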
------------- PR Review: https://git.openjdk.org/jdk/pull/24435#pullrequestreview-2742118311 PR Review Comment: https://git.openjdk.org/jdk/pull/24435#discussion_r2028292231 From kbarrett at openjdk.org Fri Apr 4 07:51:49 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 4 Apr 2025 07:51:49 GMT Subject: RFR: 8353365: TOUCH_ASSERT_POISON clears GetLastError() In-Reply-To: <2V4fZgBmLPg-nf5SdJUyhrKgU25H1vEMfJLmpvELCTM=.d2c4a209-25db-46dd-93fb-c8567b6738b3@github.com> References: <2V4fZgBmLPg-nf5SdJUyhrKgU25H1vEMfJLmpvELCTM=.d2c4a209-25db-46dd-93fb-c8567b6738b3@github.com> Message-ID: On Fri, 4 Apr 2025 07:37:35 GMT, Julian Waters wrote: >> This is a very simple fix to save/restore the "last error" value on Windows, so that the TOUCH_ASSERT_POISON mechanism used in assert/guarantee/fatal, does not clear it. >> >> Testing >> - new Windows-only gtest added to vmErrors test group >> - tiers 103 sanity >> >> Thanks. > > test/hotspot/gtest/utilities/test_vmerror.cpp line 38: > >> 36: "fatal error: GetLastError should be 6 - actually: 6") { >> 37: SetLastError(6); >> 38: fatal("GetLastError should be 6 - actually: %d", (int)GetLastError()); > > I wonder if HotSpot has a preference for the more specific C++ casts. Also, wouldn't it be better to check the value of GetLastError after fatal is called rather than comparing the strings (I assume that's what the TEST_VM_ASSERT_MSG is doing)? @TheShermanTanker - This test is checking the problem use-case that led to this change, where the value of the `GetLastError` call was clobbered when done as an argument to `fatal`, because the `TOUCH_ASSERT_POISON` happened before the call to `GetLastError`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24435#discussion_r2028296656 From jwaters at openjdk.org Fri Apr 4 07:57:54 2025 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 4 Apr 2025 07:57:54 GMT Subject: RFR: 8353365: TOUCH_ASSERT_POISON clears GetLastError() In-Reply-To: References: <2V4fZgBmLPg-nf5SdJUyhrKgU25H1vEMfJLmpvELCTM=.d2c4a209-25db-46dd-93fb-c8567b6738b3@github.com> Message-ID: On Fri, 4 Apr 2025 07:48:12 GMT, Kim Barrett wrote: >> test/hotspot/gtest/utilities/test_vmerror.cpp line 38: >> >>> 36: "fatal error: GetLastError should be 6 - actually: 6") { >>> 37: SetLastError(6); >>> 38: fatal("GetLastError should be 6 - actually: %d", (int)GetLastError()); >> >> I wonder if HotSpot has a preference for the more specific C++ casts. Also, wouldn't it be better to check the value of GetLastError after fatal is called rather than comparing the strings (I assume that's what the TEST_VM_ASSERT_MSG is doing)? > > @TheShermanTanker - This test is checking the problem use-case that led to this change, where the > value of the `GetLastError` call was clobbered when done as an argument to `fatal`, because the > `TOUCH_ASSERT_POISON` happened before the call to `GetLastError`. Ah, alright. Thanks for the explanation ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24435#discussion_r2028306628 From kevinw at openjdk.org Fri Apr 4 07:58:50 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Fri, 4 Apr 2025 07:58:50 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 01:17:50 GMT, David Holmes wrote: > I am having trouble understanding how the current behaviour can actually work. If I have Well, it works 8-) I can't get sh -c "command\ncommand" to work at the command-line, but strace shows it works when the JVM execs it. e.g. java -XX:CICrashAt=2 -XX:OnError="echo ONE" -XX:OnError="echo TWO" ...program... ... 
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
#
# -XX:OnError="echo ONE
echo TWO"
# Executing /bin/sh -c "echo ONE
echo TWO" ...
ONE
TWO
Aborted (core dumped)

When it logs the above message, it writes the command ccstrlist it has:

1508906 write(1, "echo ONE\necho TWO", 17) = 17

and only execs one shell:

1508911 execve("/bin/sh", ["sh", "-c", "echo ONE\necho TWO"], 0x7ffeb584b798 /* 70 vars */

..and nothing else.

If I make it run /bin/echo instead of the shell builtin:

-XX:OnError="/bin/echo ONE" -XX:OnError="/bin/echo TWO"

I see one shell and two /bin/echo programs run:

$ grep exec strace.out
1508926 execve("build/linux-x64/images/jdk/bin/java", ["build/linux-x64/images/jdk/bin/j"..., "-XX:CICrashAt=2", "-XX:OnError=/bin/echo ONE", "-XX:OnError=/bin/echo TWO", "-cp", "/progs", "MyProg"], 0x7ffddf65ff48 /* 70 vars */) = 0
...
1508950 execve("/bin/sh", ["sh", "-c", "/bin/echo ONE\n/bin/echo TWO"], 0x7ffe2927e608 /* 70 vars */
1508951 execve("/bin/echo", ["/bin/echo", "ONE"], 0x55f33ff96b60 /* 70 vars */) = 0
1508950 execve("/bin/echo", ["/bin/echo", "TWO"], 0x55f33ff978c0 /* 70 vars */) = 0

------------- PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2777859714 From mdoerr at openjdk.org Fri Apr 4 08:02:54 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 4 Apr 2025 08:02:54 GMT Subject: RFR: 8353274: [PPC64] Bug related to -XX:+UseCompactObjectHeaders -XX:-UseSIGTRAP in JDK-8305895 In-Reply-To: <6GVJHW26mRaRkydXx2pCmSsjdaKsm4ttjKeQ2vnvHG4=.bbf7fff9-fa27-41ef-a4ac-42daba4b890f@github.com> References: <6GVJHW26mRaRkydXx2pCmSsjdaKsm4ttjKeQ2vnvHG4=.bbf7fff9-fa27-41ef-a4ac-42daba4b890f@github.com> Message-ID: On Mon, 31 Mar 2025 14:25:09 GMT, Martin Doerr wrote: > `MacroAssembler::ic_check` compares the `Klass*` in the compact format (no decode). However, a right shift is needed in case of `UseCompactObjectHeaders` (see `load_narrow_klass_compact`).
This was missing in the slower version which doesn't use SIGTRAP. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24331#issuecomment-2777865475 From mdoerr at openjdk.org Fri Apr 4 08:02:54 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 4 Apr 2025 08:02:54 GMT Subject: Integrated: 8353274: [PPC64] Bug related to -XX:+UseCompactObjectHeaders -XX:-UseSIGTRAP in JDK-8305895 In-Reply-To: <6GVJHW26mRaRkydXx2pCmSsjdaKsm4ttjKeQ2vnvHG4=.bbf7fff9-fa27-41ef-a4ac-42daba4b890f@github.com> References: <6GVJHW26mRaRkydXx2pCmSsjdaKsm4ttjKeQ2vnvHG4=.bbf7fff9-fa27-41ef-a4ac-42daba4b890f@github.com> Message-ID: On Mon, 31 Mar 2025 14:25:09 GMT, Martin Doerr wrote: > `MacroAssembler::ic_check` compares the `Klass*` in the compact format (no decode). However, a right shift is needed in case of `UseCompactObjectHeaders` (see `load_narrow_klass_compact`). This was missing in the slower version which doesn't use SIGTRAP. This pull request has now been integrated. Changeset: a13e34da Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/a13e34da3f81039b572fd6030d6ee63dfffad612 Stats: 23 lines in 2 files changed: 8 ins; 12 del; 3 mod 8353274: [PPC64] Bug related to -XX:+UseCompactObjectHeaders -XX:-UseSIGTRAP in JDK-8305895 Reviewed-by: rrich, amitkumar ------------- PR: https://git.openjdk.org/jdk/pull/24331 From tschatzl at openjdk.org Fri Apr 4 08:10:34 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 4 Apr 2025 08:10:34 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. 
>
> ### Current situation
>
> With this change, G1 will reduce the post-write barrier so that it much more closely resembles Parallel GC's, as described in the JEP. The reason is that G1 lags in throughput compared to Parallel/Serial GC due to its larger barrier.
>
> The main reason for the current barrier is how G1 implements concurrent refinement:
> * G1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads.
> * Finally, there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done; // null value check
> if (card(@x.a) == young_card) goto done; // write to young gen check
> StoreLoad; // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for Parallel and Serial GC.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>
> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
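To make the filtering steps of the pseudo code above concrete, here is a toy, single-threaded C++ model of that post-write barrier next to a Parallel/Serial-style unconditional card mark. Card size, region size, card encodings and queue capacity are made-up values, not G1's, and the StoreLoad fence is only marked by a comment, so the sketch says nothing about real instruction counts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy heap geometry (hypothetical values, not G1's real ones).
constexpr std::uintptr_t kCardShift = 9;     // 512-byte cards
constexpr std::uintptr_t kRegionShift = 20;  // 1 MiB regions

// Arbitrary toy card encodings, not G1's.
enum CardValue : std::uint8_t { kClean = 0, kDirty = 1, kYoung = 2 };

struct ToyHeap {
  std::vector<std::uint8_t> cards;
  std::vector<std::uintptr_t> thread_local_dcq;  // enqueued card indices
  std::size_t dcq_capacity = 4;
  std::size_t runtime_calls = 0;  // "move full dcq into the global dcqs"

  explicit ToyHeap(std::size_t num_cards) : cards(num_cards, kClean) {}
};

std::uintptr_t card_index(std::uintptr_t addr) { return addr >> kCardShift; }
std::uintptr_t region_of(std::uintptr_t addr) { return addr >> kRegionShift; }

// Filtering followed by card tracking, in the same order as the pseudo code.
void g1_style_post_barrier(ToyHeap& h, std::uintptr_t field, std::uintptr_t new_val) {
  if (region_of(field) == region_of(new_val)) return;  // same region check
  if (new_val == 0) return;                           // null value check
  std::uint8_t& card = h.cards[card_index(field)];
  if (card == kYoung) return;                         // write to young gen check
  // A real barrier needs a StoreLoad fence here.
  if (card == kDirty) return;                         // already dirty
  card = kDirty;
  h.thread_local_dcq.push_back(card_index(field));
  if (h.thread_local_dcq.size() < h.dcq_capacity) return;
  h.runtime_calls++;  // runtime call hands the full buffer to the dcqs
  h.thread_local_dcq.clear();
}

// Parallel/Serial-style barrier: unconditionally dirty the card.
void card_mark_post_barrier(ToyHeap& h, std::uintptr_t field) {
  h.cards[card_index(field)] = kDirty;
}
```

Even in this toy form, the contrast in branch count between the two functions mirrors the motivation described here.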
>
> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse-grained synchronization based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se...

Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits:

 - * missing file from merge
 - Merge branch 'master' into 8342382-card-table-instead-of-dcq
 - Merge branch 'master' into 8342382-card-table-instead-of-dcq
 - Merge branch 'master' into 8342382-card-table-instead-of-dcq
 - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq
 - * make young gen length revising independent of refinement thread
   * use a service task
   * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update
 - * fix IR code generation tests that change due to barrier cost changes
 - * factor out card table and refinement table merging into a single method
 - Merge branch 'master' into 8342382-card-table-instead-of-dcq3
 - * obsolete G1UpdateBufferSize

   G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts.

   The former function is now obsolete with the removal of the dirty card queues; the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`.

   I prefer making this a diagnostic option rather than a product option because it is something that is only necessary for some test cases to produce some otherwise unwanted behavior (continuous refinement).

   CSR is pending.
 - ...
and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=29 Stats: 7089 lines in 110 files changed: 2610 ins; 3555 del; 924 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From stuefe at openjdk.org Fri Apr 4 08:32:52 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 4 Apr 2025 08:32:52 GMT Subject: RFR: 8353365: TOUCH_ASSERT_POISON clears GetLastError() In-Reply-To: References: <2V4fZgBmLPg-nf5SdJUyhrKgU25H1vEMfJLmpvELCTM=.d2c4a209-25db-46dd-93fb-c8567b6738b3@github.com> Message-ID: On Fri, 4 Apr 2025 07:55:25 GMT, Julian Waters wrote: >> @TheShermanTanker - This test is checking the problem use-case that led to this change, where the >> value of the `GetLastError` call was clobbered when done as an argument to `fatal`, because the >> `TOUCH_ASSERT_POISON` happened before the call to `GetLastError`. > > Ah, alright. Thanks for the explanation And `fatal` does not return. It ends the VM with an assertion message. google death tests scan for that message on stderr. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24435#discussion_r2028357519 From ayang at openjdk.org Fri Apr 4 09:12:23 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 4 Apr 2025 09:12:23 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 Message-ID: Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. 
The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. ------------- Commit messages: - tmp - gclocker-nested Changes: https://git.openjdk.org/jdk/pull/24407/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24407&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352116 Stats: 31 lines in 4 files changed: 20 ins; 7 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24407.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24407/head:pull/24407 PR: https://git.openjdk.org/jdk/pull/24407 From eosterlund at openjdk.org Fri Apr 4 09:12:23 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 4 Apr 2025 09:12:23 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. Looks good. It would be nice to refactor the `if (UseSerialGC || UseParallelGC)` code to something that explains why it's there (those are the GCs that use the new improved GC locker). But that's pre-existing, so I don't mind if it's split into a separate RFE. ------------- Marked as reviewed by eosterlund (Reviewer).
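A toy model of the lock-ordering constraint mentioned in the PR description, assuming the rule enforced by HotSpot's debug-build rank checks: a thread may only acquire a lock whose rank is lower than that of every lock it already holds. The `RankedLock`/`LockTracker` types and all rank values are hypothetical; the two `Heap_lock` entries simply show why lowering its rank below `JNICritical_lock`'s permits taking it while `JNICritical_lock` is held.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical lock descriptor; ranks are made up for illustration.
struct RankedLock {
  std::string name;
  int rank;
};

// Toy per-thread tracker enforcing strictly decreasing acquisition ranks.
// Because held locks are always in decreasing rank order, the most recently
// acquired lock is also the lowest-ranked one.
class LockTracker {
 public:
  // Returns false (instead of acquiring) if the ordering rule is violated.
  bool try_acquire(const RankedLock& l) {
    if (!held_.empty() && l.rank >= held_.back()->rank) {
      return false;  // out of order: potential deadlock
    }
    held_.push_back(&l);
    return true;
  }

  void release(const RankedLock& l) {
    assert(!held_.empty() && held_.back() == &l);  // LIFO release in this toy
    held_.pop_back();
  }

 private:
  std::vector<const RankedLock*> held_;
};
```

The rule is deliberately conservative: any acquisition that could form a cycle between two threads is rejected up front, which is why changing a lock's rank is the usual way to legitimize a new nesting.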
PR Review: https://git.openjdk.org/jdk/pull/24407#pullrequestreview-2739864515 From stefank at openjdk.org Fri Apr 4 09:12:57 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 4 Apr 2025 09:12:57 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v3] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 07:30:05 GMT, Thomas Stuefe wrote: > Again, why not avoid these issues when there is a simple way to do so? I'm not sure the proposal is simple. There seems to be a race condition between this new periodic task and `jcmd VM.info`. I'm not sure what happens if these two run at the same time. This can probably be fixed by appropriate synchronization, but that sort of shows that maybe this isn't a simple solution. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2778025885 From stefank at openjdk.org Fri Apr 4 10:18:53 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 4 Apr 2025 10:18:53 GMT Subject: RFR: 8353637: ZGC: Discontiguous memory reservation is broken on Windows Message-ID: While working on [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) I realized that the current Windows implementation that uses placeholders for reservations is broken if we ever fall back to the code path that performs discontiguous heap reservation.

To understand this bug, you first need to understand how and why we use the placeholder mechanism. From zMapper_windows.cpp:

// Memory reservation, commit, views, and placeholders.
//
// To be able to up-front reserve address space for the heap views, and later
// multi-map the heap views to the same physical memory, without ever losing the
// reservation of the reserved address space, we use "placeholders".
//
// These placeholders block out the address space from being used by other parts
// of the process.
// To commit memory in this address space, the placeholder must
// be replaced by anonymous memory, or replaced by mapping a view against a
// paging file mapping. We use the later to support multi-mapping.
//
// We want to be able to dynamically commit and uncommit the physical memory of
// the heap (and also unmap ZPages), in granules of ZGranuleSize bytes. There is
// no way to grow and shrink the committed memory of a paging file mapping.
// Therefore, we create multiple granule-sized page file mappings. The memory is
// committed by creating a page file mapping, map a view against it, commit the
// memory, unmap the view. The memory will stay committed until all views are
// unmapped, and the paging file mapping handle is closed.
//
// When replacing a placeholder address space reservation with a mapped view
// against a paging file mapping, the virtual address space must exactly match
// an existing placeholder's address and size. Therefore we only deal with
// granule-sized placeholders at this layer. Higher layers that keep track of
// reserved available address space can (and will) coalesce placeholders, but
// they will be split before being used.

And the way we implement this is through the callbacks in zVirtualMemory_windows.cpp:

// Each reserved virtual memory address area registered in _manager is
// exactly covered by a single placeholder. Callbacks are installed so
// that whenever a memory area changes, the corresponding placeholder
// is adjusted.
//
// The create and grow callbacks are called when virtual memory is
// returned to the memory manager. The new memory area is then covered
// by a new single placeholder.
//
// The destroy and shrink callbacks are called when virtual memory is
// allocated from the memory manager. The memory area is then is split
// into granule-sized placeholders.
//
// See comment in zMapper_windows.cpp explaining why placeholders are
// split into ZGranuleSize sized placeholders.
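A toy model of the invariant these comments describe: placeholders can be split and coalesced, but a split only works on a range that is exactly one existing placeholder. The `PlaceholderSpace` class is hypothetical and only tracks ranges in granule units; on Windows the real operations go through `VirtualFree` with flags such as `MEM_PRESERVE_PLACEHOLDER` (split) and `MEM_COALESCE_PLACEHOLDERS` (coalesce), which this sketch does not call.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical model of placeholder bookkeeping: start -> size (in granules).
class PlaceholderSpace {
 public:
  void reserve(std::size_t start, std::size_t size) { ph_[start] = size; }

  // Split the placeholder [start, start + size) into two at start + first_size.
  // Fails unless [start, start + size) is exactly one existing placeholder.
  bool split(std::size_t start, std::size_t size, std::size_t first_size) {
    auto it = ph_.find(start);
    if (it == ph_.end() || it->second != size) return false;
    if (first_size == 0 || first_size >= size) return false;
    it->second = first_size;
    ph_[start + first_size] = size - first_size;
    return true;
  }

  // Coalesce the placeholder at `start` with the adjacent one after it.
  bool coalesce(std::size_t start) {
    auto it = ph_.find(start);
    if (it == ph_.end()) return false;
    auto next = ph_.find(start + it->second);
    if (next == ph_.end()) return false;
    it->second += next->second;
    ph_.erase(next);
    return true;
  }

  std::size_t count() const { return ph_.size(); }

 private:
  std::map<std::size_t, std::size_t> ph_;
};
```

In the usage below, granules A, B and C are 0, 1 and 2: with separate placeholders over (A, B) and (C), splitting "(A, B, C)" fails until something (for example, a grow/insert callback) coalesces them first.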
As these comments describe, we expect every memory area in the memory manager to be covered by exactly one placeholder. Currently, we implement that by having the callbacks disabled while we initialize the reserved memory for the heap. This works as long as we get a contiguous memory reservation, and the code has various mechanisms that try hard to get contiguous memory for the heap. However, if all those attempts fail, we fall back to reserving discontiguous memory. That mode uses interval halving to reserve exactly around the memory that is blocking us from getting a contiguous memory reservation. An example of this would be a request to reserve four granules (2MB each), where the fourth granule is already reserved:

    +--A--+--B--+--C--+--D--+
                         ^
                         D is pre-reserved

After failing to reserve the four granules (A, B, C, D), the code splits the range into two halves, (A, B) and (C, D), and tries to reserve them individually. It succeeds in reserving (A, B) but not (C, D). So, the code registers (A, B) and proceeds to split (C, D) into two parts, (C) and (D), and tries to reserve them individually. It succeeds with (C) but fails with (D), so the code registers (C). When (C) is registered, the code sees that (A, B) and (C) are adjacent and fuses them into one region (A, B, C). The problem is that we don't have any callbacks that also fuse the placeholders, so we are left with reservation placeholders over (A, B) and (C). Later on, when we want to use (A, B) for the heap, the code works under the assumption that there is one single placeholder over (A, B, C), so it tries to split that memory area into two placeholders, (A, B) and (C). This fails with a fatal error, because Windows refuses to make this split: the placeholder has already been split. The proposal to fix this is to first enable the callbacks from the start, before the initial memory reservation calls are made.
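A toy model of the proposed bookkeeping may help here, again in granule units and with hypothetical names (`Model`, `register_range`, `hand_back`, `hand_out`, `coalesce`, `split_out`); it is a sketch of the idea under those assumptions, not the patch itself. The grow step fuses placeholders whenever free areas fuse, registering inserts memory that already has one placeholder, handing back first coalesces the per-granule placeholders (the register/hand-back distinction described next), and handing out splits the range into granules, so every free area keeps exactly one placeholder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the proposed bookkeeping, in granule units.
// Hypothetical names; this is not the HotSpot implementation.
struct Range {
  int start;
  int size;
  int end() const { return start + size; }
  bool operator==(const Range& o) const { return start == o.start && size == o.size; }
};

struct Model {
  std::vector<Range> ph;     // placeholders held by the simulated OS
  std::vector<Range> areas;  // free areas in the memory manager

  static bool within(Range in, Range out) { return in.start >= out.start && in.end() <= out.end(); }

  // Merge the placeholders that exactly tile r into one placeholder r.
  void coalesce(Range r) {
    int covered = 0;
    std::vector<Range> kept;
    for (Range p : ph) {
      if (within(p, r)) covered += p.size; else kept.push_back(p);
    }
    assert(covered == r.size && "r must be tiled by existing placeholders");
    kept.push_back(r);
    ph = kept;
  }

  // Carve r out of the single placeholder containing it, either as one
  // piece (shrink) or as one placeholder per granule (remove).
  void split_out(Range r, bool per_granule) {
    for (std::size_t i = 0; i < ph.size(); i++) {
      Range p = ph[i];
      if (!within(r, p)) continue;
      ph.erase(ph.begin() + i);
      if (p.start < r.start) ph.push_back({p.start, r.start - p.start});
      if (per_granule) {
        for (int g = r.start; g < r.end(); g++) ph.push_back({g, 1});
      } else {
        ph.push_back(r);
      }
      if (r.end() < p.end()) ph.push_back({r.end(), p.end() - r.end()});
      return;
    }
    assert(false && "r must lie within a single placeholder");
  }

  // Shared insert path; the grow step fuses adjacent areas and placeholders.
  void insert_area(Range r) {
    for (std::size_t i = 0; i < areas.size(); i++) {
      Range a = areas[i];
      if (a.end() == r.start || r.end() == a.start) {
        Range u{std::min(a.start, r.start), a.size + r.size};
        coalesce(u);
        areas.erase(areas.begin() + i);
        insert_area(u);  // u may also touch an area on its other side
        return;
      }
    }
    areas.push_back(r);
  }

  // (1) Register: a fresh reservation is covered by one placeholder.
  void register_range(Range r) { ph.push_back(r); insert_area(r); }

  // (2) Hand back: first turn the granule placeholders into one (like case 1).
  void hand_back(Range r) { coalesce(r); insert_area(r); }

  // Hand out: shrink the area, then split the range into granule placeholders.
  bool hand_out(int size, Range* out) {
    for (std::size_t i = 0; i < areas.size(); i++) {
      Range a = areas[i];
      if (a.size < size) continue;
      Range r{a.start, size};
      if (a.size > size) {
        split_out(r, false);  // shrink: one placeholder becomes two
        areas[i] = Range{r.end(), a.size - size};
      } else {
        areas.erase(areas.begin() + i);
      }
      split_out(r, true);  // remove: per-granule placeholders
      *out = r;
      return true;
    }
    return false;
  }

  // Invariant: every free area is covered by exactly one placeholder.
  bool invariant() const {
    for (Range a : areas) {
      if (std::count(ph.begin(), ph.end(), a) != 1) return false;
    }
    return true;
  }
};
```

Replaying the (A, B) + (C) registration from the example above leaves one area [0,3) covered by a single placeholder; handing out two granules and handing them back restores that state.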
The second step is to change the virtual memory manager to differentiate between the two kinds of insertion (and extraction) operations we have:

1) The first insert operation happens when we "register" new virtual memory. This is what's done during initialization of ZGC.
2) The other insert operation happens when the system "hands back" memory to the virtual memory manager.

The reason we need to separate these two is that in (1) the memory area has one placeholder that spans the provided memory area, but in (2) the memory area has a placeholder for every 2MB granule (as described above). So, the patch applies the 'insert' callback to the (2) areas to make them look like (1), and then both can use the same code to insert the memory into the virtual memory manager. The opposite mechanism distinguishes "handing out" memory from de-registering memory that is about to be unreserved: the "handing out" operation performs a 'remove' callback before actually handing out the memory, ensuring that the memory is covered by 2MB placeholders. The shrink and grow operations are relieved of their previous duties to split and coalesce the 2MB placeholders, and are now only tasked with splitting one placeholder into two and combining two placeholders into one. This is handled by the 'shrink' and 'grow' callbacks. To provoke this bug, we have written a small gtest, which I think has enough comments to explain what's going on and where things used to break down. The added test uses enough high-level operations that I also had to add support for unreserving memory; without it, the verification code fails. The added unreserve code also introduces calls to NMT to register the release of memory. This in turn is problematic, because NMT doesn't support releasing a larger memory area than what we previously registered.
So, we have a problem similar to the one we had with the placeholders: the code isn't prepared for adjacent memory to be treated as a single memory area. This is going to be fixed by the ongoing rewrites of NMT, but for now I've added a workaround that releases memory in 2MB chunks, which the current NMT implementation supports. I've moved some of the tests in test_zMapper_windows.cpp to the new test_zVirtualMemoryManager.cpp file so that we can run these tests on other platforms as well. ------------- Commit messages: - 8353637: ZGC: Discontiguous memory reservation is broken on Windows Changes: https://git.openjdk.org/jdk/pull/24443/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24443&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353637 Stats: 813 lines in 17 files changed: 524 ins; 192 del; 97 mod Patch: https://git.openjdk.org/jdk/pull/24443.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24443/head:pull/24443 PR: https://git.openjdk.org/jdk/pull/24443 From maurizio.cimadamore at oracle.com Fri Apr 4 10:32:34 2025 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Fri, 4 Apr 2025 11:32:34 +0100 Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) In-Reply-To: References: <1257824626.204034517.1743604437041.JavaMail.zimbra@univ-eiffel.fr> Message-ID: In general I don't disagree. There are, however, at least _some_ cases where the imperative API is less difficult to use. In some cases you know that a class has a complex lifecycle -- perhaps it starts off in a simple "larval" state, where the instance only exists in a single thread. In this state, it's possible for the class to receive state updates. If parts of the class state are stable, using `trySet` might work very well. Perhaps only "friends" of this class can call such mutator methods on the larval instance. At some point later in the class lifetime, it becomes "frozen", and it is published to multiple threads.
Of course, this is a corner case -- but if our goal is to model what `@Stable` can do, then, while a stable supplier or `orElseSet` are surely better no-worry alternatives, there remain a number of use cases that would not be expressible if all we had was the high-level API. In a way, a big part of what this new API does is find the right set of primitives, upon which we can build all the other interesting high-level stuff. I think your complaint is not that the primitive is wrong, but that in calling the primitive StableValue we're giving the "good name" to the stuff that is less likely to be widely used. (Note: a very minimalistic API approach -- which we considered -- would have been to just provide extra stable factories in Supplier/Function/IntFunction/List/Map and call it a day.) Maurizio On 03/04/2025 12:20, Per-Ake Minborg wrote: > Hi Remi, and thank you for the feedback from JChateau (what a wonderful > name!). > > This is one of the issues we already have on the list for the next > round of preview. Now we know more folks are on the same page. > > Best, Per > ------------------------------------------------------------------------ > *From:* Remi Forax > *Sent:* Wednesday, April 2, 2025 4:33 PM > *To:* Per Minborg > *Cc:* compiler-dev ; core-libs-dev > ; hotspot-dev ; > security-dev > *Subject:* Re: RFR: 8351565: Implement JEP 502: Stable Values (Preview) > Hi Per, > last week, at JChateau, we had a one-hour session about stable values; > I've built the JDK with this PR so we can discuss it. > > To present the API, I start from double-checked locking, rewriting > it to use the StableValue API. > > The main remark was that methods like orElseSet() or isSet() are hard > to use correctly. > > In my opinion, the current API is a mix of a high-level API and a > low-level API, but it's too easy to misuse the low-level API. > > > high level: > - methods supplier(), list() and map() >
Those are easy to use > > low level: > - methods: of, of(value), orElseSet, setOrThrow(), etc >   Those are hard to use properly. > > I think, though not necessarily in this PR, that the current API should be > separated into two different classes, one in java.lang with the high-level > API (the static methods other than of()), and one in > java.util.concurrent with the low-level API, where you have to know > what you are doing (as with any class in java.util.concurrent). > > regards, > Rémi > > ----- Original Message ----- > > From: "Per Minborg" > > To: "compiler-dev" , "core-libs-dev" > , "hotspot-dev" > > , "security-dev" > > Sent: Thursday, March 13, 2025 12:20:10 PM > > Subject: RFR: 8351565: Implement JEP 502: Stable Values (Preview) > > > Implement JEP 502. > > > > The PR passes tier1-tier3 tests. > > > > ------------- > > > > Commit messages: > > - Use acquire semantics for reading rather than volatile semantics > > - Add missing null check > > - Simplify handling of sentinel, wrap, and unwrap > > - Fix JavaDoc issues > > - Fix members in StableEnumFunction > > - Address some comments in the PR > > - Merge branch 'master' into implement-jep502 > > - Revert change > > - Fix copyright issues > > - Update JEP number > > - ... and 231 more: > https://git.openjdk.org/jdk/compare/4cf63160...09ca44e6 > > > > Changes: https://git.openjdk.org/jdk/pull/23972/files > >  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23972&range=00 > > >  Issue: https://bugs.openjdk.org/browse/JDK-8351565 > >  Stats: 3980 lines in 30 files changed: 3949 ins; 18 del; 13 mod > >  Patch: https://git.openjdk.org/jdk/pull/23972.diff > >  Fetch: git fetch https://git.openjdk.org/jdk.git > pull/23972/head:pull/23972 > > > > PR: https://git.openjdk.org/jdk/pull/23972
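To make the high-level/low-level split discussed in this thread concrete, here is a C++ analogue with hypothetical names; it is not the JDK `StableValue` API, only a sketch of the same layering. The low-level cell exposes `try_set`, `or_else_set` (double-checked locking, the pattern Rémi starts from) and `is_set`; the high-level `StableSupplier` hides the cell entirely. It also hints at why `is_set` is easy to misuse: `if (!c.is_set()) c.try_set(x);` is a check-then-act race, and only the boolean result of `try_set` says which caller won.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>

// Low-level, set-at-most-once cell (hypothetical; not the JDK API).
template <typename T>
class StableCell {
 public:
  // Succeeds only for the first caller; later calls return false.
  bool try_set(T v) {
    std::lock_guard<std::mutex> guard(lock_);
    if (set_.load(std::memory_order_relaxed)) return false;
    value_.emplace(std::move(v));
    set_.store(true, std::memory_order_release);
    return true;
  }

  // Double-checked locking: the fast path reads without taking the lock.
  const T& or_else_set(const std::function<T()>& supplier) {
    if (!set_.load(std::memory_order_acquire)) {
      std::lock_guard<std::mutex> guard(lock_);
      if (!set_.load(std::memory_order_relaxed)) {
        value_.emplace(supplier());
        set_.store(true, std::memory_order_release);
      }
    }
    return *value_;
  }

  // Only a snapshot: the answer may be stale by the time the caller acts.
  bool is_set() const { return set_.load(std::memory_order_acquire); }

 private:
  std::mutex lock_;
  std::atomic<bool> set_{false};
  std::optional<T> value_;
};

// High-level: a memoizing supplier; callers never touch the cell directly.
template <typename T>
class StableSupplier {
 public:
  explicit StableSupplier(std::function<T()> f) : f_(std::move(f)) {}
  const T& get() { return cell_.or_else_set(f_); }

 private:
  std::function<T()> f_;
  StableCell<T> cell_;
};
```

The supplier computes its value once no matter how often `get()` is called, which is the "no-worry" usage Maurizio mentions; the cell keeps the primitives available for cases like the larval-then-frozen lifecycle.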
From mdoerr at openjdk.org Fri Apr 4 10:39:03 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 4 Apr 2025 10:39:03 GMT Subject: RFR: 8311227: Add .editorconfig so IDEs would pick up the common settings automatically: indent, trim trailing whitespace [v3] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 09:12:45 GMT, David Linus Briemann wrote: >> Add an .editorconfig to define indentation, trim trailing whitespace and open curly brace position for C++ and Java. >> This allows various editors to easily infer basics of the coding style. > > David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: > > make editorconfig hotspot specific It'd be nice if we could have a default style for .java files in the future, too. This looks good to me for the time being. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23693#pullrequestreview-2742553046 From stefank at openjdk.org Fri Apr 4 12:23:15 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 4 Apr 2025 12:23:15 GMT Subject: RFR: 8353637: ZGC: Discontiguous memory reservation is broken on Windows [v2] In-Reply-To: References: Message-ID: <4SItmLXrqDx4u3hmoip2CnAbubq-XztJLaAh_ZLuSYY=.963ab642-5e0a-4a2b-bbc4-ad19f003eca4@github.com> > While working on [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) I realized that the current Windows implementation to use placeholders for reservations are broken if we ever fallback to using the part that performs discontiguous heap reservation. > > To understand this bug you first need to understand how and why we use the placeholder mechanism. From zMapper_windows.cpp: > > > // Memory reservation, commit, views, and placeholders.
> // > // To be able to up-front reserve address space for the heap views, and later > // multi-map the heap views to the same physical memory, without ever losing the > // reservation of the reserved address space, we use "placeholders". > // > // These placeholders block out the address space from being used by other parts > // of the process. To commit memory in this address space, the placeholder must > // be replaced by anonymous memory, or replaced by mapping a view against a > // paging file mapping. We use the later to support multi-mapping. > // > // We want to be able to dynamically commit and uncommit the physical memory of > // the heap (and also unmap ZPages), in granules of ZGranuleSize bytes. There is > // no way to grow and shrink the committed memory of a paging file mapping. > // Therefore, we create multiple granule-sized page file mappings. The memory is > // committed by creating a page file mapping, map a view against it, commit the > // memory, unmap the view. The memory will stay committed until all views are > // unmapped, and the paging file mapping handle is closed. > // > // When replacing a placeholder address space reservation with a mapped view > // against a paging file mapping, the virtual address space must exactly match > // an existing placeholder's address and size. Therefore we only deal with > // granule-sized placeholders at this layer. Higher layers that keep track of > // reserved available address space can (and will) coalesce placeholders, but > // they will be split before being used. > > And the way we implement this is through the callbacks in zVirtualMemory_windows.cpp: > > > // Each reserved virtual memory address area registered in _manager is > // exactly covered by a single placeholder. Callbacks are installed so > // that whenever a memory area changes, the corresponding placeholder > // is adjusted. > // > // The create and grow callbacks are called when virtual memory is > // returned to the memory manager. 
The new memory area is then covered > // by a new single placeholder. > // > // The destroy and shrink callbacks are called when virtua... Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: More feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24443/files - new: https://git.openjdk.org/jdk/pull/24443/files/754bce11..8066c900 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24443&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24443&range=00-01 Stats: 55 lines in 4 files changed: 10 ins; 6 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/24443.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24443/head:pull/24443 PR: https://git.openjdk.org/jdk/pull/24443 From stefank at openjdk.org Fri Apr 4 12:23:15 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 4 Apr 2025 12:23:15 GMT Subject: RFR: 8353637: ZGC: Discontiguous memory reservation is broken on Windows In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 10:09:57 GMT, Stefan Karlsson wrote: > While working on [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) I realized that the current Windows implementation to use placeholders for reservations are broken if we ever fallback to using the part that performs discontiguous heap reservation. > > To understand this bug you first need to understand how and why we use the placeholder mechanism. From zMapper_windows.cpp: > > > // Memory reservation, commit, views, and placeholders. > // > // To be able to up-front reserve address space for the heap views, and later > // multi-map the heap views to the same physical memory, without ever losing the > // reservation of the reserved address space, we use "placeholders". > // > // These placeholders block out the address space from being used by other parts > // of the process. 
To commit memory in this address space, the placeholder must > // be replaced by anonymous memory, or replaced by mapping a view against a > // paging file mapping. We use the later to support multi-mapping. > // > // We want to be able to dynamically commit and uncommit the physical memory of > // the heap (and also unmap ZPages), in granules of ZGranuleSize bytes. There is > // no way to grow and shrink the committed memory of a paging file mapping. > // Therefore, we create multiple granule-sized page file mappings. The memory is > // committed by creating a page file mapping, map a view against it, commit the > // memory, unmap the view. The memory will stay committed until all views are > // unmapped, and the paging file mapping handle is closed. > // > // When replacing a placeholder address space reservation with a mapped view > // against a paging file mapping, the virtual address space must exactly match > // an existing placeholder's address and size. Therefore we only deal with > // granule-sized placeholders at this layer. Higher layers that keep track of > // reserved available address space can (and will) coalesce placeholders, but > // they will be split before being used. > > And the way we implement this is through the callbacks in zVirtualMemory_windows.cpp: > > > // Each reserved virtual memory address area registered in _manager is > // exactly covered by a single placeholder. Callbacks are installed so > // that whenever a memory area changes, the corresponding placeholder > // is adjusted. > // > // The create and grow callbacks are called when virtual memory is > // returned to the memory manager. The new memory area is then covered > // by a new single placeholder. > // > // The destroy and shrink callbacks are called when virtua... I got some feedback from @xmas92 and @jsikstro so I've updated the comments and renamed the callbacks to more clearly hint about when and why they are called. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24443#issuecomment-2778531218 From stuefe at openjdk.org Fri Apr 4 13:18:01 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 4 Apr 2025 13:18:01 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v3] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 09:10:00 GMT, Stefan Karlsson wrote: > > Again, why not avoid these issues when there is a simple way to do so? > > I'm not sure the proposal is simple. There seems to be a race-condition between this new periodic task and the and `jcmd VM.info`. I'm not sure what happens if these two runs at the same time. This can probably be fixed by appropriate synchronization, but that sort-of shows that maybe this isn't a simple solution. The same race exists when printing the hs-err file. But at least we have a step crash protection there; with VM.info, we crash the JVM. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2778705984 From ihse at openjdk.org Fri Apr 4 14:06:10 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 4 Apr 2025 14:06:10 GMT Subject: RFR: 8311227: Add .editorconfig so IDEs would pick up the common settings automatically: indent, trim trailing whitespace [v3] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 09:12:45 GMT, David Linus Briemann wrote: >> Add an .editorconfig to define indentation, trim trailing whitespace and open curly brace position for C++ and Java. >> This allows various editors to easily infer basics of the coding style. > > David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: > > make editorconfig hotspot specific I still don't think the .editorconfig as suggested here is a good idea. It directly conflicts with the existing logic in jcheck. Understand me right -- I am all in favor of tightening the structure of our code base. 
But we can't do that by introducing an .editorconfig that does not match what is currently enforced by jcheck or is the current standard of the code base. Instead, we need to tighten the rules bit by bit, getting buy-in for tighter rules, and ensuring we update and fix all old files. I have published an alternative implementation of this issue at https://github.com/openjdk/jdk/pull/24448. That version of .editorconfig has a 1-to-1 correspondence to what is checked by jcheck. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23693#issuecomment-2778823752 From mullan at openjdk.org Fri Apr 4 14:29:05 2025 From: mullan at openjdk.org (Sean Mullan) Date: Fri, 4 Apr 2025 14:29:05 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8] In-Reply-To: References: <5WNrv1s7Bp7hLwSVGqoPw9ycCSHK0Zyka65DpAjnB2s=.31243a29-4fbb-4c21-b671-45470d043335@github.com> Message-ID: <5m9xiUkcb41c47vcLKS3kvsK9Jhh1y7PsNRHcffa8ug=.5785cdda-e50e-410a-a139-5554d70bfdff@github.com> On Thu, 3 Apr 2025 18:49:24 GMT, Volodymyr Paprotski wrote: > Done I think: https://bugs.openjdk.org/browse/JDK-8297970 Is this link correct? This issue was fixed in JDK 20. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23719#issuecomment-2778889529 From matsaave at openjdk.org Fri Apr 4 14:29:52 2025 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 4 Apr 2025 14:29:52 GMT Subject: RFR: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 [v2] In-Reply-To: References: Message-ID: <9VLVvd2nUZ4RRgZCfqGnMmFVprPv_8rllrFcf2mipIs=.19542826-c09b-42ad-8692-6960c16c2c25@github.com> On Fri, 14 Mar 2025 01:53:38 GMT, Ioi Lam wrote: >> Since [JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426) (Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file), the AOT cache no longer contains classes intended for custom class loaders (these are called "unregistered classes" in CDS terminology). 
>> >> The fix is simple -- we already remember the set of unregistered classes in the AOT configuration file. We just need to add them into the final AOT cache (see changes in finalImageRecipes.cpp). > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 > - 8351319: AOT cache support for custom class loaders broken since JDK-8348426 LGTM! ------------- Marked as reviewed by matsaave (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23926#pullrequestreview-2743190582 From kevinw at openjdk.org Fri Apr 4 14:33:05 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Fri, 4 Apr 2025 14:33:05 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v2] In-Reply-To: <2tf83vKKccOEHQ8lw6nZ0D8MHX0lzIAu8ZhE0IRVQIM=.630949fd-e71e-4179-a369-87568d10e36f@github.com> References: <2tf83vKKccOEHQ8lw6nZ0D8MHX0lzIAu8ZhE0IRVQIM=.630949fd-e71e-4179-a369-87568d10e36f@github.com> Message-ID: On Thu, 3 Apr 2025 20:02:12 GMT, Leonid Mesnik wrote: > Can you please add new regression test or update runtime/ErrorHandling/TestOnError.java to test few arguments and your fix. Yes, added a test update to check that multiple OnError= options are honoured, either separately or using the ; separator. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2778901284 From kevinw at openjdk.org Fri Apr 4 14:33:04 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Fri, 4 Apr 2025 14:33:04 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v2] In-Reply-To: References: Message-ID: > We should be consistent, and run all OnError items in a new shell. 
Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. > > next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell. Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: Test udpate - multiple -XX:OnError= ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24354/files - new: https://git.openjdk.org/jdk/pull/24354/files/d9646e3a..880261a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24354&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24354&range=00-01 Stats: 28 lines in 1 file changed: 27 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24354/head:pull/24354 PR: https://git.openjdk.org/jdk/pull/24354 From aph at openjdk.org Fri Apr 4 14:39:04 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 4 Apr 2025 14:39:04 GMT Subject: RFR: 8350126: Regression ~3% on Crypto-ChaCha20Poly1305.encrypt for MacOSX aarch64 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:31:39 GMT, Jamil Nimeh wrote: > This fix addresses a performance regression found on some aarch64 processors, namely the Apple M1, when we moved to a quarter round parallel implementation in JDK-8349106. After making some improvements in the ordering of the instructions in the 20-round loop we found that going back to a block-parallel implementation was faster, but it definitely needed the ordering changes for that to be the case. More importantly, the block parallel implementation with the interleaving turns out to be faster on even those processors that showed improvements when moving to the quarter round parallel implementation. 
> > There is a spreadsheet attached to the JBS bug that shows 3 different implementations relative to the current (QR-parallel with no interleaving) implementation on 3 different ARM64 processors. Comparative benchmarks can also be found below.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4517:

> 4515: bSet[0] = v16; bSet[1] = v17; bSet[2] = v18; bSet[3] = v19;
> 4516: cSet[0] = v20; cSet[1] = v21; cSet[2] = v22; cSet[3] = v23;
> 4517: dSet[0] = v24; dSet[1] = v25; dSet[2] = v26; dSet[3] = v27;

Suggestion:

    bSet[0] = workSt[4]; bSet[1] = workSt[5]; bSet[2] = workSt[6]; bSet[3] = workSt[7];
    cSet[0] = workSt[8]; cSet[1] = workSt[9]; cSet[2] = workSt[10]; cSet[3] = workSt[11];
    dSet[0] = workSt[12]; dSet[1] = workSt[13]; dSet[2] = workSt[14]; dSet[3] = workSt[15];

How about something like this? The mapping from index to register, and from specification to implementation, is easier for this reviewer to understand.

or maybe (better?) define a function, such that:

    regs_for_quarter_round(bSet, workSt, 4, 5, 6, 7);
    regs_for_quarter_round(cSet, workSt, 8, 9, 10, 11);
    regs_for_quarter_round(dSet, workSt, 12, 13, 14, 15);

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24420#discussion_r2028932064 From jsikstro at openjdk.org Fri Apr 4 14:43:54 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Fri, 4 Apr 2025 14:43:54 GMT Subject: RFR: 8353637: ZGC: Discontiguous memory reservation is broken on Windows [v2] In-Reply-To: <4SItmLXrqDx4u3hmoip2CnAbubq-XztJLaAh_ZLuSYY=.963ab642-5e0a-4a2b-bbc4-ad19f003eca4@github.com> References: <4SItmLXrqDx4u3hmoip2CnAbubq-XztJLaAh_ZLuSYY=.963ab642-5e0a-4a2b-bbc4-ad19f003eca4@github.com> Message-ID: On Fri, 4 Apr 2025 12:23:15 GMT, Stefan Karlsson wrote: >> While working on [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) I realized that the current Windows implementation to use placeholders for reservations are broken if we ever fallback to using the part that performs discontiguous
heap reservation. >> >> To understand this bug you first need to understand how and why we use the placeholder mechanism. From zMapper_windows.cpp: >> >> >> // Memory reservation, commit, views, and placeholders. >> // >> // To be able to up-front reserve address space for the heap views, and later >> // multi-map the heap views to the same physical memory, without ever losing the >> // reservation of the reserved address space, we use "placeholders". >> // >> // These placeholders block out the address space from being used by other parts >> // of the process. To commit memory in this address space, the placeholder must >> // be replaced by anonymous memory, or replaced by mapping a view against a >> // paging file mapping. We use the later to support multi-mapping. >> // >> // We want to be able to dynamically commit and uncommit the physical memory of >> // the heap (and also unmap ZPages), in granules of ZGranuleSize bytes. There is >> // no way to grow and shrink the committed memory of a paging file mapping. >> // Therefore, we create multiple granule-sized page file mappings. The memory is >> // committed by creating a page file mapping, map a view against it, commit the >> // memory, unmap the view. The memory will stay committed until all views are >> // unmapped, and the paging file mapping handle is closed. >> // >> // When replacing a placeholder address space reservation with a mapped view >> // against a paging file mapping, the virtual address space must exactly match >> // an existing placeholder's address and size. Therefore we only deal with >> // granule-sized placeholders at this layer. Higher layers that keep track of >> // reserved available address space can (and will) coalesce placeholders, but >> // they will be split before being used. 
>> >> And the way we implement this is through the callbacks in zVirtualMemory_windows.cpp: >> >> >> // Each reserved virtual memory address area registered in _manager is >> // exactly covered by a single placeholder. Callbacks are installed so >> // that whenever a memory area changes, the corresponding placeholder >> // is adjusted. >> // >> // The create and grow callbacks are called when virtual memory is >> // returned to the memory manager. The new memory area is then covered >> // by a n... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > More feedback Looks good. ------------- Marked as reviewed by jsikstro (Committer). PR Review: https://git.openjdk.org/jdk/pull/24443#pullrequestreview-2743249693 From vpaprotski at openjdk.org Fri Apr 4 15:17:03 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Fri, 4 Apr 2025 15:17:03 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8] In-Reply-To: <5m9xiUkcb41c47vcLKS3kvsK9Jhh1y7PsNRHcffa8ug=.5785cdda-e50e-410a-a139-5554d70bfdff@github.com> References: <5WNrv1s7Bp7hLwSVGqoPw9ycCSHK0Zyka65DpAjnB2s=.31243a29-4fbb-4c21-b671-45470d043335@github.com> <5m9xiUkcb41c47vcLKS3kvsK9Jhh1y7PsNRHcffa8ug=.5785cdda-e50e-410a-a139-5554d70bfdff@github.com> Message-ID: On Fri, 4 Apr 2025 14:26:30 GMT, Sean Mullan wrote: > > Done I think: https://bugs.openjdk.org/browse/JDK-8297970 > > Is this link correct? This issue was fixed in JDK 20. Sorry.. copy/paste didnt notice.. https://bugs.openjdk.org/browse/JDK-8353670 (also ends in *70!) 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23719#issuecomment-2779021494 From lmesnik at openjdk.org Fri Apr 4 16:06:49 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 4 Apr 2025 16:06:49 GMT Subject: RFR: 8349007: The jtreg test ResolvedMethodTableHash takes excessive time [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 20:45:50 GMT, Coleen Phillimore wrote: >> This is mostly test change. ResolvedMethodTableHash.java is run /manual but still takes a long time and doesn't really verify that the hash code is any good. This add logging, triggers concurrent work like other ConcurrentHashtables, and checks that a small table is not rehashed. >> Tested with tier1 (including test). > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix indent and hardcode 1001 loops. Thanks for addressing the comments. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24383#pullrequestreview-2743523733 From jnimeh at openjdk.org Fri Apr 4 16:13:49 2025 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Fri, 4 Apr 2025 16:13:49 GMT Subject: RFR: 8350126: Regression ~3% on Crypto-ChaCha20Poly1305.encrypt for MacOSX aarch64 In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 14:35:59 GMT, Andrew Haley wrote: >> This fix addresses a performance regression found on some aarch64 processors, namely the Apple M1, when we moved to a quarter round parallel implementation in JDK-8349106. After making some improvements in the ordering of the instructions in the 20-round loop we found that going back to a block-parallel implementation was faster, but it definitely needed the ordering changes for that to be the case. More importantly, the block parallel implementation with the interleaving turns out to be faster on even those processors that showed improvements when moving to the quarter round parallel implementation. 
>> >> There is a spreadsheet attached to the JBS bug that shows 3 different implementations relative to the current (QR-parallel with no interleaving) implementation on 3 different ARM64 processors. Comparative benchmarks can also be found below. > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4517: > >> 4515: bSet[0] = v16; bSet[1] = v17; bSet[2] = v18; bSet[3] = v19; >> 4516: cSet[0] = v20; cSet[1] = v21; cSet[2] = v22; cSet[3] = v23; >> 4517: dSet[0] = v24; dSet[1] = v25; dSet[2] = v26; dSet[3] = v27; > > Suggestion: > > bSet[0] = workSt[4]; bSet[1] = workSt[5]; bSet[2] = workSt[6]; bSet[3] = workSt[7]; > cSet[0] = workSt[8]; cSet[1] = workSt[9]; cSet[2] = workSt[10]; cSet[3] = workSt[11]; > dSet[0] = workSt[12]; dSet[1] = workSt[13]; dSet[2] = workSt[14]; dSet[3] = workSt[15]; > > How about something like this? The mapping from index to register, and from specification to implementation, is easier for this reviewer to understand. > > or maybe (better?) define a function, such that: > > > regs_for_quarter_round(bSet, workSt, 4, 5, 6, 7); > regs_for_quarter_round(cSet, workSt, 8, 9, 10, 11); > regs_for_quarter_round(dSet, workSt, 12, 13, 14, 15); I like either approach; I'll try the function route. Either way it definitely helps make things clearer. Putting this in terms of the workSt indices maps more closely to how things are described in the RFC. Good call!
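The suggested `regs_for_quarter_round()` helper can be sketched in isolation. This is a hypothetical illustration, not the stub generator's code: `Reg` is a plain `int` standing in for HotSpot's `FloatRegister`, and the assumption that state word `i` lives in register `v(12 + i)` is inferred from the quoted lines (word 4 is `v16`, word 15 is `v27`).

```cpp
#include <array>
#include <cassert>

// Stand-in for HotSpot's FloatRegister (assumption): a register is modelled
// by its vector register number, so v16 is simply 16.
using Reg = int;

// Hypothetical helper in the spirit of the suggested regs_for_quarter_round():
// copy four entries of the 16-word working state into a register set, naming
// them by their ChaCha20 state indices rather than by raw register numbers.
void regs_for_quarter_round(std::array<Reg, 4>& set,
                            const std::array<Reg, 16>& workSt,
                            int i0, int i1, int i2, int i3) {
  const int idx[4] = {i0, i1, i2, i3};
  for (int k = 0; k < 4; k++) {
    assert(idx[k] >= 0 && idx[k] < 16);  // state has exactly 16 words
    set[k] = workSt[idx[k]];
  }
}

// Builds a working state where word i lives in register v(12 + i),
// matching the mapping inferred from the quoted stub lines.
std::array<Reg, 16> make_work_state() {
  std::array<Reg, 16> workSt{};
  for (int i = 0; i < 16; i++) workSt[i] = 12 + i;
  return workSt;
}
```

Passing indices explicitly also makes diagonal rounds easy to express: RFC 8439 defines them as QUARTERROUND(0,5,10,15), (1,6,11,12), (2,7,8,13) and (3,4,9,14), so the b-set for a diagonal round would be (5, 6, 7, 4). Whether the stub would use the helper that way is not stated in the review.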
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24420#discussion_r2029095293 From iklam at openjdk.org Fri Apr 4 21:53:13 2025 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 4 Apr 2025 21:53:13 GMT Subject: RFR: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 [v3] In-Reply-To: References: Message-ID: > Since [JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426) (Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file), the AOT cache no longer contains classes intended for custom class loaders (these are called "unregistered classes" in CDS terminology). > > The fix is simple -- we already remember the set of unregistered classes in the AOT configuration file. We just need to add them into the final AOT cache (see changes in finalImageRecipes.cpp). Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Avoid duplicated unregistered classes that have the same name - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 - 8351319: AOT cache support for custom class loaders broken since JDK-8348426 ------------- Changes: https://git.openjdk.org/jdk/pull/23926/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23926&range=02 Stats: 103 lines in 10 files changed: 81 ins; 0 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/23926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23926/head:pull/23926 PR: https://git.openjdk.org/jdk/pull/23926 From jrose at openjdk.org Fri Apr 4 22:06:52 2025 From: jrose at openjdk.org (John R Rose) Date: Fri, 4 Apr 2025 22:06:52 GMT Subject: RFR: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 [v3] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 21:53:13 GMT, Ioi Lam wrote: >> Since 
[JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426) (Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file), the AOT cache no longer contains classes intended for custom class loaders (these are called "unregistered classes" in CDS terminology). >> >> The fix is simple -- we already remember the set of unregistered classes in the AOT configuration file. We just need to add them into the final AOT cache (see changes in finalImageRecipes.cpp). > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Avoid duplicated unregistered classes that have the same name > - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 > - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 > - 8351319: AOT cache support for custom class loaders broken since JDK-8348426 Good. I suppose there are already tests for the other end of the process, where an unregistered class in the AOT cache is actually used. What are those tests? ------------- Marked as reviewed by jrose (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23926#pullrequestreview-2744184813 From matsaave at openjdk.org Fri Apr 4 23:47:51 2025 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 4 Apr 2025 23:47:51 GMT Subject: RFR: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 [v3] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 21:53:13 GMT, Ioi Lam wrote: >> Since [JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426) (Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file), the AOT cache no longer contains classes intended for custom class loaders (these are called "unregistered classes" in CDS terminology). >> >> The fix is simple -- we already remember the set of unregistered classes in the AOT configuration file.
We just need to add them into the final AOT cache (see changes in finalImageRecipes.cpp). > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Avoid duplicated unregistered classes that have the same name > - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 > - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 > - 8351319: AOT cache support for custom class loaders broken since JDK-8348426 Changes look good! ------------- Marked as reviewed by matsaave (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23926#pullrequestreview-2744283339 From zgu at openjdk.org Sat Apr 5 00:32:05 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Sat, 5 Apr 2025 00:32:05 GMT Subject: RFR: 8353753: Remove unnecessary forward declaration in oop.hpp Message-ID: Please review this trivial cleanup to remove unused forward declarations. ------------- Commit messages: - 8353753: Remove unnecessary forward declaration in oop.hpp Changes: https://git.openjdk.org/jdk/pull/24464/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24464&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353753 Stats: 6 lines in 1 file changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24464.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24464/head:pull/24464 PR: https://git.openjdk.org/jdk/pull/24464 From sviswanathan at openjdk.org Sat Apr 5 00:44:56 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 5 Apr 2025 00:44:56 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v13] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:38:34 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. 
> > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Reacting to comment by Sandhya. src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 339: > 337: > 338: // levels 2 to 7 are done in 2 batches, by first saving half of the coefficients > 339: // from level 1 into memory, doing all the level 2 to level 7 computations In lines 344-347, we seem to be storing all the coefficients from level 1 into memory. src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 345: > 343: > 344: store4Xmms(coeffs, 0, xmm0_3, _masm); > 345: store4Xmms(coeffs, 4 * XMMBYTES, xmm4_7, _masm); This seems to be an unnecessary store. src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 370: > 368: loadPerm(xmm16_19, perms, nttL4PermsIdx, _masm); > 369: loadPerm(xmm12_15, perms, nttL4PermsIdx + 64, _masm); > 370: load4Xmms(xmm24_27, zetas, 4 * 512, _masm); // for level 3 The comment `// for level 3` is not relevant here and could be removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2029437396 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2029578599 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2029583308 From vlivanov at openjdk.org Sat Apr 5 02:42:38 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 5 Apr 2025 02:42:38 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API Message-ID: Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side.
The patch consists of the following parts: * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) Thanks! ------------- Commit messages: - Misc fixes and cleanups - CPU features support - Cleanup - TODO list - SVML fixes - Update templates - fixes - SLEEF improvements - cleanup - VectorMathLib: Migrate to lambdas - ... and 3 more: https://git.openjdk.org/jdk/compare/9fcb06f9...fc27aee5 Changes: https://git.openjdk.org/jdk/pull/24462/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353786 Stats: 1274 lines in 43 files changed: 825 ins; 393 del; 56 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From liach at openjdk.org Sat Apr 5 02:42:38 2025 From: liach at openjdk.org (Chen Liang) Date: Sat, 5 Apr 2025 02:42:38 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 22:52:24 GMT, Vladimir Ivanov wrote: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. 
> > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Moving vector API library selection to Java code looks like the right step to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24462#issuecomment-2779896358 From kbarrett at openjdk.org Sat Apr 5 06:21:47 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 5 Apr 2025 06:21:47 GMT Subject: RFR: 8353753: Remove unnecessary forward declaration in oop.hpp In-Reply-To: References: Message-ID: On Sat, 5 Apr 2025 00:27:05 GMT, Zhengyu Gu wrote: > Please review this trivial cleanup to remove unused forward declarations. Looks good, and trivial. ------------- Marked as reviewed by kbarrett (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24464#pullrequestreview-2744699521 From kbarrett at openjdk.org Sat Apr 5 06:29:47 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 5 Apr 2025 06:29:47 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24407#pullrequestreview-2744728350 From duke at openjdk.org Sat Apr 5 14:29:28 2025 From: duke at openjdk.org (Zihao Lin) Date: Sat, 5 Apr 2025 14:29:28 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v6] In-Reply-To: References: Message-ID: > This patch remove slice parameter from LoadNode::make > > Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thank a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
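The rank adjustment in 8352116 is about lock ordering: `Heap_lock` must rank so that it can be taken while `JNICritical_lock` is already held. The following is a toy model, not HotSpot's `Mutex` class. It assumes the rule "a thread may only acquire a lock whose rank is strictly lower than every lock it already holds", and the rank values used in the test are invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of rank-ordered locking (not HotSpot's Mutex).
struct RankedLock {
  std::string name;
  int rank;
};

class LockTracker {
  std::vector<const RankedLock*> held_;  // locks held, in acquisition order

 public:
  // Assumed rule: the new lock must rank strictly below every held lock.
  bool can_acquire(const RankedLock& l) const {
    for (const RankedLock* h : held_) {
      if (l.rank >= h->rank) return false;  // out of order: possible deadlock
    }
    return true;
  }

  bool acquire(const RankedLock& l) {
    if (!can_acquire(l)) return false;
    held_.push_back(&l);
    return true;
  }

  void release(const RankedLock& l) {
    assert(!held_.empty() && held_.back() == &l);  // LIFO release in this model
    held_.pop_back();
  }
};
```

Under this model, giving `JNICritical_lock` a higher rank than `Heap_lock` permits the nesting described in the PR (take `JNICritical_lock` in `GCLocker::block`, then `Heap_lock`), while the reverse nesting is rejected.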
The pull request contains four additional commits since the last revision: - Merge branch 'openjdk:master' into 8344116 - Fix build - Fix test failed - 8344116: C2: remove slice parameter from LoadNode::make ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/a1924c35..3efb1c17 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=04-05 Stats: 28443 lines in 792 files changed: 18710 ins; 7734 del; 1999 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From zgu at openjdk.org Sat Apr 5 20:27:55 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Sat, 5 Apr 2025 20:27:55 GMT Subject: RFR: 8353753: Remove unnecessary forward declaration in oop.hpp In-Reply-To: References: Message-ID: On Sat, 5 Apr 2025 00:27:05 GMT, Zhengyu Gu wrote: > Please review this trivial cleanup to remove unused forward declarations. Thanks, @kimbarrett ------------- PR Comment: https://git.openjdk.org/jdk/pull/24464#issuecomment-2781082245 From zgu at openjdk.org Sat Apr 5 20:27:56 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Sat, 5 Apr 2025 20:27:56 GMT Subject: Integrated: 8353753: Remove unnecessary forward declaration in oop.hpp In-Reply-To: References: Message-ID: On Sat, 5 Apr 2025 00:27:05 GMT, Zhengyu Gu wrote: > Please review this trivial cleanup to remove unused forward declarations. This pull request has now been integrated. 
Changeset: 6d37e633 Author: Zhengyu Gu URL: https://git.openjdk.org/jdk/commit/6d37e633e6afa11ecd40bed10c0efbde6f9f6181 Stats: 6 lines in 1 file changed: 0 ins; 5 del; 1 mod 8353753: Remove unnecessary forward declaration in oop.hpp Reviewed-by: kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/24464 From duke at openjdk.org Sun Apr 6 06:09:05 2025 From: duke at openjdk.org (Zihao Lin) Date: Sun, 6 Apr 2025 06:09:05 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v6] In-Reply-To: References: Message-ID: On Sat, 5 Apr 2025 14:29:28 GMT, Zihao Lin wrote: >> This patch remove slice parameter from LoadNode::make >> >> Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 >> >> Hi team, I am new, I'd appreciate any guidance. Thank a lot! > > Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into 8344116 > - Fix build > - Fix test failed > - 8344116: C2: remove slice parameter from LoadNode::make Hi @TobiHartmann , Could you please take a look? Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24258#issuecomment-2781240184 From dholmes at openjdk.org Sun Apr 6 22:33:51 2025 From: dholmes at openjdk.org (David Holmes) Date: Sun, 6 Apr 2025 22:33:51 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v2] In-Reply-To: References: Message-ID: <1-4UyxBJtfsFcAhapfIUNuzKGPy6JEloGeYdFwKnGvk=.8072f704-6d9f-435f-8e35-409c33627659@github.com> On Fri, 4 Apr 2025 14:33:04 GMT, Kevin Walls wrote: >> We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. 
>> >> next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > Test udpate - multiple -XX:OnError= Thanks Kevin. Checking the bash manpage it states that newline is a control operator, hence it does work, on non-Windows at least. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2781703863 From dholmes at openjdk.org Sun Apr 6 22:41:49 2025 From: dholmes at openjdk.org (David Holmes) Date: Sun, 6 Apr 2025 22:41:49 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v2] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 14:33:04 GMT, Kevin Walls wrote: >> We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. >> >> next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > Test udpate - multiple -XX:OnError= Okay I'm satisfied this change is not likely to cause any problems for existing usages. One comment suggestion, but approval in advance. Thanks. src/hotspot/share/utilities/vmError.cpp line 149: > 147: > 148: // skip leading blanks, ';' or newlines > 149: while (*cmd == ' ' || *cmd == ';' || *cmd == '\n') cmd++; It may be worth reinstating a comment in the function description to explain how the command is actually parsed (we seem to have lost that somewhere along the line) e.g. // The command string is expected to be a semi-colon, or newline, delineated sequence of commands, // that are executed sequentially and in their own shell environment. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24354#pullrequestreview-2745295405 PR Review Comment: https://git.openjdk.org/jdk/pull/24354#discussion_r2030291361 From dholmes at openjdk.org Sun Apr 6 22:50:28 2025 From: dholmes at openjdk.org (David Holmes) Date: Sun, 6 Apr 2025 22:50:28 GMT Subject: RFR: 8353365: TOUCH_ASSERT_POISON clears GetLastError() [v2] In-Reply-To: References: Message-ID: > This is a very simple fix to save/restore the "last error" value on Windows, so that the TOUCH_ASSERT_POISON mechanism used in assert/guarantee/fatal, does not clear it. > > Testing > - new Windows-only gtest added to vmErrors test group > - tiers 103 sanity > > Thanks. David Holmes has updated the pull request incrementally with one additional commit since the last revision: Adjust format specifier and remove cast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24435/files - new: https://git.openjdk.org/jdk/pull/24435/files/c080c068..9a03b3bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24435&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24435&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24435.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24435/head:pull/24435 PR: https://git.openjdk.org/jdk/pull/24435 From dholmes at openjdk.org Sun Apr 6 22:50:28 2025 From: dholmes at openjdk.org (David Holmes) Date: Sun, 6 Apr 2025 22:50:28 GMT Subject: RFR: 8353365: TOUCH_ASSERT_POISON clears GetLastError() [v2] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 07:44:53 GMT, Kim Barrett wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjust format specifier and remove cast > > test/hotspot/gtest/utilities/test_vmerror.cpp line 38: > >> 36: "fatal error: GetLastError should be 6 - actually: 6") { >> 37: SetLastError(6); >> 38: fatal("GetLastError should be 6 - actually: %d", 
(int)GetLastError()); > > Why is this casting the value of GetLastError, rather than using "%u" in the format string? Simply because when I initially was testing this I used a call that had `%d` as the "template" - see e.g. perfMemory_windows.cpp. But for consistency with other uses in os_windows.cpp I should be using `%lu` and no cast. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24435#discussion_r2030292688 From aboldtch at openjdk.org Mon Apr 7 05:15:49 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 7 Apr 2025 05:15:49 GMT Subject: RFR: 8353637: ZGC: Discontiguous memory reservation is broken on Windows [v2] In-Reply-To: <4SItmLXrqDx4u3hmoip2CnAbubq-XztJLaAh_ZLuSYY=.963ab642-5e0a-4a2b-bbc4-ad19f003eca4@github.com> References: <4SItmLXrqDx4u3hmoip2CnAbubq-XztJLaAh_ZLuSYY=.963ab642-5e0a-4a2b-bbc4-ad19f003eca4@github.com> Message-ID: On Fri, 4 Apr 2025 12:23:15 GMT, Stefan Karlsson wrote: >> While working on [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) I realized that the current Windows implementation to use placeholders for reservations are broken if we ever fallback to using the part that performs discontiguous heap reservation. >> >> To understand this bug you first need to understand how and why we use the placeholder mechanism. From zMapper_windows.cpp: >> >> >> // Memory reservation, commit, views, and placeholders. >> // >> // To be able to up-front reserve address space for the heap views, and later >> // multi-map the heap views to the same physical memory, without ever losing the >> // reservation of the reserved address space, we use "placeholders". >> // >> // These placeholders block out the address space from being used by other parts >> // of the process. To commit memory in this address space, the placeholder must >> // be replaced by anonymous memory, or replaced by mapping a view against a >> // paging file mapping. We use the later to support multi-mapping. 
>> // >> // We want to be able to dynamically commit and uncommit the physical memory of >> // the heap (and also unmap ZPages), in granules of ZGranuleSize bytes. There is >> // no way to grow and shrink the committed memory of a paging file mapping. >> // Therefore, we create multiple granule-sized page file mappings. The memory is >> // committed by creating a page file mapping, map a view against it, commit the >> // memory, unmap the view. The memory will stay committed until all views are >> // unmapped, and the paging file mapping handle is closed. >> // >> // When replacing a placeholder address space reservation with a mapped view >> // against a paging file mapping, the virtual address space must exactly match >> // an existing placeholder's address and size. Therefore we only deal with >> // granule-sized placeholders at this layer. Higher layers that keep track of >> // reserved available address space can (and will) coalesce placeholders, but >> // they will be split before being used. >> >> And the way we implement this is through the callbacks in zVirtualMemory_windows.cpp: >> >> >> // Each reserved virtual memory address area registered in _manager is >> // exactly covered by a single placeholder. Callbacks are installed so >> // that whenever a memory area changes, the corresponding placeholder >> // is adjusted. >> // >> // The create and grow callbacks are called when virtual memory is >> // returned to the memory manager. The new memory area is then covered >> // by a n... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > More feedback lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). 
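Returning to the TOUCH_ASSERT_POISON fix for 8353365 above: the idea is to save the thread's "last error" value before the poison touch and restore it afterwards. One way to express that save/restore is a scope guard. The sketch below is portable and hypothetical: `errno` stands in for the Windows `GetLastError`/`SetLastError` pair, and `PreserveLastError` and `noisy_diagnostic_step` are invented names, not the patch's code. On the format-specifier point, `DWORD` is a typedef for `unsigned long` on Windows, which is why `%lu` needs no cast.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical RAII guard: captures the thread's last-error value on entry
// and restores it on scope exit. errno is used here as a portable stand-in
// for Windows' GetLastError()/SetLastError().
class PreserveLastError {
  int saved_;

 public:
  PreserveLastError() : saved_(errno) {}
  ~PreserveLastError() { errno = saved_; }
  PreserveLastError(const PreserveLastError&) = delete;
  PreserveLastError& operator=(const PreserveLastError&) = delete;
};

// Hypothetical stand-in for diagnostic work (such as touching the assert
// poison page) that overwrites the last-error value as a side effect.
void noisy_diagnostic_step() { errno = 0; }

// Runs the diagnostic step under the guard and returns the value the
// caller observes afterwards.
int value_seen_after_diagnostics() {
  {
    PreserveLastError guard;
    noisy_diagnostic_step();
  }  // guard restores the saved value here
  return errno;
}
```

With the guard in place, an error code set before an assert fires is still the one reported in the failure message.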
PR Review: https://git.openjdk.org/jdk/pull/24443#pullrequestreview-2745536763 From pminborg at openjdk.org Mon Apr 7 06:47:11 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 7 Apr 2025 06:47:11 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API In-Reply-To: References: Message-ID: <9BCE8xN6SA-cPEc1EtuSsqoYwsHiwp31lJKsraWgYso=.67a97434-ef3c-40ab-b5be-841889fdd97c@github.com> On Fri, 4 Apr 2025 22:52:24 GMT, Vladimir Ivanov wrote: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! 
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 258: > 256: if (LIBRARY.isSupported(op, vspecies)) { > 257: String symbol = LIBRARY.symbolName(op, vspecies); > 258: MemorySegment addr = LOOKUP.find(symbol) It is better to use `LOOKUP.findOrThrow()` because it does not require lambda creation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2030551872 From duke at openjdk.org Mon Apr 7 07:11:52 2025 From: duke at openjdk.org (David Linus Briemann) Date: Mon, 7 Apr 2025 07:11:52 GMT Subject: RFR: 8311227: Add .editorconfig so IDEs would pick up the common settings automatically: indent, trim trailing whitespace [v3] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 09:12:45 GMT, David Linus Briemann wrote: >> Add an .editorconfig to define indentation, trim trailing whitespace and open curly brace position for C++ and Java. >> This allows various editors to easily infer basics of the coding style. > > David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: > > make editorconfig hotspot specific I have to respectfully disagree with your assessment for the following reasons: - The editorconfig as defined in this PR follows the hotspot style guide and the indentation settings only apply to hotspot code. - How does it conflict with the jcheck rules? The rules defined in this editorconfig are stricter than the jcheck rules. So jcheck could not find issues after the editorconfig rules were applied. - Providing an editorconfig as you proposed in #24448 is not a good idea in my opinion. Only defining whitespace trimming will not provide real benefit to anyone. However the existing editorconfig file would conflict with locally defined ones and cause problems for developers using these files to define project specific formatting. So no editorconfig would be better than one representing only a small part of the core formatting rules. 
I also would ask about `jcheck`. Where and how is it used. The only information I found are: and the config file in `jdk/.jcheck`. Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/23693#issuecomment-2782242008 From rehn at openjdk.org Mon Apr 7 07:12:57 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 7 Apr 2025 07:12:57 GMT Subject: RFR: 8351949: RISC-V: Cleanup and enable store-load peephole for membars [v10] In-Reply-To: References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Thu, 3 Apr 2025 17:04:02 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: >> >> - Merge branch 'master' into tso-merge >> - Merge branch 'master' into tso-merge >> - Merge branch 'master' into tso-merge >> - Merge branch 'master' into tso-merge >> - format comment >> - Merge branch 'master' into tso-merge >> - Review comments >> - Merge branch 'master' into tso-merge >> - Review comments >> - Fixed ws >> - ... and 3 more: https://git.openjdk.org/jdk/compare/6da028b2...2044cf5f > > Marked as reviewed by mli (Reviewer). Thanks @Hamlin-Li ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24035#issuecomment-2782241807 From rehn at openjdk.org Mon Apr 7 07:12:57 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 7 Apr 2025 07:12:57 GMT Subject: Integrated: 8351949: RISC-V: Cleanup and enable store-load peephole for membars In-Reply-To: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> References: <7wzwmKl5HbINOI0s0d-_Cyy0M88H3Lt3QXosng1fGSU=.a2853d1c-18e0-428d-a5a6-c45da60e7e38@github.com> Message-ID: On Thu, 13 Mar 2025 13:49:32 GMT, Robbin Ehn wrote: > Hi please consider. 
> > |RVWMO| Patched| > | ---------- | ---------- | > |fence iorw,iorw| fence iorw,ow| > |sw t4,120(t2) | sw t4,120(t2) | > |fence ow,ir | unnecessary_membar_volatile_rvwmo | > | sw t6,128(t2) // Non-volatile | sw t6,128(t2) // Non-volatile | > |fence iorw,ow | fence iorw,ow| > |sw t5,124(t2) |sw t5,124(t2) | > > |TSO | Patched| > | ---------- | ---------- | > | lw a4,120(t2) | lw a6,120(t2) | > | sw a0,124(t2) | sw t6,124(t2) | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > | sw t4,120(t2) | sw t4,120(t2) | > | fence ow,ir | unnecessary_membar_volatile_tso | > | sw t6,128(t2) | sw t5,128(t2) | > | sw t5,124(t2) // Non-volatile| sw a1,124(t2) // Non-volatile | > | fence iorw,iorw | unnecessary_membar_volatile_tso | > |... | ... | > | sw a3,120(t2) | sw a0,120(t2) | > | fence ow,ir | fence ow,ir | > | lw a7,124(t2) | lw a5,124(t2) | > > For the specific rvwmo volatile store + store + volatile store is around 30% faster on VF2. > > The patch do: > - Separate ztso and rvwmo in ad by using UseZtso predicate. > - Match all that requires the same membar. > - Make fence/fencei protected as they shouldn't be using directly. > - Increased cost of membars to VOLATILE_REF_COST. > - Added a real_empty pipe. > - Change to pipe_slow on TSO (as x86). > > Note that C2-rv64 is now superior to gcc/clang regrading fencing: > https://godbolt.org/z/6E3YTP15j > > Testing jcstress, tier1 and manually reading the generated assembly. > Doing additional testing, but RFR it now as it may need some consideration. > > /Robbin This pull request has now been integrated. 
Changeset: 6d9ece73 Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/6d9ece73a96dd32fccf4a740205407a76dcd907a Stats: 147 lines in 4 files changed: 71 ins; 27 del; 49 mod 8351949: RISC-V: Cleanup and enable store-load peephole for membars Reviewed-by: fyang, fjiang, mli ------------- PR: https://git.openjdk.org/jdk/pull/24035 From shade at openjdk.org Mon Apr 7 07:48:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Apr 2025 07:48:51 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag In-Reply-To: References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> Message-ID: On Thu, 3 Apr 2025 20:00:05 GMT, Vladimir Ivanov wrote: >> Noticed this when doing [JDK-8353558](https://bugs.openjdk.org/browse/JDK-8353558). We only check for the CLWB feature flag on Intel platforms. But the AMD APM (Rev. 3.36, March 2024) tells me there is a CLWB flag in the CPUID Fn0000_0007_EBX_x0 leaf as well. It is in the same place as the flag for Intel. >> >> Additional testing: >> - [x] Ad-hoc tests on Ryzen 5950X > > src/hotspot/cpu/x86/vm_version_x86.cpp line 3100: > >> 3098: if (ext_cpuid1_ecx.bits.sse4a != 0) >> 3099: result |= CPU_SSE4A; >> 3100: if (sef_cpuid7_ebx.bits.clwb != 0) > > I'm curious what's the rule here when it comes to vendor-specific features? > > From what I'm seeing in the sources, both AMD and ZX enumerate only `ext_cpuid1` features while for Intel it's a mix of `sef_cpuid7` and `ext_cpuid1`. > > So, I'm curious whether the code should be moved up and shared for all CPUs. I believe we are being very conservative here. CPUID info is very vendor-specific, so we only trust the bits when the relevant vendor docs tell us they are trustworthy. For AMD, I can see the exact spec in the AMD Programmer's Manual that makes me trust the bit. I have no information on whether ZX can be trusted to have the same bit at the same location.
And, I have no way to test it :) So this CPUID-bit checking should be in the AMD-specific block. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24385#discussion_r2030644340 From tschatzl at openjdk.org Mon Apr 7 07:55:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 7 Apr 2025 07:55:52 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24407#pullrequestreview-2745840662 From sroy at openjdk.org Mon Apr 7 08:23:56 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Mon, 7 Apr 2025 08:23:56 GMT Subject: RFR: JDK-8331859 : [PPC64] Remove support for Power7 and older In-Reply-To: References: Message-ID: <-IDumV-vCJZ1UbAB91vfGAgZZ0nRJPuCP9EjZxOhExc=.07fd4fd9-03e5-4408-87a3-a6eabf1be724@github.com> On Wed, 2 Apr 2025 20:36:38 GMT, Martin Doerr wrote: >> JBS Issue: [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859) >> Linux PPC64le has required Power8 since the beginning. >> AIX requires Power8 with the new OpenXL based build ([JDK-8307520](https://bugs.openjdk.org/browse/JDK-8307520)). The old build has been removed in JDK 23 ([JDK-8327701](https://bugs.openjdk.org/browse/JDK-8327701)).
>> Linux PPC64 Big Endian is no longer officially supported (only kept alive for development, debugging and testing purposes). >> >> The following checks for old processors are no longer needed: >> 8: VM_Version::has_lqarx() >> 7: VM_Version::has_popcntw() >> 6: VM_Version::has_cmpb() >> 5: VM_Version::has_popcntb() >> These ones and some more checks for old instructions are no longer needed. All code which is no longer reachable when removing them should also get removed. >> Checks like "PowerArchitecturePPC64 >= 8" (or older) can be removed. >> >> Atomic::PlatformCmpxchg<1>::operator() can be simplified by using sub-word instructions (lharx, lbarx). >> >> Temp registers can be removed from cmpxchgb and cmpxchgh. >> >> Build flags "-mcpu=powerpc64 -mtune=power5" for Big Endian linux should get replaced by "-mcpu=power8 -mtune=power8" as already used for linux PPC64le. > > src/hotspot/cpu/ppc/vm_version_ppc.hpp line 113: > >> 111: static bool has_fcfids() { return (_features & fcfids_m) != 0; } >> 112: static bool has_vand() { return (_features & vand_m) != 0; } >> 113: static bool has_lqarx() { return (_features & lqarx_m) != 0; } > > Why are the other Power7 and older instruction checks not removed? Hi @TheRealMDoerr, I removed the checks for the instructions mentioned in the issue. How can I determine which instructions are older? Is there a file where this is listed specifically?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20262#discussion_r2030708128 From mdoerr at openjdk.org Mon Apr 7 08:43:57 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 7 Apr 2025 08:43:57 GMT Subject: RFR: JDK-8331859 : [PPC64] Remove support for Power7 and older In-Reply-To: <-IDumV-vCJZ1UbAB91vfGAgZZ0nRJPuCP9EjZxOhExc=.07fd4fd9-03e5-4408-87a3-a6eabf1be724@github.com> References: <-IDumV-vCJZ1UbAB91vfGAgZZ0nRJPuCP9EjZxOhExc=.07fd4fd9-03e5-4408-87a3-a6eabf1be724@github.com> Message-ID: On Mon, 7 Apr 2025 08:21:03 GMT, Suchismith Roy wrote: >> src/hotspot/cpu/ppc/vm_version_ppc.hpp line 113: >> >>> 111: static bool has_fcfids() { return (_features & fcfids_m) != 0; } >>> 112: static bool has_vand() { return (_features & vand_m) != 0; } >>> 113: static bool has_lqarx() { return (_features & lqarx_m) != 0; } >> >> Why are the other Power7 and older instruction checks not removed? > > Hi @TheRealMDoerr, I removed the checks for the instructions mentioned in the issue. How can I determine which instructions are older? Is there a file where this is listed specifically? The Power ISA has a section "Appendix E. Power ISA Instruction Set Sorted by Opcode" at the end. We require Power8, which matches "Power ISA v2.07". All instructions from v2.07 or older are available. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20262#discussion_r2030745558 From mgronlun at openjdk.org Mon Apr 7 09:02:05 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 7 Apr 2025 09:02:05 GMT Subject: RFR: 8352251: Implement Cooperative JFR Sampling [v3] In-Reply-To: References: <2FEvWJYZrD5yRsmTCqrgR9Lit84szuFJxqwdpjghVog=.7deabc88-039f-423e-a4bd-e36399870273@github.com> Message-ID: On Fri, 28 Mar 2025 21:24:15 GMT, Markus Grönlund wrote: >> Greetings, >> >> This is the implementation of JEP [JDK-8350338 Cooperative JFR Sampling](https://bugs.openjdk.org/browse/JDK-8350338).
>> >> Implementations in this change set are provided and have been tested on the following platforms: >> >> - windows-x64 >> - windows-x64-debug >> - linux-x64 >> - linux-x64-debug >> - macosx-x64 >> - macosx-x64-debug >> - linux-aarch64 >> - linux-aarch64-debug >> - macosx-aarch64 >> - macosx-aarch64-debug >> >> Testing: tier1-6, jdk_jfr, stress testing. >> >> Platform porters note: >> Some platform-specific code needs to be provided, mainly in the interpreter. Take a look at the following files for changes: >> >> - src/hotspot/cpu/x86/frame_x86.cpp >> - src/hotspot/cpu/x86/interp_masm_x86.cpp >> - src/hotspot/cpu/x86/interp_masm_x86.hpp >> - src/hotspot/cpu/x86/javaFrameAnchor_x86.hpp >> - src/hotspot/cpu/x86/macroAssembler_x86.cpp >> - src/hotspot/cpu/x86/macroAssembler_x86.hpp >> - src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp >> - src/hotspot/cpu/x86/templateTable_x86.cpp >> - src/hotspot/os_cpu/linux_x86/javaThread_linux_x86.hpp >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision: > > - align params > - adjustments Friendly reminder/heads-up to platform and port maintainers: Please review the necessary platform changes in advance so that your port will be ready once this integration is complete. Alternatively, you can send me your change sets for them to be incorporated into this PR. Thanks Markus ------------- PR Comment: https://git.openjdk.org/jdk/pull/24296#issuecomment-2782524969 From ayang at openjdk.org Mon Apr 7 09:19:03 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 7 Apr 2025 09:19:03 GMT Subject: RFR: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue.
The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24407#issuecomment-2782605636 From ayang at openjdk.org Mon Apr 7 09:19:03 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 7 Apr 2025 09:19:03 GMT Subject: Integrated: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:40:19 GMT, Albert Mingkun Yang wrote: > Using a new lock (`JNICritical_lock`) in `GCLocker::block` to resolve a deadlock issue. The root cause of the deadlock is that holding `Heap_lock` while waiting in `GCLocker::block` is unsafe. > > The new lock is held from the start of `GCLocker::block` to the end of `GCLocker::unblock`. This requires adjusting `Heap_lock`'s rank to allow acquiring `Heap_lock` while holding `JNICritical_lock`. The most important changes are in `gcVMOperations.cpp` and `mutexLocker.cpp`. > > Test: tier1-8; verified failure can be observed 2/2000 and pass 8000 iterations. This pull request has now been integrated. 
Changeset: 39549f89 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/39549f89905019fa90dd20ff8b6822c1351cbaa6 Stats: 31 lines in 4 files changed: 20 ins; 7 del; 4 mod 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647 Reviewed-by: kbarrett, tschatzl, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/24407 From jkern at openjdk.org Mon Apr 7 09:58:50 2025 From: jkern at openjdk.org (Joachim Kern) Date: Mon, 7 Apr 2025 09:58:50 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 15:27:15 GMT, Robert Toyonaga wrote: >> ### Summary: >> This PR makes memory operations atomic with NMT accounting. >> >> ### The problem: >> In memory-related functions like `os::commit_memory` and `os::reserve_memory`, the OS memory operations are currently done before acquiring the NMT mutex, and the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races. >> >> 1.1 Thread_1 releases range_A. >> 1.2 Thread_1 tells NMT "range_A has been released". >> >> 2.1 Thread_2 reserves (the now free) range_A. >> 2.2 Thread_2 tells NMT "range_A is reserved". >> >> Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS. >> >> ### Solution: >> Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself.
>>
>> ### Other notes:
>> I also simplified this pattern found in many places:
>>
>> ```
>> if (MemTracker::enabled()) {
>>   MemTracker::NmtVirtualMemoryLocker nvml;
>>   result = pd_some_operation(addr, bytes);
>>   if (result != nullptr) {
>>     MemTracker::record_some_operation(addr, bytes);
>>   }
>> } else {
>>   result = pd_some_operation(addr, bytes);
>> }
>> ```
>> To:
>>
>> ```
>> MemTracker::NmtVirtualMemoryLocker nvml;
>> result = pd_some_operation(addr, bytes);
>> MemTracker::record_some_operation(addr, bytes);
>> ```
>> This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`.
>>
>> I considered moving the locking and NMT accounting down into platform-specific code, e.g. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section. However, I found that the OS-specific "pd_" functions are already short and to-the-point, so doing this wasn't reducing the lock scope very much. Instead, it just makes the code messier by having to maintain the locking and NMT accounting in each platform specific i...
>
> Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision:
>
> exclude file mapping tests on AIX.

I ran the tests over the weekend again and now they passed.
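The race described in this PR can be illustrated with a small, self-contained model. This is a sketch under assumptions, not HotSpot code: `TrackedMemory`, `run_race`, and the two maps are hypothetical stand-ins for the OS reservation state and NMT's bookkeeping, and a `std::mutex` plays the role of `NmtVirtualMemoryLocker`. The point is only that when the "OS" step and the accounting step share one critical section, no other thread can run between them, so the two views cannot drift apart.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>

// Toy model (hypothetical, not HotSpot code): _os_view stands in for what the
// OS has reserved, _nmt_view for what NMT has recorded. Both change inside one
// critical section, like the widened NmtVirtualMemoryLocker scope in the PR.
class TrackedMemory {
 public:
  bool reserve(std::uintptr_t addr, std::size_t size) {
    std::lock_guard<std::mutex> guard(_lock);
    if (_os_view.count(addr) != 0) {
      return false;            // range is already reserved
    }
    _os_view[addr] = size;     // stand-in for the OS reserve call
    _nmt_view[addr] = size;    // stand-in for the NMT record call
    return true;
  }

  bool release(std::uintptr_t addr) {
    std::lock_guard<std::mutex> guard(_lock);
    auto it = _os_view.find(addr);
    if (it == _os_view.end()) {
      return false;
    }
    // Both views change under the same lock, so NMT must know this range.
    assert(_nmt_view.count(addr) == 1);
    _os_view.erase(it);        // stand-in for the OS release call
    _nmt_view.erase(addr);     // stand-in for the NMT release record
    return true;
  }

  bool in_sync() {
    std::lock_guard<std::mutex> guard(_lock);
    return _os_view == _nmt_view;
  }

 private:
  std::mutex _lock;
  std::map<std::uintptr_t, std::size_t> _os_view;
  std::map<std::uintptr_t, std::size_t> _nmt_view;
};

// Two threads repeatedly reserve and release the same range: the
// Thread_1 / Thread_2 interleaving from the PR description.
bool run_race(int iterations) {
  TrackedMemory mem;
  auto worker = [&mem, iterations]() {
    for (int i = 0; i < iterations; i++) {
      if (mem.reserve(0x10000, 4096)) {
        mem.release(0x10000);
      }
    }
  };
  std::thread t1(worker);
  std::thread t2(worker);
  t1.join();
  t2.join();
  return mem.in_sync();
}
```

A single lock covering both steps is the simplest way to get this atomicity; the PR text above explains why pushing the lock down into the platform-specific `pd_` functions did not shrink the critical section enough to be worth it.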
------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2782761982 From ihse at openjdk.org Mon Apr 7 10:04:04 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 7 Apr 2025 10:04:04 GMT Subject: RFR: 8311227: Add .editorconfig so IDEs would pick up the common settings automatically: indent, trim trailing whitespace [v3] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 07:08:51 GMT, David Linus Briemann wrote: > * How does it conflict with the jcheck rules? The rules defined in this editorconfig are stricter than the jcheck rules. So jcheck could not find issues after the editorconfig rules were applied. No, it's not. The rules you propose here are stricter. You say `trim_trailing_whitespace = true` under `[*]`, but there is not, and has never been, such a rule for all text files in the JDK repo. This would trigger an enormous amount of spurious changes. In contrast, my suggested PR applies this only to the subset of files where we do in fact have a rule of no trailing whitespace. > I also would ask about `jcheck`. Where and how is it used? The only information I found is: > https://openjdk.org/projects/code-tools/jcheck/ and the config file in `jdk/.jcheck`. jcheck is run automatically by the Skara bots on all PRs. If jcheck reports an error (that is, a violation of enforced style rules), the PR will not be possible to integrate. Users can also run jcheck locally using the `git skara jcheck` command, if they have the Skara git tools installed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23693#issuecomment-2782776621 From ihse at openjdk.org Mon Apr 7 10:10:57 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 7 Apr 2025 10:10:57 GMT Subject: RFR: 8311227: Add .editorconfig so IDEs would pick up the common settings automatically: indent, trim trailing whitespace [v3] In-Reply-To: References: Message-ID: <6BPm2Ex4oJhMn9hVRU1ulGxySbaF2Tt822tlcwx43VY=.8ca2eb48-3732-460d-a4e1-c91e45bce28f@github.com> On Mon, 7 Apr 2025 07:08:51 GMT, David Linus Briemann wrote: > * The editorconfig as defined in this PR follows the hotspot style guide and the indentation settings only apply to hotspot code. Yes, you have a good point there. I included your `src/hotspot/.editorconfig` in my PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23693#issuecomment-2782796610 From shade at openjdk.org Mon Apr 7 10:12:49 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Apr 2025 10:12:49 GMT Subject: RFR: 8352251: Implement Cooperative JFR Sampling [v3] In-Reply-To: References: <2FEvWJYZrD5yRsmTCqrgR9Lit84szuFJxqwdpjghVog=.7deabc88-039f-423e-a4bd-e36399870273@github.com> Message-ID: On Fri, 28 Mar 2025 20:09:37 GMT, Aleksey Shipilev wrote: >> Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision: >> >> - align params >> - adjustments > > src/hotspot/cpu/x86/interp_masm_x86.cpp line 1045: > >> 1043: Label slow_path; >> 1044: Label fast_path; >> 1045: safepoint_poll(slow_path, rthread, current_fp, true /* at_return */, false /* in_nmethod */); > > A little heads-up: I am going to propose a little cleanup soon to drop `rthread` from x86 safepoint_poll (we can trust it is `r15_thread` always). That would probably yield a minor merge conflict here.
FYI, it would be here: https://github.com/openjdk/jdk/pull/24323 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24296#discussion_r2030922573 From kbarrett at openjdk.org Mon Apr 7 10:19:53 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 7 Apr 2025 10:19:53 GMT Subject: RFR: 8353365: TOUCH_ASSERT_POISON clears GetLastError() [v2] In-Reply-To: References: Message-ID: <5Y-CejJ4jN2QzwKqXUKmHjZqqGsCyuJk2EKtZEnaKXU=.b19901ea-9586-422e-8865-8951ed271119@github.com> On Sun, 6 Apr 2025 22:50:28 GMT, David Holmes wrote: >> This is a very simple fix to save/restore the "last error" value on Windows, so that the TOUCH_ASSERT_POISON mechanism used in assert/guarantee/fatal, does not clear it. >> >> Testing >> - new Windows-only gtest added to vmErrors test group >> - tiers 103 sanity >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Adjust format specifier and remove cast Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24435#pullrequestreview-2746305010 From kevinw at openjdk.org Mon Apr 7 10:44:02 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 7 Apr 2025 10:44:02 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v3] In-Reply-To: References: Message-ID: > We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. > > next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell. 
Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24354/files - new: https://git.openjdk.org/jdk/pull/24354/files/880261a9..a1966780 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24354&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24354&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24354/head:pull/24354 PR: https://git.openjdk.org/jdk/pull/24354 From kevinw at openjdk.org Mon Apr 7 10:44:03 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 7 Apr 2025 10:44:03 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v2] In-Reply-To: References: Message-ID: On Sun, 6 Apr 2025 22:38:22 GMT, David Holmes wrote: >> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: >> >> Test udpate - multiple -XX:OnError= > > src/hotspot/share/utilities/vmError.cpp line 149: > >> 147: >> 148: // skip leading blanks, ';' or newlines >> 149: while (*cmd == ' ' || *cmd == ';' || *cmd == '\n') cmd++; > > It may be worth reinstating a comment in the function description to explain how the command is actually parsed (we seem to have lost that somewhere along the line) e.g. > > // The command string is expected to be a semi-colon, or newline, delineated sequence of commands, > // that are executed sequentially and in their own shell environment. Thanks David - yes, trying a new comment to introduce next_OnError_command more clearly. 
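The parsing rule being discussed can be sketched on its own. The function below is hypothetical (`split_on_error_commands` is not a HotSpot name) and only mirrors the quoted skip loop: leading blanks, ';' and newlines are skipped, and a command ends at the next ';' or newline. It assumes, as the newline handling in this PR suggests, that repeated `-XX:OnError=` options reach the parser joined by newlines. Under this rule, each resulting string would then be run in its own shell.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not the vmError.cpp implementation: split an OnError
// value into individual commands. ';' and '\n' both end a command.
std::vector<std::string> split_on_error_commands(const std::string& value) {
  std::vector<std::string> commands;
  std::size_t pos = 0;
  while (pos < value.size()) {
    // Skip leading blanks, ';' or newlines (same set as the quoted loop).
    while (pos < value.size() &&
           (value[pos] == ' ' || value[pos] == ';' || value[pos] == '\n')) {
      pos++;
    }
    if (pos == value.size()) {
      break;  // only separators were left
    }
    // The command runs until the next ';' or newline, or to the end.
    std::size_t end = value.find_first_of(";\n", pos);
    if (end == std::string::npos) {
      end = value.size();
    }
    commands.push_back(value.substr(pos, end - pos));
    pos = end;
  }
  return commands;
}
```

For example, a value of `"gcore %p; echo done\n  kill -9 %p"` yields three commands, `gcore %p`, `echo done` and `kill -9 %p`, so the second option's command is no longer grouped into the same shell as the first.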
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24354#discussion_r2030969685 From mgronlun at openjdk.org Mon Apr 7 10:53:32 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 7 Apr 2025 10:53:32 GMT Subject: RFR: 8352251: Implement Cooperative JFR Sampling [v4] In-Reply-To: <2FEvWJYZrD5yRsmTCqrgR9Lit84szuFJxqwdpjghVog=.7deabc88-039f-423e-a4bd-e36399870273@github.com> References: <2FEvWJYZrD5yRsmTCqrgR9Lit84szuFJxqwdpjghVog=.7deabc88-039f-423e-a4bd-e36399870273@github.com> Message-ID: > Greetings, > > This is the implementation of JEP [JDK-8350338 Cooperative JFR Sampling](https://bugs.openjdk.org/browse/JDK-8350338). > > Implementations in this change set are provided and have been tested on the following platforms: > > - windows-x64 > - windows-x64-debug > - linux-x64 > - linux-x64-debug > - macosx-x64 > - macosx-x64-debug > - linux-aarch64 > - linux-aarch64-debug > - macosx-aarch64 > - macosx-aarch64-debug > > Testing: tier1-6, jdk_jfr, stress testing. > > Platform porters note: > Some platform-specific code needs to be provided, mainly in the interpreter. Take a look at the following files for changes: > > - src/hotspot/cpu/x86/frame_x86.cpp > - src/hotspot/cpu/x86/interp_masm_x86.cpp > - src/hotspot/cpu/x86/interp_masm_x86.hpp > - src/hotspot/cpu/x86/javaFrameAnchor_x86.hpp > - src/hotspot/cpu/x86/macroAssembler_x86.cpp > - src/hotspot/cpu/x86/macroAssembler_x86.hpp > - src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp > - src/hotspot/cpu/x86/templateTable_x86.cpp > - src/hotspot/os_cpu/linux_x86/javaThread_linux_x86.hpp > > Thanks > Markus Markus Grönlund has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains five commits: - Merge branch 'master' into 8352251 - align params - adjustments - refactoring - 8352251 ------------- Changes: https://git.openjdk.org/jdk/pull/24296/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24296&range=03 Stats: 3203 lines in 77 files changed: 1897 ins; 949 del; 357 mod Patch: https://git.openjdk.org/jdk/pull/24296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24296/head:pull/24296 PR: https://git.openjdk.org/jdk/pull/24296 From eosterlund at openjdk.org Mon Apr 7 11:34:57 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 7 Apr 2025 11:34:57 GMT Subject: RFR: 8353637: ZGC: Discontiguous memory reservation is broken on Windows [v2] In-Reply-To: <4SItmLXrqDx4u3hmoip2CnAbubq-XztJLaAh_ZLuSYY=.963ab642-5e0a-4a2b-bbc4-ad19f003eca4@github.com> References: <4SItmLXrqDx4u3hmoip2CnAbubq-XztJLaAh_ZLuSYY=.963ab642-5e0a-4a2b-bbc4-ad19f003eca4@github.com> Message-ID: On Fri, 4 Apr 2025 12:23:15 GMT, Stefan Karlsson wrote: >> While working on [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) I realized that the current Windows implementation to use placeholders for reservations are broken if we ever fallback to using the part that performs discontiguous heap reservation. >> >> To understand this bug you first need to understand how and why we use the placeholder mechanism. From zMapper_windows.cpp: >> >> >> // Memory reservation, commit, views, and placeholders. >> // >> // To be able to up-front reserve address space for the heap views, and later >> // multi-map the heap views to the same physical memory, without ever losing the >> // reservation of the reserved address space, we use "placeholders". >> // >> // These placeholders block out the address space from being used by other parts >> // of the process.
To commit memory in this address space, the placeholder must >> // be replaced by anonymous memory, or replaced by mapping a view against a >> // paging file mapping. We use the later to support multi-mapping. >> // >> // We want to be able to dynamically commit and uncommit the physical memory of >> // the heap (and also unmap ZPages), in granules of ZGranuleSize bytes. There is >> // no way to grow and shrink the committed memory of a paging file mapping. >> // Therefore, we create multiple granule-sized page file mappings. The memory is >> // committed by creating a page file mapping, map a view against it, commit the >> // memory, unmap the view. The memory will stay committed until all views are >> // unmapped, and the paging file mapping handle is closed. >> // >> // When replacing a placeholder address space reservation with a mapped view >> // against a paging file mapping, the virtual address space must exactly match >> // an existing placeholder's address and size. Therefore we only deal with >> // granule-sized placeholders at this layer. Higher layers that keep track of >> // reserved available address space can (and will) coalesce placeholders, but >> // they will be split before being used. >> >> And the way we implement this is through the callbacks in zVirtualMemory_windows.cpp: >> >> >> // Each reserved virtual memory address area registered in _manager is >> // exactly covered by a single placeholder. Callbacks are installed so >> // that whenever a memory area changes, the corresponding placeholder >> // is adjusted. >> // >> // The create and grow callbacks are called when virtual memory is >> // returned to the memory manager. The new memory area is then covered >> // by a n... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > More feedback Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24443#pullrequestreview-2746486259 From stefank at openjdk.org Mon Apr 7 11:34:57 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 7 Apr 2025 11:34:57 GMT Subject: RFR: 8353637: ZGC: Discontiguous memory reservation is broken on Windows [v2] In-Reply-To: <4SItmLXrqDx4u3hmoip2CnAbubq-XztJLaAh_ZLuSYY=.963ab642-5e0a-4a2b-bbc4-ad19f003eca4@github.com> References: <4SItmLXrqDx4u3hmoip2CnAbubq-XztJLaAh_ZLuSYY=.963ab642-5e0a-4a2b-bbc4-ad19f003eca4@github.com> Message-ID: On Fri, 4 Apr 2025 12:23:15 GMT, Stefan Karlsson wrote: >> While working on [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) I realized that the current Windows implementation to use placeholders for reservations are broken if we ever fallback to using the part that performs discontiguous heap reservation. >> >> To understand this bug you first need to understand how and why we use the placeholder mechanism. From zMapper_windows.cpp: >> >> >> // Memory reservation, commit, views, and placeholders. >> // >> // To be able to up-front reserve address space for the heap views, and later >> // multi-map the heap views to the same physical memory, without ever losing the >> // reservation of the reserved address space, we use "placeholders". >> // >> // These placeholders block out the address space from being used by other parts >> // of the process. To commit memory in this address space, the placeholder must >> // be replaced by anonymous memory, or replaced by mapping a view against a >> // paging file mapping. We use the later to support multi-mapping. >> // >> // We want to be able to dynamically commit and uncommit the physical memory of >> // the heap (and also unmap ZPages), in granules of ZGranuleSize bytes. There is >> // no way to grow and shrink the committed memory of a paging file mapping. >> // Therefore, we create multiple granule-sized page file mappings. 
The memory is >> // committed by creating a page file mapping, map a view against it, commit the >> // memory, unmap the view. The memory will stay committed until all views are >> // unmapped, and the paging file mapping handle is closed. >> // >> // When replacing a placeholder address space reservation with a mapped view >> // against a paging file mapping, the virtual address space must exactly match >> // an existing placeholder's address and size. Therefore we only deal with >> // granule-sized placeholders at this layer. Higher layers that keep track of >> // reserved available address space can (and will) coalesce placeholders, but >> // they will be split before being used. >> >> And the way we implement this is through the callbacks in zVirtualMemory_windows.cpp: >> >> >> // Each reserved virtual memory address area registered in _manager is >> // exactly covered by a single placeholder. Callbacks are installed so >> // that whenever a memory area changes, the corresponding placeholder >> // is adjusted. >> // >> // The create and grow callbacks are called when virtual memory is >> // returned to the memory manager. The new memory area is then covered >> // by a n... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > More feedback Thanks for the reviews! I've run this through most of tier1-7 testing on Windows. 
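The granule rule in the quoted zMapper_windows.cpp comment (placeholders at the mapping layer are always exactly one granule, and coalesced ranges are split before use) can be modelled with plain integers. This sketch is hypothetical: `Range` and `split_into_granules` are illustrative names, no Windows placeholder APIs are called, and the 2 MB granule used in the example is only an example value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: an address range as a plain integer start and size.
struct Range {
  std::uintptr_t start;
  std::size_t size;
};

// Split a coalesced, granule-aligned range into granule-sized pieces, the
// shape the mapping layer expects before it replaces each placeholder.
std::vector<Range> split_into_granules(Range coalesced, std::size_t granule) {
  assert(granule != 0);
  assert(coalesced.start % granule == 0);  // must start on a granule boundary
  assert(coalesced.size % granule == 0);   // must be a whole number of granules
  std::vector<Range> pieces;
  for (std::size_t off = 0; off < coalesced.size; off += granule) {
    pieces.push_back(Range{coalesced.start + off, granule});
  }
  return pieces;
}
```

For example, splitting `{0x200000, 3 * 0x200000}` with a 0x200000 granule yields three one-granule ranges starting at 0x200000, 0x400000 and 0x600000. Coalescing is simply the inverse bookkeeping done by the higher layer.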
------------- PR Comment: https://git.openjdk.org/jdk/pull/24443#issuecomment-2783009856 From stefank at openjdk.org Mon Apr 7 11:34:58 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 7 Apr 2025 11:34:58 GMT Subject: Integrated: 8353637: ZGC: Discontiguous memory reservation is broken on Windows In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 10:09:57 GMT, Stefan Karlsson wrote: > While working on [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) I realized that the current Windows implementation to use placeholders for reservations are broken if we ever fallback to using the part that performs discontiguous heap reservation. > > To understand this bug you first need to understand how and why we use the placeholder mechanism. From zMapper_windows.cpp: > > > // Memory reservation, commit, views, and placeholders. > // > // To be able to up-front reserve address space for the heap views, and later > // multi-map the heap views to the same physical memory, without ever losing the > // reservation of the reserved address space, we use "placeholders". > // > // These placeholders block out the address space from being used by other parts > // of the process. To commit memory in this address space, the placeholder must > // be replaced by anonymous memory, or replaced by mapping a view against a > // paging file mapping. We use the later to support multi-mapping. > // > // We want to be able to dynamically commit and uncommit the physical memory of > // the heap (and also unmap ZPages), in granules of ZGranuleSize bytes. There is > // no way to grow and shrink the committed memory of a paging file mapping. > // Therefore, we create multiple granule-sized page file mappings. The memory is > // committed by creating a page file mapping, map a view against it, commit the > // memory, unmap the view. The memory will stay committed until all views are > // unmapped, and the paging file mapping handle is closed. 
> // > // When replacing a placeholder address space reservation with a mapped view > // against a paging file mapping, the virtual address space must exactly match > // an existing placeholder's address and size. Therefore we only deal with > // granule-sized placeholders at this layer. Higher layers that keep track of > // reserved available address space can (and will) coalesce placeholders, but > // they will be split before being used. > > And the way we implement this is through the callbacks in zVirtualMemory_windows.cpp: > > > // Each reserved virtual memory address area registered in _manager is > // exactly covered by a single placeholder. Callbacks are installed so > // that whenever a memory area changes, the corresponding placeholder > // is adjusted. > // > // The create and grow callbacks are called when virtual memory is > // returned to the memory manager. The new memory area is then covered > // by a new single placeholder. > // > // The destroy and shrink callbacks are called when virtua... This pull request has now been integrated. Changeset: 6ab1647a Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/6ab1647af2d83427215f3a704671f113ba9845e2 Stats: 817 lines in 17 files changed: 528 ins; 192 del; 97 mod 8353637: ZGC: Discontiguous memory reservation is broken on Windows Co-authored-by: Axel Boldt-Christmas Reviewed-by: jsikstro, aboldtch, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/24443 From adinn at openjdk.org Mon Apr 7 12:41:57 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 7 Apr 2025 12:41:57 GMT Subject: RFR: 8349721: Add aarch64 intrinsics for ML-KEM [v6] In-Reply-To: References: Message-ID: On Sun, 23 Mar 2025 17:00:43 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. 
> > Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merged master. > - Fixed bad assertion. > - Fixed mismerge. > - Merged master. > - A little cleanup > - Merged master > - removing trailing spaces > - kyber aarch64 intrinsics @ferakocz Thanks for another very good piece of work which appears to me to be functioning correctly and performantly. The PR suffers from the same problems as the original ML_DSA one i.e. The mapping of data to registers and the overall structure of the generated code and its relation to the related Java code/the original algorithms will be hard for a maintainer to identify. I have reworked your patch to use vector sequences in this [draft PR](https://github.com/openjdk/jdk/pull/24419) in very much the same way as was done for the ML_DSA PR. This has significantly abstracted and clarified the register mappings that are in use in each kyber generator and has also made the higher level structure of the generated code much easier to follow. Note that my rework of the generation routines was applied to your original PR after rebasing it on master. Before updating the kyber routines I also generalized a few of the VSeq methods that benefit from being shared by both kyber and dilithium, most notably the montmul routines, and I added a few extra helpers. The reworked version passes the ML_KEM functional test and gives similar performance improvements for the ML_KEM micro benchmark. The generated code does differ in a few places from what your original patch generates but only superficially - most notable is that a few loads/stores that rely on continued post-increments in the original instead use a constant offset or an add/load pair in the reworked code. This makes a very minor difference to code size and does not seem to affect performance. 
I would like you to rework your PR to incorporate these changes because I believe it will make a big difference to maintainability. n.b. it may be easier to integrate my changes by diffing your branch and mine and applying the resulting change set rather than trying to merge the changes. Please let me know if you have problems with the integration and need help. I still have some further review comments and would also like to see more commenting to explain what the code is doing. However, I think it will be easier to do that after this rework has been integrated into your PR. ------------- PR Review: https://git.openjdk.org/jdk/pull/23663#pullrequestreview-2746672860 From syan at openjdk.org Mon Apr 7 13:14:37 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 7 Apr 2025 13:14:37 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 [v2] In-Reply-To: References: Message-ID: <8py7v6gQd9nfucRK28h2enHWk21PevhMa74FYx3bRsI=.663fa24c-4218-426b-bda4-2e8d6ecb02aa@github.com> > Hi all, > > This PR will try to fix memory leak after JDK-8352184. which re-do [JDK-8352184](https://bugs.openjdk.org/browse/JDK-8352184) using the original, purely static uses of the various description strings, as [~jiangli] had proposed. > > Additional testing: > > - [ ] jtreg tests(include tier1/2/3 etc.) on linux-x64 > - [ ] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 > - [x] full `java -version` tests, the test shell script show as below. > > [JDK-8353189.sh.txt](https://github.com/user-attachments/files/19632637/JDK-8353189.sh.txt) SendaoYan has updated the pull request incrementally with two additional commits since the last revision: - Use static static string instead of assemble string dynamic - Revert "8353189: [ASAN] memory leak after 8352184" This reverts commit 71bc3ad34ebd57cc6642dfede18cec65e3694dd1. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24299/files - new: https://git.openjdk.org/jdk/pull/24299/files/71bc3ad3..9d039fd5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24299&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24299&range=00-01 Stats: 58 lines in 4 files changed: 17 ins; 26 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24299/head:pull/24299 PR: https://git.openjdk.org/jdk/pull/24299 From dholmes at openjdk.org Mon Apr 7 13:14:37 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 7 Apr 2025 13:14:37 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 16:40:44 GMT, SendaoYan wrote: > Hi all, > > This PR will try to fix memory leak after JDK-8352184. which re-do [JDK-8352184](https://bugs.openjdk.org/browse/JDK-8352184) using the original, purely static uses of the various description strings, as [~jiangli] had proposed. > > Additional testing: > > - [ ] jtreg tests(include tier1/2/3 etc.) on linux-x64 > - [ ] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 > - [x] full `java -version` tests, the test shell script show as below. > > [JDK-8353189.sh.txt](https://github.com/user-attachments/files/19632637/JDK-8353189.sh.txt) You can't just free the result as it may not have always been malloc'd. The intention/expectation was that this was a one-off allocation in VM_version_string that was never freed. I was also going to suggest caching the vm_info string as it should be the same all the time. I think you have discovered a bug in the way the info is being requested before arguments (like Xcomp) have been processed. That would cause the wrong info string to be recorded by the early callers. I think I should file a separate bug to deal with the problem that the info string can be used before its true value is actually known. 
After looking into the details ([JDK-8353595](https://bugs.openjdk.org/browse/JDK-8353595)) I don't think there is any choice but to re-do [JDK-8352184](https://bugs.openjdk.org/browse/JDK-8352184) using the original, purely static uses of the various description strings, as [~jiangli] had proposed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24299#issuecomment-2764934750 PR Comment: https://git.openjdk.org/jdk/pull/24299#issuecomment-2771197885 PR Comment: https://git.openjdk.org/jdk/pull/24299#issuecomment-2771572452 PR Comment: https://git.openjdk.org/jdk/pull/24299#issuecomment-2781692889 From zgu at openjdk.org Mon Apr 7 13:14:37 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 7 Apr 2025 13:14:37 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 16:40:44 GMT, SendaoYan wrote: > Hi all, > > This PR will try to fix memory leak after JDK-8352184. which re-do [JDK-8352184](https://bugs.openjdk.org/browse/JDK-8352184) using the original, purely static uses of the various description strings, as [~jiangli] had proposed. > > Additional testing: > > - [ ] jtreg tests(include tier1/2/3 etc.) on linux-x64 > - [ ] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 > - [x] full `java -version` tests, the test shell script show as below. > > [JDK-8353189.sh.txt](https://github.com/user-attachments/files/19632637/JDK-8353189.sh.txt) Maybe you want to cache version string, just as `vm_release` string [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/abstract_vm_version.cpp#L32). 
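The "purely static" approach mentioned above can be sketched as follows. The enum `ExecMode` and the free function `vm_info_string` are hypothetical stand-ins, not HotSpot's `Abstract_VM_Version::vm_info_string()` or `Arguments::mode()`, and the strings are only illustrative. Each branch returns a string literal with static storage duration, so callers never own the result and repeated calls allocate nothing that could leak.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the VM execution mode; not a HotSpot type.
enum class ExecMode { Interpreted, Mixed, Compiled };

// Describes the VM mode. Every return value is a string literal with static
// storage duration: callers must not free it, and repeated calls allocate
// nothing, so there is nothing to leak.
const char* vm_info_string(ExecMode mode, bool sharing) {
  switch (mode) {
    case ExecMode::Interpreted:
      return sharing ? "interpreted mode, sharing" : "interpreted mode";
    case ExecMode::Mixed:
      return sharing ? "mixed mode, sharing" : "mixed mode";
    case ExecMode::Compiled:
      return sharing ? "compiled mode, sharing" : "compiled mode";
  }
  return "unknown mode";  // unreachable for valid enum values
}
```

Because the function derives its result from the mode passed in on each call, a caller that runs after argument parsing (for example, once -Xcomp has switched the mode) gets the updated string, e.g. `std::strcmp(vm_info_string(ExecMode::Compiled, false), "compiled mode") == 0`. Storing the pointer once in a static initializer would instead freeze whatever mode was current when that initializer ran.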
------------- PR Comment: https://git.openjdk.org/jdk/pull/24299#issuecomment-2767611672 From syan at openjdk.org Mon Apr 7 13:14:37 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 7 Apr 2025 13:14:37 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 In-Reply-To: References: Message-ID: <-62qCTgxKqSmcodbKhikAKW2poPBX7NKnim5gBqY_Cg=.ac9e7222-8b75-4022-9c7c-322722b66806@github.com> On Mon, 31 Mar 2025 23:02:05 GMT, Zhengyu Gu wrote: > Maybe you want to cache version string, just as `vm_release` string [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/abstract_vm_version.cpp#L32). Thanks, I will try it later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24299#issuecomment-2767782370 From syan at openjdk.org Mon Apr 7 13:14:37 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 7 Apr 2025 13:14:37 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 In-Reply-To: <-62qCTgxKqSmcodbKhikAKW2poPBX7NKnim5gBqY_Cg=.ac9e7222-8b75-4022-9c7c-322722b66806@github.com> References: <-62qCTgxKqSmcodbKhikAKW2poPBX7NKnim5gBqY_Cg=.ac9e7222-8b75-4022-9c7c-322722b66806@github.com> Message-ID: On Tue, 1 Apr 2025 01:19:42 GMT, SendaoYan wrote: > Maybe you want to cache version string, just as `vm_release` string [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/abstract_vm_version.cpp#L32).

The patch below tries to cache the version string. It fixes the memory leak, but reports the wrong version in -Xcomp mode. Without the patch, the JVM only gets `compiled mode` from `Abstract_VM_Version::vm_info_string()` on the third call; with the patch, the version string is cached on the first call and never updated. So caching the version string does not seem to solve this memory leak issue.
diff --git a/src/hotspot/share/runtime/abstract_vm_version.cpp b/src/hotspot/share/runtime/abstract_vm_version.cpp
index 763e441fe54..cd81d89c31f 100644
--- a/src/hotspot/share/runtime/abstract_vm_version.cpp
+++ b/src/hotspot/share/runtime/abstract_vm_version.cpp
@@ -31,6 +31,7 @@
 const char* Abstract_VM_Version::_s_vm_release = Abstract_VM_Version::vm_release();
 const char* Abstract_VM_Version::_s_internal_vm_info_string = Abstract_VM_Version::internal_vm_info_string();
+const char* Abstract_VM_Version::_s_vm_info_string = Abstract_VM_Version::vm_info_string();
 uint64_t Abstract_VM_Version::_features = 0;
 const char* Abstract_VM_Version::_features_string = "";
diff --git a/src/hotspot/share/runtime/abstract_vm_version.hpp b/src/hotspot/share/runtime/abstract_vm_version.hpp
index 8cfc7031f97..0b8a3c662d0 100644
--- a/src/hotspot/share/runtime/abstract_vm_version.hpp
+++ b/src/hotspot/share/runtime/abstract_vm_version.hpp
@@ -50,6 +50,9 @@ class Abstract_VM_Version: AllStatic {
 friend class VMStructs;
 friend class JVMCIVMStructs;
+ public:
+ static const char* _s_vm_info_string;
+
 protected:
 static const char* _s_vm_release;
 static const char* _s_internal_vm_info_string;
diff --git a/src/hotspot/share/runtime/arguments.cpp b/src/hotspot/share/runtime/arguments.cpp
index 8de6f427c3f..dae2199e738 100644
--- a/src/hotspot/share/runtime/arguments.cpp
+++ b/src/hotspot/share/runtime/arguments.cpp
@@ -390,7 +390,7 @@ void Arguments::init_system_properties() {
 PropertyList_add(&_system_properties, new SystemProperty("jdk.debug", VM_Version::jdk_debug_level(), false));
 // Initialize the vm.info now, but it will need updating after argument parsing.
- _vm_info = new SystemProperty("java.vm.info", VM_Version::vm_info_string(), true);
+ _vm_info = new SystemProperty("java.vm.info", VM_Version::_s_vm_info_string, true);
 // Following are JVMTI agent writable properties.
 // Properties values are set to nullptr and they are
@@ -1326,7 +1326,7 @@ void Arguments::set_mode_flags(Mode mode) {
 // Ensure Agent_OnLoad has the correct initial values.
 // This may not be the final mode; mode may change later in onload phase.
 PropertyList_unique_add(&_system_properties, "java.vm.info",
- VM_Version::vm_info_string(), AddProperty, UnwriteableProperty, ExternalProperty);
+ VM_Version::_s_vm_info_string, AddProperty, UnwriteableProperty, ExternalProperty);
 UseInterpreter = true;
 UseCompiler = true;
diff --git a/src/hotspot/share/runtime/threads.cpp b/src/hotspot/share/runtime/threads.cpp
index 32859bf2718..319e238ee5f 100644
--- a/src/hotspot/share/runtime/threads.cpp
+++ b/src/hotspot/share/runtime/threads.cpp
@@ -651,7 +651,7 @@ jint Threads::create_vm(JavaVMInitArgs* args, bool* canTryAgain) {
 // is initially computed. See Abstract_VM_Version::vm_info_string().
 // This update must happen before we initialize the java classes, but
 // after any initialization logic that might modify the flags.
- Arguments::update_vm_info_property(VM_Version::vm_info_string());
+ Arguments::update_vm_info_property(VM_Version::_s_vm_info_string);
 JavaThread* THREAD = JavaThread::current(); // For exception macros.
 HandleMark hm(THREAD);
@@ -1334,7 +1334,7 @@ void Threads::print_on(outputStream* st, bool print_stacks,
 st->print_cr("Full thread dump %s (%s %s):",
 VM_Version::vm_name(),
 VM_Version::vm_release(),
- VM_Version::vm_info_string());
+ VM_Version::_s_vm_info_string);
 st->cr();
 #if INCLUDE_SERVICES
diff --git a/src/hotspot/share/runtime/vmStructs.cpp b/src/hotspot/share/runtime/vmStructs.cpp
index c95dd709c84..854f46ff8df 100644
--- a/src/hotspot/share/runtime/vmStructs.cpp
+++ b/src/hotspot/share/runtime/vmStructs.cpp
@@ -702,6 +702,7 @@
 \
 static_field(Abstract_VM_Version, _s_vm_release, const char*) \
 static_field(Abstract_VM_Version, _s_internal_vm_info_string, const char*) \
+ static_field(Abstract_VM_Version, _s_vm_info_string, const char*) \
 static_field(Abstract_VM_Version, _features, uint64_t) \
 static_field(Abstract_VM_Version, _features_string, const char*) \
 static_field(Abstract_VM_Version, _vm_major_version, int) \
diff --git a/src/hotspot/share/utilities/vmError.cpp b/src/hotspot/share/utilities/vmError.cpp
index bbf1fcf9d6f..ba2116ae205 100644
--- a/src/hotspot/share/utilities/vmError.cpp
+++ b/src/hotspot/share/utilities/vmError.cpp
@@ -513,7 +513,7 @@ static void report_vm_version(outputStream* st, char* buf, int buflen) {
 (*vendor_version != '\0') ? " " : "", vendor_version,
 jdk_debug_level,
 VM_Version::vm_release(),
- VM_Version::vm_info_string(),
+ VM_Version::_s_vm_info_string,
 TieredCompilation ? ", tiered" : "",
 #if INCLUDE_JVMCI
 EnableJVMCI ? ", jvmci" : "",

------------- PR Comment: https://git.openjdk.org/jdk/pull/24299#issuecomment-2769289318 From syan at openjdk.org Mon Apr 7 13:14:38 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 7 Apr 2025 13:14:38 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 02:39:10 GMT, David Holmes wrote:
> I was also going to suggest caching the vm_info string as it should be the same all the time.
> I think you have discovered a bug in the way the info is being requested before arguments (like Xcomp) have been processed. That would cause the wrong info string to be recorded by the early callers.

Thanks @dholmes-ora, I think I need some time to investigate this issue.

------------- PR Comment: https://git.openjdk.org/jdk/pull/24299#issuecomment-2771217193 From gziemski at openjdk.org Mon Apr 7 13:16:57 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 7 Apr 2025 13:16:57 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v8] In-Reply-To: References: <5b6QutsBRX4KGZY50opOtRztczGFdilIy3CwlisJ4s4=.7b8a3f1a-4f6f-469f-829b-1d693ad1355d@github.com> Message-ID: <5_Vl9wYJivfVf33dr0MVpWcufWJ_JPvpmWFvVkzm6Ds=.6c5adc3c-dbac-4af5-9566-29cb14f7ee51@github.com> On Thu, 3 Apr 2025 11:16:43 GMT, Stefan Karlsson wrote:
>> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision:
>>
>> The real feedback from StefanK
>
> src/hotspot/share/runtime/os.hpp line 527:
>
>> 525:
>> 526: static char* map_memory(int fd, const char* file_name, size_t file_offset,
>> 527: char *addr, size_t bytes, MemTag mem_tag, bool read_only = false,
>
> AFAICT, there's no need to have a default value for read_only. I think we should remove this default value and move the MemTag parameter so that it comes after read_only and before allow_exec. This would make the parameter order more consistent with the other functions that accept a mem_tag and an executable.
>
> Given that you have tested the current patch, I'm fine with doing this as a small follow-up patch.
Filed [NMT: change map_memory() signature to match other functions](https://bugs.openjdk.org/browse/JDK-8353854) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24282#discussion_r2031217805 From gziemski at openjdk.org Mon Apr 7 13:30:45 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 7 Apr 2025 13:30:45 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v8] In-Reply-To: <5b6QutsBRX4KGZY50opOtRztczGFdilIy3CwlisJ4s4=.7b8a3f1a-4f6f-469f-829b-1d693ad1355d@github.com> References: <5b6QutsBRX4KGZY50opOtRztczGFdilIy3CwlisJ4s4=.7b8a3f1a-4f6f-469f-829b-1d693ad1355d@github.com> Message-ID: On Wed, 2 Apr 2025 18:37:36 GMT, Gerard Ziemski wrote: >> This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with default value of mtNone, to force everyone to provide mem tag, if known. >> >> I tried to fill in tag, when I was pretty certain that I had the right type. >> >> At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > The real feedback from StefanK Thank you Stefan for all your feedback and help! Very much appreciated! @afshin-zafari @jdksjolen I need one more review to move forward, could you please take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2783327357 PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2783334036 From gziemski at openjdk.org Mon Apr 7 13:30:44 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 7 Apr 2025 13:30:44 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v9] In-Reply-To: References: Message-ID: > This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with default value of mtNone, to force everyone to provide mem tag, if known.
> > I tried to fill in tag, when I was pretty certain that I had the right type. > > At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: small last feedback from Stefan ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24282/files - new: https://git.openjdk.org/jdk/pull/24282/files/3bd03cbe..88525c57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24282&range=07-08 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24282.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24282/head:pull/24282 PR: https://git.openjdk.org/jdk/pull/24282 From mbaesken at openjdk.org Mon Apr 7 14:11:02 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 7 Apr 2025 14:11:02 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v3] In-Reply-To: <9KQtTCyBbC24n4R_Oz-XO4_5ZZKXJU2hBYenYfg35xU=.263c3419-d326-4115-a128-999df4da632d@github.com> References: <9KQtTCyBbC24n4R_Oz-XO4_5ZZKXJU2hBYenYfg35xU=.263c3419-d326-4115-a128-999df4da632d@github.com> Message-ID: <13u1sQ-Mkoq1IqPJG2M_y8u5WvlEo_Ee1kc1rsy30Xk=.e8269d22-044a-4072-b475-abfe37b7dfd2@github.com> On Fri, 4 Apr 2025 01:50:30 GMT, David Holmes wrote:
>> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
>>
>> use one time PeriodicTask
>
> src/hotspot/share/runtime/os.cpp line 1564:
>
>> 1562: size_t elements_read = fread(_image_release_file_content, 1, sz, file);
>> 1563: if (elements_read < (size_t)sz) _image_release_file_content[elements_read] = '\0';
>> 1564: _image_release_file_content[sz] = '\0';
>
> Shouldn't this be in an else?

Yes, we most likely only need one '\0'.
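The review point above, that one terminator is enough, can be illustrated with a small sketch. `read_text` is a hypothetical helper, not the code in os.cpp: it allocates `sz + 1` bytes, reads at most `sz` bytes with `std::fread`, and writes a single `'\0'` at the count actually read. Because `fread` never returns more than the requested count, that one store handles both a short read and a full read, so neither an `if` nor an `else` branch is needed.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper (not the os.cpp code): reads at most `sz` bytes from
// `file` into a new malloc'd buffer of sz + 1 bytes and NUL-terminates it.
// The caller owns the returned buffer and must free() it.
char* read_text(std::FILE* file, long sz) {
  if (file == nullptr || sz < 0) return nullptr;
  char* buf = static_cast<char*>(std::malloc(static_cast<std::size_t>(sz) + 1));
  if (buf == nullptr) return nullptr;
  std::size_t n = std::fread(buf, 1, static_cast<std::size_t>(sz), file);
  // fread returns at most sz, so buf[n] always lies inside the allocation.
  // This single store covers both a short read (n < sz) and a full read (n == sz).
  buf[n] = '\0';
  return buf;
}
```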
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2031331768 From mli at openjdk.org Mon Apr 7 14:28:12 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 7 Apr 2025 14:28:12 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L Message-ID:
Hi,

Can you help to review this patch?

On riscv, CMoveI/L were already implemented, but there are some gaps:
1. CMoveI/L does not support comparison with float/double, and the corresponding tests are not turned on either.
2. Some C2 optimizations are not turned on, e.g. `Phi -> CMove -> min_max`.
3. Some corresponding performance tests are missing.

There are also some issues with the current Zicond support:
1. UseZicond is turned on automatically by hwprobe, but JMH tests show that it does not always bring a benefit; in some situations it even causes a regression. The reason is that the code generated with Zicond is much larger than the branch version, in particular when it is in a loop and unrolled.

This patch on riscv:
1. adds CMoveI/L comparing float/double, and corresponding tests,
2. enables more C2 optimizations,
3. adds more benchmark tests,
4. turns off UseZicond by default.

Thanks!

------------- Commit messages: - turn off flag Zicond by default - remove - initial commit Changes: https://git.openjdk.org/jdk/pull/24490/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24490&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352504 Stats: 951 lines in 15 files changed: 912 ins; 10 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/24490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24490/head:pull/24490 PR: https://git.openjdk.org/jdk/pull/24490 From mbaesken at openjdk.org Mon Apr 7 15:03:47 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 7 Apr 2025 15:03:47 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v4] In-Reply-To: References: Message-ID: > The release file of the JDK image contains useful info, for example the SOURCE used to build this image, e.g.
> SOURCE=".:git:21af8c7e7405" > Also the MODULES list is probably useful to have. > Add this info (or the complete content of the release file) to the hs_err files. Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: use stringStream in os.cpp, add a couple of changes suggested by David ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24244/files - new: https://git.openjdk.org/jdk/pull/24244/files/c33e11b2..3c1bc269 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=02-03 Stats: 42 lines in 3 files changed: 6 ins; 8 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/24244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24244/head:pull/24244 PR: https://git.openjdk.org/jdk/pull/24244 From mbaesken at openjdk.org Mon Apr 7 15:03:48 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 7 Apr 2025 15:03:48 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 15:47:27 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g. >> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use one time PeriodicTask Added some changes suggested by David. Switched to usage of stringStream in os.cpp, that simplifies the coding a bit. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2783642565 From jnimeh at openjdk.org Mon Apr 7 15:11:36 2025 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Mon, 7 Apr 2025 15:11:36 GMT Subject: RFR: 8350126: Regression ~3% on Crypto-ChaCha20Poly1305.encrypt for MacOSX aarch64 [v2] In-Reply-To: References: Message-ID: > This fix addresses a performance regression found on some aarch64 processors, namely the Apple M1, when we moved to a quarter round parallel implementation in JDK-8349106. After making some improvements in the ordering of the instructions in the 20-round loop we found that going back to a block-parallel implementation was faster, but it definitely needed the ordering changes for that to be the case. More importantly, the block parallel implementation with the interleaving turns out to be faster on even those processors that showed improvements when moving to the quarter round parallel implementation. > > There is a spreadsheet attached to the JBS bug that shows 3 different implementations relative to the current (QR-parallel with no interleaving) implementation on 3 different ARM64 processors. Comparative benchmarks can also be found below. 
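The description above contrasts a quarter-round-parallel (QR-parallel) implementation with a block-parallel one. As background, ChaCha20 (RFC 8439) works on a 4x4 matrix of 32-bit words and alternates column rounds and diagonal rounds, each made of four quarter rounds. The scalar sketch below shows only that round structure; it is a plain reference implementation, not the aarch64 stub, and the function names are illustrative. In a typical QR-parallel SIMD layout the four quarter rounds of one block run in separate lanes, so lanes must be rotated to line up the diagonals between column and diagonal rounds; in a block-parallel layout each lane holds the same word of a different block, so a diagonal round is just a different choice of registers.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit word left by n bits (0 < n < 32).
inline uint32_t rotl32(uint32_t v, int n) {
  return (v << n) | (v >> (32 - n));
}

// The ChaCha quarter round as defined in RFC 8439.
inline void quarter_round(uint32_t& a, uint32_t& b, uint32_t& c, uint32_t& d) {
  a += b; d ^= a; d = rotl32(d, 16);
  c += d; b ^= c; b = rotl32(b, 12);
  a += b; d ^= a; d = rotl32(d, 8);
  c += d; b ^= c; b = rotl32(b, 7);
}

// One "double round" over the 4x4 state (16 words, row-major): a column
// round followed by a diagonal round. ChaCha20's 20 rounds are 10 of these.
void double_round(uint32_t s[16]) {
  // Column round: each call works on one column of the matrix.
  quarter_round(s[0], s[4], s[8],  s[12]);
  quarter_round(s[1], s[5], s[9],  s[13]);
  quarter_round(s[2], s[6], s[10], s[14]);
  quarter_round(s[3], s[7], s[11], s[15]);
  // Diagonal round: each call works on one (wrapped) diagonal.
  quarter_round(s[0], s[5], s[10], s[15]);
  quarter_round(s[1], s[6], s[11], s[12]);
  quarter_round(s[2], s[7], s[8],  s[13]);
  quarter_round(s[3], s[4], s[9],  s[14]);
}
```

The quarter round can be checked against the quarter-round test vector in RFC 8439: inputs a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567 produce a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb.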
Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision: Place columnar/diagonal alignment code into separate method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24420/files - new: https://git.openjdk.org/jdk/pull/24420/files/b530e166..fe865308 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24420&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24420&range=00-01 Stats: 39 lines in 3 files changed: 33 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24420.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24420/head:pull/24420 PR: https://git.openjdk.org/jdk/pull/24420 From duke at openjdk.org Mon Apr 7 15:48:56 2025 From: duke at openjdk.org (David Linus Briemann) Date: Mon, 7 Apr 2025 15:48:56 GMT Subject: RFR: 8311227: Add .editorconfig so IDEs would pick up the common settings automatically: indent, trim trailing whitespace [v3] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 09:12:45 GMT, David Linus Briemann wrote: >> Add an .editorconfig to define indentation, trim trailing whitespace and open curly brace position for C++ and Java. >> This allows various editors to easily infer basics of the coding style. > > David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: > > make editorconfig hotspot specific I see. So you were addressing the file types specifically. I thought you had an issue with the indentation. So since you already included the indentation in the other PR I will just approve yours and close mine. Thanks for the explanation of jcheck. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23693#issuecomment-2783820088 From duke at openjdk.org Mon Apr 7 15:52:06 2025 From: duke at openjdk.org (David Linus Briemann) Date: Mon, 7 Apr 2025 15:52:06 GMT Subject: RFR: 8311227: Add .editorconfig so IDEs would pick up the common settings automatically: indent, trim trailing whitespace [v3] In-Reply-To: References: Message-ID: <9-DIvV5xUzcK1WvnpS-euxm4O2BoF7iN8omSPQ2vcuk=.79e44045-cbbf-43a8-a664-a01aeb92dc47@github.com> On Tue, 11 Mar 2025 09:12:45 GMT, David Linus Briemann wrote: >> Add an .editorconfig to define indentation, trim trailing whitespace and open curly brace position for C++ and Java. >> This allows various editors to easily infer basics of the coding style. > > David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: > > make editorconfig hotspot specific Closing in favor of #24448 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23693#issuecomment-2783827651 From duke at openjdk.org Mon Apr 7 15:52:07 2025 From: duke at openjdk.org (David Linus Briemann) Date: Mon, 7 Apr 2025 15:52:07 GMT Subject: Withdrawn: 8311227: Add .editorconfig so IDEs would pick up the common settings automatically: indent, trim trailing whitespace In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 12:27:58 GMT, David Linus Briemann wrote: > Add an .editorconfig to define indentation, trim trailing whitespace and open curly brace position for C++ and Java. > This allows various editors to easily infer basics of the coding style. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/23693 From kvn at openjdk.org Mon Apr 7 17:05:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 7 Apr 2025 17:05:59 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 22:52:24 GMT, Vladimir Ivanov wrote: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Few questions? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java line 60: > 58: } > 59: > 60: public static class X64 { Should we create `src/jdk.incubator.vector/cpu/` for CPU specific information? As separate refactoring. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 100: > 98: > 99: /** > 100: * Naming convention in SVML vector math library. 
Does this library have code for all AVX configurations? ------------- PR Review: https://git.openjdk.org/jdk/pull/24462#pullrequestreview-2747510895 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2031654383 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2031657213 From liach at openjdk.org Mon Apr 7 17:46:51 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 7 Apr 2025 17:46:51 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 22:52:24 GMT, Vladimir Ivanov wrote: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks!
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java line 44:

> 42: String featuresString = VectorSupport.getCPUFeatures();
> 43: debug(featuresString);
> 44: String[] features = featuresString.toLowerCase().split(", "); // ", " is used as a delimiter

Please use `toLowerCase(Locale.ROOT)`: if the default locale is Turkish, `I` is lowercased to the dotless `ı` rather than `i`, and the dotless `ı` will fail the subsequent `validateFeatures` assertion. Same for `hasFeature`.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2031714743 From ihse at openjdk.org Mon Apr 7 20:33:29 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 7 Apr 2025 20:33:29 GMT Subject: RFR: 8311227: Add .editorconfig so IDEs would pick up the common settings automatically: indent, trim trailing whitespace [v3] In-Reply-To: References: Message-ID: <3FJ4nDiYuCkXnv6IlfV-5__atOBvnk5GyXjE3Wu3oe0=.f67e0cad-f237-49ce-9756-a0cf890e1c47@github.com> On Mon, 7 Apr 2025 15:46:12 GMT, David Linus Briemann wrote: > I thought you had an issue with the indentation. I had an issue with indentation in the original PR, where you wanted to apply it to all files and not just Hotspot, but not the current version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23693#issuecomment-2784556005 From mdoerr at openjdk.org Mon Apr 7 21:16:32 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 7 Apr 2025 21:16:32 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v29] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Wed, 19 Mar 2025 08:26:55 GMT, Suchismith Roy wrote:
>> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437)
>>
>> Currently acceleration code for GHASH is missing for PPC64.
>>
>> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result.
> > Suchismith Roy has updated the pull request incrementally with three additional commits since the last revision: > > - comments > - comments > - comments src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 640: > 638: __ vsldoi(vTmp8, vZero, vReducedLow, 1); // 0x1 > 639: __ vor(vTmp8, vConstC2, vTmp8); // 0xC2...1 > 640: __ vsplt(vTmp9, 0, vH); // MSB of H I think the instruction name should be vspltb, but it would be better to clean this up in a separate RFE. It seems the immediate value should be the last argument. Maybe we can find more to clean up and document that in a new RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r2032016756 From vlivanov at openjdk.org Mon Apr 7 21:37:33 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 21:37:33 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v2] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away.
> > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Reviews and Float64Vector-related fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/fc27aee5..368b943e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=00-01 Stats: 22 lines in 2 files changed: 5 ins; 6 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From mdoerr at openjdk.org Mon Apr 7 21:43:18 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 7 Apr 2025 21:43:18 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v29] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Wed, 19 Mar 2025 08:26:55 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently acceleration code for GHASH is missing for PPC64. >> >> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request incrementally with three additional commits since the last revision: > > - comments > - comments > - comments It looks correct to me. Another reviewer may find more improvement proposals. ------------- Marked as reviewed by mdoerr (Reviewer).
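The GHASH description quoted above mentions Karatsuba multiplication. As a rough illustration only (this is not the PPC64 stub), the Java sketch below multiplies two 128-bit polynomials over GF(2) using three 64x64 carry-less products instead of four, then checks the result against the schoolbook form. The class and method names are hypothetical, and the sketch deliberately omits GHASH's bit reflection and its reduction modulo x^128 + x^7 + x^2 + x + 1.

```java
import java.util.Arrays;

final class KaratsubaClmul {
    // Carry-less (GF(2)[x]) product of two 64-bit polynomials; returns {low 64 bits, high 64 bits}.
    static long[] clmul64(long a, long b) {
        long lo = 0, hi = 0;
        for (int i = 0; i < 64; i++) {
            if (((b >>> i) & 1L) != 0) {
                lo ^= a << i;
                if (i != 0) {
                    hi ^= a >>> (64 - i); // bits shifted out of the low word
                }
            }
        }
        return new long[] {lo, hi};
    }

    // Schoolbook 128x128-bit product: four 64x64 carry-less multiplies.
    static long[] mulSchoolbook(long a0, long a1, long b0, long b1) {
        long[] lo = clmul64(a0, b0);
        long[] hi = clmul64(a1, b1);
        long[] c1 = clmul64(a0, b1);
        long[] c2 = clmul64(a1, b0);
        return new long[] {lo[0], lo[1] ^ c1[0] ^ c2[0], hi[0] ^ c1[1] ^ c2[1], hi[1]};
    }

    // Karatsuba: three multiplies; in GF(2), addition and subtraction are both XOR.
    static long[] mulKaratsuba(long a0, long a1, long b0, long b1) {
        long[] lo = clmul64(a0, b0);
        long[] hi = clmul64(a1, b1);
        long[] mid = clmul64(a0 ^ a1, b0 ^ b1);
        long m0 = mid[0] ^ lo[0] ^ hi[0]; // (a0+a1)(b0+b1) - a0b0 - a1b1 = a0b1 + a1b0
        long m1 = mid[1] ^ lo[1] ^ hi[1];
        return new long[] {lo[0], lo[1] ^ m0, hi[0] ^ m1, hi[1]};
    }

    public static void main(String[] args) {
        long a0 = 0x0123456789abcdefL, a1 = 0xfedcba9876543210L;
        long b0 = 0x0f1e2d3c4b5a6978L, b1 = 0x8796a5b4c3d2e1f0L;
        System.out.println(Arrays.equals(mulSchoolbook(a0, a1, b0, b1), mulKaratsuba(a0, a1, b0, b1))); // true
    }
}
```

The saving is one 64x64 multiply per 128x128 product, paid for with a few extra XORs. This trade is what makes Karatsuba attractive when the carry-less multiply instruction is the expensive step.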
PR Review: https://git.openjdk.org/jdk/pull/20235#pullrequestreview-2748174110 From vlivanov at openjdk.org Mon Apr 7 23:03:03 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 23:03:03 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v3] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. > > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! 
Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: features_string -> cpu_info_string ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/368b943e..9a8f6200 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=01-02 Stats: 26 lines in 8 files changed: 1 ins; 0 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From vlivanov at openjdk.org Mon Apr 7 23:25:26 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 23:25:26 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v3] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 17:44:33 GMT, Chen Liang wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> features_string -> cpu_info_string > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java line 44: > >> 42: String featuresString = VectorSupport.getCPUFeatures(); >> 43: debug(featuresString); >> 44: String[] features = featuresString.toLowerCase().split(", "); // ", " is used as a delimiter > > Please use `toLowerCase(Locale.ROOT)`: if the system locale is Turkish, `I` is lowercased to the dotless `ı` rather than `i`, and the resulting feature name will fail the subsequent `validateFeatures` assertion. Same for `hasFeature`. Good point. Fixed.
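The `Locale.ROOT` issue discussed above is easy to reproduce. The sketch below is a hypothetical model of the parsing step, not the actual `CPUFeatures` code: the feature string is made up, and `BMI1` was chosen only because it contains an uppercase `I`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class FeatureParsing {
    // Hypothetical input; the real string comes from VectorSupport.getCPUFeatures().
    static final String RAW = "SSE4.1, AVX2, BMI1, BMI2";

    // Lowercase with an explicit locale, then split on the ", " delimiter.
    static Set<String> parse(String raw, Locale locale) {
        return new HashSet<>(Arrays.asList(raw.toLowerCase(locale).split(", ")));
    }

    public static void main(String[] args) {
        Set<String> root = parse(RAW, Locale.ROOT);
        Set<String> turkish = parse(RAW, Locale.forLanguageTag("tr"));
        System.out.println(root.contains("bmi1"));    // true
        System.out.println(turkish.contains("bmi1")); // false: the I became U+0131 (dotless i)
    }
}
```

Passing `Locale.ROOT` makes the result independent of the default locale. This is why it is the usual choice for machine-readable identifiers such as CPU feature names.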
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2032135321 From vlivanov at openjdk.org Mon Apr 7 23:25:27 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 23:25:27 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v3] In-Reply-To: <9BCE8xN6SA-cPEc1EtuSsqoYwsHiwp31lJKsraWgYso=.67a97434-ef3c-40ab-b5be-841889fdd97c@github.com> References: <9BCE8xN6SA-cPEc1EtuSsqoYwsHiwp31lJKsraWgYso=.67a97434-ef3c-40ab-b5be-841889fdd97c@github.com> Message-ID: On Mon, 7 Apr 2025 06:44:16 GMT, Per Minborg wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> features_string -> cpu_info_string > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 258: > >> 256: if (LIBRARY.isSupported(op, vspecies)) { >> 257: String symbol = LIBRARY.symbolName(op, vspecies); >> 258: MemorySegment addr = LOOKUP.find(symbol) > > It is better to use `LOOKUP.findOrThrow()` because it does not require lambda creation. Thanks, changed as you suggested. I introduced a try-catch block instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2032138430 From vlivanov at openjdk.org Mon Apr 7 23:25:27 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 23:25:27 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v3] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 17:01:19 GMT, Vladimir Kozlov wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> features_string -> cpu_info_string > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/CPUFeatures.java line 60: > >> 58: } >> 59: >> 60: public static class X64 { > > Should we create `src/jdk.incubator.vector/cpu/` for CPU specific information? As separate refactoring. 
To clarify: are you suggesting moving platform-specific classes into a separate package or platform-specific location? It does make sense to separate platform-specific parts into their own classes once the amount of code grows beyond some limit. For now it doesn't look too attractive since the amount of code is very small. > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 100: > >> 98: >> 99: /** >> 100: * Naming convention in SVML vector math library. > > Does this library have code for all AVX configurations? Yes, there are 4 configurations (`-XX:UseAVX=[0..3]`) in total covered by the SVML library. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2032132478 PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2032134903 From vlivanov at openjdk.org Mon Apr 7 23:32:05 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 23:32:05 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v4] In-Reply-To: References: Message-ID: > Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. > > Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. > > The patch consists of the following parts: > * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; > * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); > * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. > > `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code.
> > Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. > > Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). > > Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) > > Thanks! Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Fix windows-aarch64 build failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24462/files - new: https://git.openjdk.org/jdk/pull/24462/files/9a8f6200..bb1a11db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24462&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24462.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24462/head:pull/24462 PR: https://git.openjdk.org/jdk/pull/24462 From vlivanov at openjdk.org Mon Apr 7 23:32:06 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 7 Apr 2025 23:32:06 GMT Subject: RFR: 8353786: Migrate Vector API math library support to FFM API [v3] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 23:03:03 GMT, Vladimir Ivanov wrote: >> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API. >> >> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side. >> >> The patch consists of the following parts: >> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup; >> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations); >> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching. 
>> >> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code. >> >> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away. >> >> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64). >> >> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64) >> >> Thanks! > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > features_string -> cpu_info_string In addition to addressing review feedback, there are 2 updates: * SVML: I overlooked that 64-bit vectors are covered by original implementation; fixed now; * JVM: `features_string` to `cpu_info_string` renaming uniformly across all platforms ------------- PR Comment: https://git.openjdk.org/jdk/pull/24462#issuecomment-2784850623 From dholmes at openjdk.org Mon Apr 7 23:39:17 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 7 Apr 2025 23:39:17 GMT Subject: RFR: 8353365: TOUCH_ASSERT_POISON clears GetLastError() [v2] In-Reply-To: References: Message-ID: On Sun, 6 Apr 2025 22:50:28 GMT, David Holmes wrote: >> This is a very simple fix to save/restore the "last error" value on Windows, so that the TOUCH_ASSERT_POISON mechanism used in assert/guarantee/fatal, does not clear it. >> >> Testing >> - new Windows-only gtest added to vmErrors test group >> - tiers 103 sanity >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Adjust format specifier and remove cast Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24435#issuecomment-2784862100 From dholmes at openjdk.org Mon Apr 7 23:39:18 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 7 Apr 2025 23:39:18 GMT Subject: Integrated: 8353365: TOUCH_ASSERT_POISON clears GetLastError() In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 05:43:36 GMT, David Holmes wrote: > This is a very simple fix to save/restore the "last error" value on Windows, so that the TOUCH_ASSERT_POISON mechanism used in assert/guarantee/fatal, does not clear it. > > Testing > - new Windows-only gtest added to vmErrors test group > - tiers 103 sanity > > Thanks. This pull request has now been integrated. Changeset: 3951a8e0 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/3951a8e01945d262cdd6ebbe4e1548ddf8e3c02a Stats: 12 lines in 2 files changed: 12 ins; 0 del; 0 mod 8353365: TOUCH_ASSERT_POISON clears GetLastError() Reviewed-by: kbarrett, stuefe, jwaters ------------- PR: https://git.openjdk.org/jdk/pull/24435 From dholmes at openjdk.org Mon Apr 7 23:42:16 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 7 Apr 2025 23:42:16 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v3] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 10:44:02 GMT, Kevin Walls wrote: >> We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. >> >> next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > comment Marked as reviewed by dholmes (Reviewer). 
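Here is a minimal sketch of the splitting rule described in the OnError review: it treats both `;` and a newline as command separators, so each command would run in its own shell. It assumes, as the description implies, that multiple `-XX:OnError=` values reach the splitter joined by newlines. The Java class is a hypothetical model, not HotSpot's C++ `next_OnError_command()`.

```java
import java.util.ArrayList;
import java.util.List;

final class OnErrorCommands {
    // Hypothetical model: split the combined OnError value into commands,
    // treating both ';' and '\n' as separators so that each command would
    // get its own shell.
    static List<String> split(String combined) {
        List<String> commands = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : combined.toCharArray()) {
            if (c == ';' || c == '\n') {
                flush(commands, current);
            } else {
                current.append(c);
            }
        }
        flush(commands, current);
        return commands;
    }

    // Adds the pending command (trimmed) if it is not blank, then resets the buffer.
    private static void flush(List<String> commands, StringBuilder current) {
        String cmd = current.toString().trim();
        if (!cmd.isEmpty()) {
            commands.add(cmd);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        // Assumption: two -XX:OnError= options arrive joined by a newline.
        String combined = String.join("\n", "echo first; echo second", "echo third");
        System.out.println(split(combined)); // [echo first, echo second, echo third]
    }
}
```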
------------- PR Review: https://git.openjdk.org/jdk/pull/24354#pullrequestreview-2748329470 From sviswanathan at openjdk.org Tue Apr 8 00:12:15 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 8 Apr 2025 00:12:15 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v13] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:38:34 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Reacting to comment by Sandhya. src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 802: > 800: __ evpbroadcastd(zero, scratch, Assembler::AVX_512bit); // 0 > 801: __ addl(scratch, 1); > 802: __ evpbroadcastd(one, scratch, Assembler::AVX_512bit); // 1 A better way to initialize (0, 1, -1) vectors is: // load 0 into int vector vpxor(zero, zero, zero, Assembler::AVX_512bit); // load -1 into int vector vpternlogd(minusOne, 0xff, minusOne, minusOne, Assembler::AVX_512bit); // load 1 into int vector vpsubd(one, zero, minusOne, Assembler::AVX_512bit); where minusOne could be xmm31. A broadcast from a general-purpose register to an xmm register is more expensive. src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 982: > 980: __ evporq(xmm19, k0, xmm19, xmm23, false, Assembler::AVX_512bit); > 981: > 982: __ evpsubd(xmm12, k0, zero, one, false, Assembler::AVX_512bit); // -1 The -1 initialization could be done outside the loop.
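You can check why the suggested sequence yields 0, -1 and 1 using a scalar Java model of a single 32-bit lane. `TernlogModel` is a hypothetical helper. It follows the documented `vpternlogd` rule: for each bit, the bits from the three operands form an index into the 8-bit immediate. An immediate of `0xff` therefore produces all ones, whatever the register held before, so no load from memory or general-purpose register is needed.

```java
final class TernlogModel {
    // Scalar model of vpternlogd on one 32-bit lane: for every bit position,
    // the bits of (a, b, c) form a 3-bit index (a is the most significant)
    // that selects one bit of the 8-bit immediate.
    static int ternlog(int imm8, int a, int b, int c) {
        int result = 0;
        for (int bit = 0; bit < 32; bit++) {
            int idx = (((a >>> bit) & 1) << 2) | (((b >>> bit) & 1) << 1) | ((c >>> bit) & 1);
            result |= ((imm8 >>> idx) & 1) << bit;
        }
        return result;
    }

    public static void main(String[] args) {
        int garbage = 0x12345678;                                // prior register contents do not matter
        int zero = garbage ^ garbage;                            // vpxor zero, zero, zero
        int minusOne = ternlog(0xff, garbage, garbage, garbage); // vpternlogd minusOne, 0xff, ...
        int one = zero - minusOne;                               // vpsubd one, zero, minusOne
        System.out.println(zero + " " + minusOne + " " + one);   // 0 -1 1
    }
}
```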
src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1015: > 1013: __ addptr(lowPart, 4 * XMMBYTES); > 1014: __ cmpl(len, 0); > 1015: __ jcc(Assembler::notEqual, L_loop); It looks to me like subl and cmpl could be merged, since subl already sets the zero flag that jcc tests: __ addptr(highPart, 4 * XMMBYTES); __ addptr(lowPart, 4 * XMMBYTES); __ subl(len, 4 * XMMBYTES); __ jcc(Assembler::notEqual, L_loop); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2032172061 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2032171059 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2031979828 From jiangli at openjdk.org Tue Apr 8 03:01:29 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 8 Apr 2025 03:01:29 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 [v2] In-Reply-To: <8py7v6gQd9nfucRK28h2enHWk21PevhMa74FYx3bRsI=.663fa24c-4218-426b-bda4-2e8d6ecb02aa@github.com> References: <8py7v6gQd9nfucRK28h2enHWk21PevhMa74FYx3bRsI=.663fa24c-4218-426b-bda4-2e8d6ecb02aa@github.com> Message-ID: On Mon, 7 Apr 2025 13:14:37 GMT, SendaoYan wrote: >> Hi all, >> >> This PR will try to fix memory leak after JDK-8352184. which re-do [JDK-8352184](https://bugs.openjdk.org/browse/JDK-8352184) using the original, purely static uses of the various description strings, as @jianglizhou had proposed. >> >> Additional testing: >> >> - [x] jtreg tests(include tier1/2/3 etc.) on linux-x64 >> - [x] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 >> - [x] full `java -version` tests, the test shell script show as below. >> >> [JDK-8353189.sh.txt](https://github.com/user-attachments/files/19632637/JDK-8353189.sh.txt) > > SendaoYan has updated the pull request incrementally with two additional commits since the last revision: > > - Use static static string instead of assemble string dynamic > - Revert "8353189: [ASAN] memory leak after 8352184" > > This reverts commit 71bc3ad34ebd57cc6642dfede18cec65e3694dd1.
src/hotspot/share/runtime/abstract_vm_version.cpp line 140: > 138: > 139: > 140: const char* Abstract_VM_Version::vm_info_string() { For future benefit, how about also adding a comment explaining why we avoid dynamic memory allocation for the vm_info_string here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24299#discussion_r2032299122 From sspitsyn at openjdk.org Tue Apr 8 03:16:30 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 8 Apr 2025 03:16:30 GMT Subject: RFR: 8316682: serviceability/jvmti/vthread/SelfSuspendDisablerTest timed out [v3] In-Reply-To: References: Message-ID: > This fixes the issue with lack of synchronization between JVMTI thread suspend and resume functions in a self-suspend case. More detailed fix description is in the first PR comment. > > Testing: Ran mach5 tiers 1-6. Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge - some cleanup - 8316682: serviceability/jvmti/vthread/SelfSuspendDisablerTest timed out ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24269/files - new: https://git.openjdk.org/jdk/pull/24269/files/18944347..4a92986a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24269&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24269&range=01-02 Stats: 68856 lines in 1040 files changed: 24634 ins; 41255 del; 2967 mod Patch: https://git.openjdk.org/jdk/pull/24269.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24269/head:pull/24269 PR: https://git.openjdk.org/jdk/pull/24269 From syan at openjdk.org Tue Apr 8 03:55:03 2025 From: syan at openjdk.org (SendaoYan) Date: Tue, 8 Apr 2025 03:55:03 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 [v3] In-Reply-To: References: Message-ID: <9k911IJe4DJG2PKWmXuaAY5WJYBuwFxyPfzd422V5FU=.72584eaf-bdf7-4549-9d73-808ab6e96466@github.com> > Hi all, > > This PR will try to fix memory leak after JDK-8352184. which re-do [JDK-8352184](https://bugs.openjdk.org/browse/JDK-8352184) using the original, purely static uses of the various description strings, as @jianglizhou had proposed. > > Additional testing: > > - [x] jtreg tests(include tier1/2/3 etc.) on linux-x64 > - [x] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 > - [x] full `java -version` tests, the test shell script show as below. 
> > [JDK-8353189.sh.txt](https://github.com/user-attachments/files/19632637/JDK-8353189.sh.txt) SendaoYan has updated the pull request incrementally with one additional commit since the last revision: add a comment to explain why we avoid dynamic memory allocation for the vm_info_string ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24299/files - new: https://git.openjdk.org/jdk/pull/24299/files/9d039fd5..461b0f84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24299&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24299&range=01-02 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24299/head:pull/24299 PR: https://git.openjdk.org/jdk/pull/24299 From syan at openjdk.org Tue Apr 8 03:55:05 2025 From: syan at openjdk.org (SendaoYan) Date: Tue, 8 Apr 2025 03:55:05 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 [v2] In-Reply-To: References: <8py7v6gQd9nfucRK28h2enHWk21PevhMa74FYx3bRsI=.663fa24c-4218-426b-bda4-2e8d6ecb02aa@github.com> Message-ID: On Tue, 8 Apr 2025 02:58:49 GMT, Jiangli Zhou wrote: >> SendaoYan has updated the pull request incrementally with two additional commits since the last revision: >> >> - Use static static string instead of assemble string dynamic >> - Revert "8353189: [ASAN] memory leak after 8352184" >> >> This reverts commit 71bc3ad34ebd57cc6642dfede18cec65e3694dd1. > > src/hotspot/share/runtime/abstract_vm_version.cpp line 140: > >> 138: >> 139: >> 140: const char* Abstract_VM_Version::vm_info_string() { > > For future benefit, how about also adding a comment explaining why we avoid dynamic memory allocation for the vm_info_string here? I have added a comment, but my English is not very good. If you have a better description, please let me know.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24299#discussion_r2032332152 From stuefe at openjdk.org Tue Apr 8 05:46:09 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 8 Apr 2025 05:46:09 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances [v3] In-Reply-To: References: Message-ID: <8oHlvz98KPGFmzSttjpmKbjNSQkbF0rmLlE9Aqdgs1M=.4b96d172-54c6-4005-bcd2-1a5ef76198a0@github.com> > In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. > > For details, please see JBS issue text. > > ----------------------- > > Patch results: > > The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. Here the oop map size distribution over all JDK classes in the JDK image: > > Before: > > 5395 - non-static oop maps (0 entries) > 9330 - non-static oop maps (1 entries) > 1449 - non-static oop maps (2 entries) > 274 - non-static oop maps (3 entries) > 218 - non-static oop maps (4 entries) > 75 - non-static oop maps (5 entries) > 7 - non-static oop maps (6 entries) > 4 - non-static oop maps (7 entries) > > > Now: > > 5395 - non-static oop maps (0 entries) > 10178 - non-static oop maps (1 entries) > 933 - non-static oop maps (2 entries) > 229 - non-static oop maps (3 entries) > 16 - non-static oop maps (4 entries) > 1 - non-static oop maps (5 entries) > > > For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: > > Before: > > java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} > - ---- non-static fields (9 words): > - final 'hash' 'I' @12 > - final 'key' 'Ljava/lang/Object;' @16 > - volatile 'val' 'Ljava/lang/Object;' @20 > - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class > - 'red' 'Z' @28 << derived class starts here, non-oops lead > - 
'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 > - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 > - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 > - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 > - non-static oop maps (2 entries): 16-24 32-44 > > Now: > > java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} > - ---- non-static fields (9 words): > - final 'hash' 'I' @12 > - final 'key' 'Ljava/lang/Object;' @16 > - volatile 'val' 'Ljava/lang/Object;' @20 > - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class > - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oops lead > - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 > - 'right' 'Ljava/util/concurrent/Concurre... Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
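The oop map entry counts quoted above come from grouping oop fields into runs at consecutive offsets. The helper below is a hypothetical sketch of that counting rule, not the HotSpot field-layout code. It assumes compressed oops (4-byte reference slots), which matches the offsets in the listings.

```java
final class OopMapRuns {
    // Counts oop map entries: each entry covers a maximal run of oop fields
    // at consecutive offsets. Offsets must be sorted in ascending order.
    static int countEntries(int[] sortedOopOffsets, int heapOopSize) {
        int entries = 0;
        for (int i = 0; i < sortedOopOffsets.length; i++) {
            boolean extendsPrevious = i > 0
                && sortedOopOffsets[i] == sortedOopOffsets[i - 1] + heapOopSize;
            if (!extendsPrevious) {
                entries++; // a gap (e.g. a non-oop field) starts a new entry
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Oop offsets from the "Before" listing of ConcurrentHashMap$TreeNode;
        // the boolean 'red' at @28 splits them into 16-24 and 32-44.
        int[] before = {16, 20, 24, 32, 36, 40, 44};
        // Oop offsets visible in the truncated "Now" listing (16 through 32).
        int[] nowVisible = {16, 20, 24, 28, 32};
        System.out.println(countEntries(before, 4) + " " + countEntries(nowVisible, 4)); // 2 1
    }
}
```

Moving the non-oop field `red` out from between the inherited and the new oop fields is what merges the two runs into one.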
The pull request contains seven additional commits since the last revision: - test fixes - Merge branch 'master' into JDK-8353273-Reduce-average-number-of-oop-map-entries-in-instance-objects - add regression test - Reworked to use prior super klass layout reconstruction pass - Merge branch 'master' into JDK-8353273-Reduce-average-number-of-oop-map-entries-in-instance-objects - alternate-order - print ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24330/files - new: https://git.openjdk.org/jdk/pull/24330/files/dfbe4859..94c55bde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24330&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24330&range=01-02 Stats: 18630 lines in 622 files changed: 12823 ins; 4437 del; 1370 mod Patch: https://git.openjdk.org/jdk/pull/24330.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24330/head:pull/24330 PR: https://git.openjdk.org/jdk/pull/24330 From stuefe at openjdk.org Tue Apr 8 05:46:11 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 8 Apr 2025 05:46:11 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances [v2] In-Reply-To: <1n0KWxyuHafZFtkM1ByFFpUqWTkeAOVWcRuBv21AU5g=.f4eb6906-8d3b-46e2-b973-cda8a9d7e110@github.com> References: <_m74U7bVGssSQnbrkP-4KvS5nga3bg4Bh4g5BlU07Kw=.f951a8cf-a4a6-452f-936e-284decfd9df8@github.com> <1n0KWxyuHafZFtkM1ByFFpUqWTkeAOVWcRuBv21AU5g=.f4eb6906-8d3b-46e2-b973-cda8a9d7e110@github.com> Message-ID: On Thu, 3 Apr 2025 22:45:27 GMT, Leonid Mesnik wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: >> >> - add regression test >> - Reworked to use prior super klass layout reconstruction pass >> - Merge branch 'master' into JDK-8353273-Reduce-average-number-of-oop-map-entries-in-instance-objects >> - alternate-order >> - print > > test/hotspot/jtreg/runtime/FieldLayout/TestOopMapSizeMinimal.java line 94: > >> 92: static { >> 93: WhiteBox WB = WhiteBox.getWhiteBox(); >> 94: boolean is_64_bit = System.getProperty("sun.arch.data.model").equals("64"); > > I am a little bit confused with this check > and > `*` @requires vm.bits == "64"` > Shouldn't "sun.arch.data.model" be always 64? Thank you. Copy-paste error. I fixed the tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24330#discussion_r2032424032 From iklam at openjdk.org Tue Apr 8 06:38:32 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 8 Apr 2025 06:38:32 GMT Subject: RFR: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 [v4] In-Reply-To: References: Message-ID: > Since [JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426) (Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file), the AOT cache no longer contains classes intended for custom class loaders (these are called "unregistered classes" in CDS terminology). > > The fix is simple -- we already remember the set of unregistered classes in the AOT configuration file. We just need to add them into the final AOT cache (see changes in finalImageRecipes.cpp). 
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Fixed (1) size/crc was not set so the SimpleCusty class was not loaded from cache; (2) cp->resolved_reference_length() was not set correctly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23926/files - new: https://git.openjdk.org/jdk/pull/23926/files/ccea0a41..c02a611e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23926&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23926&range=02-03 Stats: 81 lines in 6 files changed: 70 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/23926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23926/head:pull/23926 PR: https://git.openjdk.org/jdk/pull/23926 From iklam at openjdk.org Tue Apr 8 06:38:33 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 8 Apr 2025 06:38:33 GMT Subject: RFR: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 [v3] In-Reply-To: References: Message-ID: <3gX4VVjjVs9dkE3UWkDxP56ZFvBRa2GjHhCSrCEiyew=.37cff581-edc5-4081-871c-58a59989cf52@github.com> On Fri, 4 Apr 2025 22:04:32 GMT, John R Rose wrote: > Good. > > I suppose there are already tests for the other end of the process, where an unregistered class in the AOT cache is actually used. What are those tests? It turns out that the existing tests only check for unregistered classes in the classical CDS archive. I added a test in BulkLoaderTest.java to check the AOT cache and found a bug.
Fixed in [c02a611](https://github.com/openjdk/jdk/pull/23926/commits/c02a611ecca7da318865d3d98e4cae7fb1eb8410) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23926#issuecomment-2785377507 From iklam at openjdk.org Tue Apr 8 06:45:06 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 8 Apr 2025 06:45:06 GMT Subject: RFR: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 [v5] In-Reply-To: References: Message-ID: > Since [JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426) (Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file), the AOT cache no longer contains classes intended for custom class loaders (these are called "unregistered classes" in CDS terminology). > > The fix is simple -- we already remember the set of unregistered classes in the AOT configuration file. We just need to add them into the final AOT cache (see changes in finalImageRecipes.cpp). Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 - Fixed (1) size/crc was not set so the SimpleCusty class was not loaded from cache; (2) cp->resolved_reference_length() was not set correctly - Avoid duplicated unregistered classes that have the same name - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 - 8351319: AOT cache support for custom class loaders broken since JDK-8348426 ------------- Changes: https://git.openjdk.org/jdk/pull/23926/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23926&range=04 Stats: 183 lines in 13 files changed: 151 ins; 0 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/23926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23926/head:pull/23926 PR: https://git.openjdk.org/jdk/pull/23926 From fyang at openjdk.org Tue Apr 8 07:15:20 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 8 Apr 2025 07:15:20 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 14:23:52 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > On riscv, CMoveI/L already were implemented, but there are some gap: > 1. CMoveI/L does not support comparison with float/double, corresponding tests are not turn on either. > 2. Some optimization of C2 is not turned on, e.g. `Phi -> CMove -> min_max`. > 3. lack of some corresponding performance tests. > > Also there are some issue with current Zicond: > 1. UseZicond is turned on automatically by hwprobe, but jmh tests show that it's not always bring benefit, in some situation it even brings regression, the reason is the generated code by Zicond is much larger than branch version, in particular when it's in a loop and unrolled. > > This patch on riscv is to: > 1. 
add CMoveI/L comparing float/double, and corresponding tests, > 2. enable more C2 optimization, > 3. add more benchmark tests, > 4. turn off UseZicond by default. > > Thanks! Hi, @Hamlin-Li , Thanks for looking at this part. I once created JBS https://bugs.openjdk.org/browse/JDK-8346786 about `ConditionalMoveLimit` on RISC-V. I have two questions after a cursory look. src/hotspot/cpu/riscv/vm_version_riscv.cpp line 257: > 255: if (ConditionalMoveLimit > 0) { > 256: FLAG_SET_DEFAULT(ConditionalMoveLimit, 0); > 257: } Maybe we should check `UseZicond` and only enable `UseCMoveUnconditionally` & `ConditionalMoveLimit` conditionally? I don't see how enabling CMove will bring us performance benefit without `Zicond`. It's done with conditional branches in CPU backend. src/hotspot/cpu/riscv/vm_version_riscv.cpp line 461: > 459: FLAG_SET_DEFAULT(UseZicond, false); > 460: warning("UseZicond is turned off automatically. Turn it on with -XX:+UseZicond explicitly."); > 461: } Does this mean `UseZicond` could not be enabled on the command line? And I witnessed quite some warning when doing a native build. If `UseZicond` causes regression for some cases, is it more reasonable to not auto-enable it through hwprobe [1]? Or only enable it for debug builds like https://github.com/openjdk/jdk/pull/24478 does? 
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp#L228 ------------- PR Review: https://git.openjdk.org/jdk/pull/24490#pullrequestreview-2748525865 PR Review Comment: https://git.openjdk.org/jdk/pull/24490#discussion_r2032530242 PR Review Comment: https://git.openjdk.org/jdk/pull/24490#discussion_r2032292830 From dholmes at openjdk.org Tue Apr 8 08:03:25 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 8 Apr 2025 08:03:25 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v4] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 07:58:02 GMT, David Holmes wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> use stringStream in os.cpp, add a couple of changes suggested by David > > src/hotspot/share/runtime/os.cpp line 1538: > >> 1536: ss.print("%s/release", home); >> 1537: >> 1538: if (_image_release_file_content == nullptr) { > > Shouldn't we check before doing anything else i.e. make line 1538 first. > > Also for race concerns use load_acquire on `image_release_file_content`. Though actually shouldn't this just be an assert as we only expect this to be called once. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2032626089 From dholmes at openjdk.org Tue Apr 8 08:03:24 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 8 Apr 2025 08:03:24 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v4] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 15:03:47 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g. >> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. 
> > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use stringStream in os.cpp, add a couple of changes suggested by David Changes requested by dholmes (Reviewer). src/hotspot/share/runtime/os.cpp line 1538: > 1536: ss.print("%s/release", home); > 1537: > 1538: if (_image_release_file_content == nullptr) { Shouldn't we check before doing anything else i.e. make line 1538 first. Also for race concerns use load_acquire on `image_release_file_content`. src/hotspot/share/runtime/os.cpp line 1550: > 1548: fseek(file, 0, SEEK_SET); > 1549: > 1550: _image_release_file_content = (char*) os::malloc(sz + 1, mtInternal); For race concerns do everything to a tmp variable and finally do a release_store into `_image_release_file_content`. src/hotspot/share/runtime/os.cpp line 1573: > 1571: > 1572: void os::print_image_release_file(outputStream* st) { > 1573: if (_image_release_file_content != nullptr) { Use load_acquire. ------------- PR Review: https://git.openjdk.org/jdk/pull/24244#pullrequestreview-2749042138 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2032622103 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2032623469 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2032624359 From thartmann at openjdk.org Tue Apr 8 08:05:29 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 8 Apr 2025 08:05:29 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v6] In-Reply-To: References: Message-ID: On Sat, 5 Apr 2025 14:29:28 GMT, Zihao Lin wrote: >> This patch remove slice parameter from LoadNode::make >> >> Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 >> >> Hi team, I am new, I'd appreciate any guidance. Thank a lot! > > Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into 8344116 > - Fix build > - Fix test failed > - 8344116: C2: remove slice parameter from LoadNode::make I think @rwestrel should have a look at this, since he suggested the cleanup in https://github.com/openjdk/jdk/pull/21834. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24258#issuecomment-2785582950 From rehn at openjdk.org Tue Apr 8 09:44:23 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 8 Apr 2025 09:44:23 GMT Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user [v3] In-Reply-To: <5sujqD7L_cmLUyDwYb4PhgOlEeiFwlkAV7RJoVMFTrM=.223437cd-bbb2-4ef3-a6fe-b13ce402e14b@github.com> References: <5sujqD7L_cmLUyDwYb4PhgOlEeiFwlkAV7RJoVMFTrM=.223437cd-bbb2-4ef3-a6fe-b13ce402e14b@github.com> Message-ID: On Mon, 31 Mar 2025 10:45:54 GMT, Robbin Ehn wrote: >> Hi, for you to consider. >> >> These tests constantly fails in qemu-user. >> Either the require host to be same arch explicit or implicit (sysroot). >> E.g. "ptrace(PTRACE_ATTACH, ..) failed for 405157: Function not implemented'" for SA tests. >> >> From bug: >>> qemu-user/rv64 sets uarch to "qemu" in /proc/cpuinfo (qemu-system do not do that). >>> We add this uarch to CPU feature string. >>> This means we can use jtreg 'require' with cpu string to filter out tests in qemu-user. >> >> Relevant qemu code: >> https://github.com/qemu/qemu/blob/170825d14d88a1ce7fae98d5a928480f2f329b22/linux-user/riscv/target_proc.h#L29 >> >> Relevant hotspot code: >> https://github.com/openjdk/jdk/blob/fa0b18bfde38ee2ffbab33a9eaac547fe8aa3c7c/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp#L250 >> >> Tested that the require only filters out tests in qemu+riscv64. >> >> Thanks! >> >> /Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into qemu-user-issues > - Revert > - Merge branch 'master' into qemu-user-issues > - Merge branch 'master' into qemu-user-issues > - more > - more > - native or very long Any takers? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24229#issuecomment-2785851072 From stuefe at openjdk.org Tue Apr 8 09:57:16 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 8 Apr 2025 09:57:16 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v3] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 10:44:02 GMT, Kevin Walls wrote: >> We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. >> >> next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > comment Hmm. May there not be customers that specify a verbatim "spelled out" shell script as input for OnError? This is a behavior change. And if not (if the commands issued with OnError, separated by ;, are in turn real commands, programs or scripts): Those, before, were forked off as separate grandchilds to the same parent (the direct child), right? Whereas now we have a single parent for each grandchild process. But here, especially if OnError had been called as reaction to an OOM condition by a gigantic JVM, reusing that in-between shell may be preferable. Forking off a large process can be expensive. (Obviously, its all undocumented, which is bad in itself). All of these are questions - I may not know the full story. 
src/hotspot/share/utilities/vmError.cpp line 156: > 154: > 155: const char * cmdend = cmd; > 156: while (*cmdend != '\0' && *cmdend != ';' && *cmdend != '\n') cmdend++; What happens if I specify, eg in Bash: -XX:OnError=lengthy command \ more command options ? ------------- PR Review: https://git.openjdk.org/jdk/pull/24354#pullrequestreview-2749347538 PR Review Comment: https://git.openjdk.org/jdk/pull/24354#discussion_r2032811356 From mli at openjdk.org Tue Apr 8 10:32:20 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 8 Apr 2025 10:32:20 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 07:07:12 GMT, Fei Yang wrote: > Maybe we should check UseZicond and only enable UseCMoveUnconditionally & ConditionalMoveLimit conditionally? Not sure what do you mean here. > I don't see how enabling CMove will bring us any performance benefit without Zicond. It's done with conditional branches in CPU backend as well. I add the performance result in desc. There are 2 optimization scenarios, one is cmove itself, another is when cmove can be transform to a min/max in some condition. > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 461: > >> 459: FLAG_SET_DEFAULT(UseZicond, false); >> 460: warning("UseZicond is turned off automatically. Turn it on with -XX:+UseZicond explicitly."); >> 461: } > > Does this mean `UseZicond` could not be enabled on the command line? And I witnessed quite some warning when doing a native build. If `UseZicond` causes regression for some cases, is it more reasonable to not auto-enable it through hwprobe [1]? Or only enable it for debug builds like https://github.com/openjdk/jdk/pull/24478 does? > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp#L228 This is to not enable Zicond automatically, but user can still turn it on manually if they want to try or make sure it bring benefit on the specific hardware. 
Currently it's based on bananapi result, so maybe in the future we should adjust the default value of UseZicond. I'm fine with either default value. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24490#discussion_r2032893506 PR Review Comment: https://git.openjdk.org/jdk/pull/24490#discussion_r2032893370 From shade at openjdk.org Tue Apr 8 10:58:31 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 8 Apr 2025 10:58:31 GMT Subject: RFR: 8353174: Clean up thread register handling after 32-bit x86 removal Message-ID: <1u8EcLMPtsHFmCPJMQ-Hgcfql0MWy88TpSDZjr9AqrQ=.79b04f21-d97a-4af9-b83b-533ebb1acb6c@github.com> Various `MacroAssembler` methods have this code to support passing the thread register, and getting it if `noreg` is passed:

    // determine java_thread register
    if (!java_thread->is_valid()) {
    #ifdef _LP64
      java_thread = r15_thread;
    #else
      java_thread = rdi;
      get_thread(java_thread);
    #endif // LP64
    }

This never happens after 32-bit x86 removal. x86_64 always uses r15_thread. We can clean those up. These are also the only major users of `MacroAssembler::get_thread` that we want to remove/rename to avoid falling into traps like [JDK-8353176](https://bugs.openjdk.org/browse/JDK-8353176).
Additional testing: - [x] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/24323/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24323&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353174 Stats: 233 lines in 15 files changed: 16 ins; 98 del; 119 mod Patch: https://git.openjdk.org/jdk/pull/24323.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24323/head:pull/24323 PR: https://git.openjdk.org/jdk/pull/24323 From kevinw at openjdk.org Tue Apr 8 12:11:18 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Tue, 8 Apr 2025 12:11:18 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v3] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 09:54:21 GMT, Thomas Stuefe wrote: > Hmm. > > May there not be customers that specify a verbatim "spelled out" shell script as input for OnError? This is a behavior change. > > And if not (if the commands issued with OnError, separated by ;, are in turn real commands, programs or scripts): Those, before, were forked off as separate grandchilds to the same parent (the direct child), right? Whereas now we have a single parent for each grandchild process. But here, especially if OnError had been called as reaction to an OOM condition by a gigantic JVM, reusing that in-between shell may be preferable. Forking off a large process can be expensive. > > (Obviously, its all undocumented, which is bad in itself). > > All of these are questions - I may not know the full story. Hi Thomas! Not sure I understand the first line about behaviour change. The ; separator was causing new separate shells used sequentially, but distinct OnError= arguments were not (yes, IF somebody has discovered that this works). So with the change, everything gets a new shell fork/exec'd. This should be more consistent, less surprising. I can't pretend the previous behaviour was to save memory! 
The posix_spawn usage hopefully means such big processes are more efficient. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2786223663 From luhenry at openjdk.org Tue Apr 8 12:14:18 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 8 Apr 2025 12:14:18 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L In-Reply-To: References: Message-ID: <_8ulH1PIsfxxO_hBMK4kbMGF3eY1GQgsdUfk5bVVqCo=.d62134e2-b94f-4b9c-9b5f-f903441b7890@github.com> On Mon, 7 Apr 2025 14:23:52 GMT, Hamlin Li wrote: > the reason is the generated code by Zicond is much larger than branch version I'm curious about this one. It's surprising to me that we see bigger code generated with Zicond. Do you know why that is the case? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24490#issuecomment-2786233412 From mbaesken at openjdk.org Tue Apr 8 12:31:44 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 8 Apr 2025 12:31:44 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v5] In-Reply-To: References: Message-ID: <8Huv-fCxPaBXJXYdLuFoJQHrKC89ABR1KJ1Dm9rz7dw=.23605fb4-2e54-4345-89c2-e9262400d752@github.com> > The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g. > SOURCE=".:git:21af8c7e7405" > Also the MODULES list is probably useful to have. > Add this info (or the complete content of the release file) to the hs_err files. 
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: use tmp variable and release_store ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24244/files - new: https://git.openjdk.org/jdk/pull/24244/files/3c1bc269..7af59d69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=03-04 Stats: 31 lines in 1 file changed: 3 ins; 3 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/24244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24244/head:pull/24244 PR: https://git.openjdk.org/jdk/pull/24244 From mli at openjdk.org Tue Apr 8 12:34:22 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 8 Apr 2025 12:34:22 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 14:23:52 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > On riscv, CMoveI/L already were implemented, but there are some gap: > 1. CMoveI/L does not support comparison with float/double, corresponding tests are not turn on either. > 2. Some optimization of C2 is not turned on, e.g. `Phi -> CMove -> min_max`. > 3. lack of some corresponding performance tests. > > Also there are some issue with current Zicond: > 1. UseZicond is turned on automatically by hwprobe, but jmh tests show that it's not always bring benefit, in some situation it even brings regression, the reason is the generated code by Zicond is much larger than branch version, in particular when it's in a loop and unrolled. > > This patch on riscv is to: > 1. add CMoveI/L comparing float/double, and corresponding tests, > 2. enable more C2 optimization, > 3. add more benchmark tests, > 4. turn off UseZicond by default. > > Thanks! 
>
> ## Performance
>
> ### MinMax
> test data
>
> Benchmark | Mode | Cnt | Score - master | Score - master+UseZbb | Score - master+UseZicond | Score - master+UseZicond+UseZbb | Score - cmovei | Score - cmovei+UseZbb | Score - cmovei+UseZicond | Score - cmovei+UseZicond+UseZbb | Error | Units | Opt (master/cmovei)
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> o.o.b.vm.compiler.IfMinMax.testReductionInt | avgt | 40 | 17152.075 | 17216.592 | 17272.493 | 17296.89 | 17127.844 | 17036.605 | 17299.333 | 17250.566 | 73.179 | ns/op | 1.001
> o.o.b.vm.compiler.IfMinMax.testReductionLong | avgt | 40 | 19770.828 | 19967.578 | 20268.905 | 20166.165 | 20065.552 | 20059.095 | 20161.914 | 20151.295 | 131.428 | ns/op | 0.985
> o.o.b.vm.compiler.IfMinMax.testSingleInt | avgt | 40 | 114.734 | 114.402 | 114.887 | 114.384 | 114.4 | 110.631 | 112.162 | 110.915 | 0.333 | ns/op | 1.003
> o.o.b.vm.compiler.IfMinMax.testSingleLong | avgt | 40 | 121.53 | 121.711 | 120.91 | 121.665 | 121.309 | 120.57 | 118.639 | 119.373 | 0.451 | ns/op | 1.002
> o.o.b.vm.compiler.IfMinMax.testVectorInt | avgt | 40 | 60130.165 | 60062.303 | 61839.776 | 61895.194 | 15887.398 | 15924.502 | 15874.835 | 15667.936 | 101.94 | ns/op | 3.785
> o.o.b.vm.compiler.IfMinMax.testVectorLong | avgt | 40 | 63855.379 | 6309...

because zicond code is bigger, e.g.
    void MacroAssembler::cmov_eq(Register cmp1, Register cmp2, Register dst, Register src) {
      if (UseZicond) {
        xorr(t0, cmp1, cmp2);
        czero_eqz(dst, dst, t0);
        czero_nez(t1, src, t0);
        orr(dst, dst, t1);
        return;
      }

      Label no_set;
      bne(cmp1, cmp2, no_set);
      mv(dst, src);
      bind(no_set);
    }

------------- PR Comment: https://git.openjdk.org/jdk/pull/24490#issuecomment-2786284949 From mbaesken at openjdk.org Tue Apr 8 12:36:18 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 8 Apr 2025 12:36:18 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v5] In-Reply-To: <8Huv-fCxPaBXJXYdLuFoJQHrKC89ABR1KJ1Dm9rz7dw=.23605fb4-2e54-4345-89c2-e9262400d752@github.com> References: <8Huv-fCxPaBXJXYdLuFoJQHrKC89ABR1KJ1Dm9rz7dw=.23605fb4-2e54-4345-89c2-e9262400d752@github.com> Message-ID: On Tue, 8 Apr 2025 12:31:44 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g. >> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use tmp variable and release_store I adjusted the coding in os::read_image_release_file , use now an assert at the beginning of the method and also a tmp variable and as suggested by David a release_store . ------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2786290221 From mdoerr at openjdk.org Tue Apr 8 13:02:24 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 8 Apr 2025 13:02:24 GMT Subject: RFR: 8351666: [PPC64] Make non-volatile VectorRegisters available for C2 register allocation [v2] In-Reply-To: References: Message-ID: > This PR makes the non-volatile VectorRegisters available for C2's register allocation.
> > I had to implement the VectorRegisters properly (4 VM Regs) like on other platforms. The old version has run into assertions and looked strange. > > The non-volatile VectorRegisters are now saved when entering Java: call_stub and upcall_stubs. > I have rewritten the save and restore functions and used them for both. Then, I have removed code which has become dead. I only save and restore them if C2 uses the vector instructions (controlled by `SuperwordUseVSX`). > I have moved the non-volatile spill area out of the entry_frame_locals because it has a variable size, now. > > The stack area for all non-volatile registers has become larger than the 288 Bytes which are allowed to be used below the SP (specified by the ABI). Therefore, I had to rewrite the call_stub sequence significantly. We need to push the new frame before saving the registers, now. > > Saving and restoring the FP registers is not needed in the slow signature handler which also uses the save and restore code for non-volatile registers. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge remote-tracking branch 'origin' into 8351666_PPC64_nv_VRs - C2: Specify VSR52-63 as SOE and revert commit 2. - Fix register classification. - Update Copyright headers. - Add missing alignment in upcall stub frames. - Avoid redundant nv VR spill code in CRC stubs. 
- 8351666: [PPC64] Make non-volatile VectorRegisters available for C2 register allocation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23987/files - new: https://git.openjdk.org/jdk/pull/23987/files/c19272c9..89a1e9d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23987&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23987&range=00-01 Stats: 124047 lines in 2971 files changed: 52627 ins; 59524 del; 11896 mod Patch: https://git.openjdk.org/jdk/pull/23987.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23987/head:pull/23987 PR: https://git.openjdk.org/jdk/pull/23987 From mbaesken at openjdk.org Tue Apr 8 13:09:11 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 8 Apr 2025 13:09:11 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v5] In-Reply-To: <8Huv-fCxPaBXJXYdLuFoJQHrKC89ABR1KJ1Dm9rz7dw=.23605fb4-2e54-4345-89c2-e9262400d752@github.com> References: <8Huv-fCxPaBXJXYdLuFoJQHrKC89ABR1KJ1Dm9rz7dw=.23605fb4-2e54-4345-89c2-e9262400d752@github.com> Message-ID: On Tue, 8 Apr 2025 12:31:44 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g. >> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use tmp variable and release_store Should I use stat/fstat for getting the release file size? I think it was suggested earlier (but the current coding works too) . 
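For illustration only: a minimal C++ sketch of the read-once, publish-with-release pattern discussed in this review (fill a temporary buffer, publish it with a release store, read it back with an acquire load). It assumes standard `std::atomic` as a stand-in for HotSpot's `Atomic::release_store` / `Atomic::load_acquire`; the names `g_release_content`, `read_release_file` and `print_release_file` are hypothetical, and this is not the code in the PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for the static field discussed in the review
// (_image_release_file_content); std::atomic replaces HotSpot's Atomic helpers.
static std::atomic<char*> g_release_content{nullptr};

// Reads the whole file into a NUL-terminated heap buffer, then publishes it.
// Expected to run only once, so a second call is treated as a programming error.
void read_release_file(const char* path) {
  assert(g_release_content.load(std::memory_order_acquire) == nullptr);
  std::FILE* file = std::fopen(path, "rb");
  if (file == nullptr) return;
  if (std::fseek(file, 0, SEEK_END) != 0) { std::fclose(file); return; }
  long sz = std::ftell(file);
  if (sz < 0) { std::fclose(file); return; }
  std::fseek(file, 0, SEEK_SET);
  // Fill a private buffer first, so readers never observe a half-written string.
  char* tmp = static_cast<char*>(std::malloc(static_cast<std::size_t>(sz) + 1));
  if (tmp == nullptr) { std::fclose(file); return; }
  std::size_t n = std::fread(tmp, 1, static_cast<std::size_t>(sz), file);
  std::fclose(file);
  tmp[n] = '\0';
  // Release store: the writes to tmp happen-before any acquire load that sees it.
  g_release_content.store(tmp, std::memory_order_release);
}

void print_release_file(std::FILE* out) {
  // Acquire load pairs with the release store in read_release_file().
  const char* content = g_release_content.load(std::memory_order_acquire);
  if (content != nullptr) {
    std::fputs(content, out);
  }
}
```

The size is taken with `fseek`/`ftell`, similar to the quoted `os.cpp` excerpt; POSIX `stat`/`fstat` would be another way to obtain it.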
------------- PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2786378181 From mbaesken at openjdk.org Tue Apr 8 13:09:10 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 8 Apr 2025 13:09:10 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v6] In-Reply-To: References: Message-ID: <7rvz3cdG5emi43h_mqEesMgzjwl0xQGBxZDOUuoOldI=.2cdf363d-db7b-4e94-955b-9c45bbcf9845@github.com> > The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g. > SOURCE=".:git:21af8c7e7405" > Also the MODULES list is probably useful to have. > Add this info (or the complete content of the release file) to the hs_err files. Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: print_image_release_file use load_acquire ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24244/files - new: https://git.openjdk.org/jdk/pull/24244/files/7af59d69..ec6b7f19 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=04-05 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24244/head:pull/24244 PR: https://git.openjdk.org/jdk/pull/24244 From mbaesken at openjdk.org Tue Apr 8 13:09:13 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 8 Apr 2025 13:09:13 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v4] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 07:59:15 GMT, David Holmes wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> use stringStream in os.cpp, add a couple of changes suggested by David > > src/hotspot/share/runtime/os.cpp line 1573: > >> 1571: >> 1572: void os::print_image_release_file(outputStream* st) { >> 1573: if 
(_image_release_file_content != nullptr) { > > Use load_acquire. I adjusted the coding. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2033151857 From stefank at openjdk.org Tue Apr 8 13:12:23 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 8 Apr 2025 13:12:23 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v9] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 13:30:44 GMT, Gerard Ziemski wrote: >> This is a follow-up to #21843. Here we are focusing on removing the mem tag paremeter with default value of mtNone, to force everyone to provide mem tag, if known. >> >> I tried to fill in tag, when I was pretty certain that I had the right type. >> >> At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > small last feedback from Stefan Looks good. Thanks for incorporating my suggestions! ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24282#pullrequestreview-2749927765 From roland at openjdk.org Tue Apr 8 13:14:25 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 8 Apr 2025 13:14:25 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v6] In-Reply-To: References: Message-ID: On Sat, 5 Apr 2025 14:29:28 GMT, Zihao Lin wrote: >> This patch remove slice parameter from LoadNode::make >> >> Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 >> >> Hi team, I am new, I'd appreciate any guidance. Thank a lot! > > Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into 8344116 > - Fix build > - Fix test failed > - 8344116: C2: remove slice parameter from LoadNode::make src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 223: > 221: MergeMemNode* mm = opt_access.mem(); > 222: PhaseGVN& gvn = opt_access.gvn(); > 223: Node* mem = mm->memory_at(gvn.C->get_alias_index(access.addr().type())); Can we get rid of all uses of `access.addr().type()`? src/hotspot/share/gc/shared/c2/cardTableBarrierSetC2.cpp line 105: > 103: // stores. In theory we could relax the load from ctrl() to > 104: // no_ctrl, but that doesn't buy much latitude. > 105: Node* card_val = __ load( __ ctrl(), card_adr, TypeInt::BYTE, T_BYTE); We could asssert that `C->get_alias_index(kit->type(card_adr) == Compile::AliasIdxRaw`, that is that computed slice is the same as hardcoded slide. Similar asserts could be added for every location where a slice/address type is removed in this patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2033149694 PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2033162534 From stuefe at openjdk.org Tue Apr 8 13:27:13 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 8 Apr 2025 13:27:13 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v3] In-Reply-To: References: Message-ID: <4etdppK-GRNJOrJuDAvlhrpNhklx394qL4zxhxIGiaQ=.1a4ed35a-8dff-40e8-ad1a-ebeff7d319d8@github.com> On Tue, 8 Apr 2025 12:08:38 GMT, Kevin Walls wrote: > > Hmm. > > May there not be customers that specify a verbatim "spelled out" shell script as input for OnError? This is a behavior change. > > And if not (if the commands issued with OnError, separated by ;, are in turn real commands, programs or scripts): Those, before, were forked off as separate grandchilds to the same parent (the direct child), right? 
Whereas now we have a single parent for each grandchild process. But here, especially if OnError had been called as reaction to an OOM condition by a gigantic JVM, reusing that in-between shell may be preferable. Forking off a large process can be expensive. > > (Obviously, its all undocumented, which is bad in itself). > > All of these are questions - I may not know the full story. > > Hi Thomas! Not sure I understand the first line about behaviour change. The ; separator was causing new separate shells used sequentially, but distinct OnError= arguments were not (yes, IF somebody has discovered that this works). So with the change, everything gets a new shell fork/exec'd. This should be more consistent, less surprising. > I can't pretend the previous behaviour was to save memory! The posix_spawn usage hopefully means such big processes are more efficient. Yes, posix_spawn helps, but for one I am not sure how solid the implementation is on non-linux unices, e.g. AIX. I would not be surprised if it still copied the whole working set upfront. But even if not, there is still some overhead for spawning with posix_spawn, even with COW. E.g. you need to duplicate the page table set. When we write an error log, the working set of the JVM may be ridiculously large. You don't want to spend much time here, since - at this point - we have no cancellation logic (we are outside of VMError::report()), and you want to stop the JVM as fast as possible. Only then would outside processes notice the JVM is a goner, and e.g start a fresh JVM to serve user requests again. 
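For readers following the splitting rule discussed in this thread, here is a minimal C++ sketch of it. It assumes, as the `vmError.cpp` loop quoted in this thread suggests, that a command ends at `;`, a newline, or the end of the string, and that repeated -XX:OnError= values arrive joined by newlines. `split_on_error_commands` is a hypothetical name and is not HotSpot's `next_OnError_command()`. Under the proposed behavior, each returned string would run in its own `/bin/sh -c`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not HotSpot code): split an OnError-style string into
// separate commands. A command ends at ';', '\n' or the end of the string;
// leading blanks and empty segments are skipped.
std::vector<std::string> split_on_error_commands(const std::string& s) {
  std::vector<std::string> cmds;
  std::size_t i = 0;
  while (i < s.size()) {
    // Skip separators and blanks between commands.
    while (i < s.size() && (s[i] == ' ' || s[i] == ';' || s[i] == '\n')) {
      i++;
    }
    if (i == s.size()) {
      break;
    }
    // Scan to the end of this command.
    std::size_t end = i;
    while (end < s.size() && s[end] != ';' && s[end] != '\n') {
      end++;
    }
    assert(end > i);  // s[i] is not a separator, so the command is non-empty
    cmds.push_back(s.substr(i, end - i));
    i = end;
  }
  return cmds;
}
```

For example, `split_on_error_commands("jstack %p; echo done")` yields two commands, `jstack %p` and `echo done`, so each would get its own shell instead of sharing one.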
------------- PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2786432083 From kevinw at openjdk.org Tue Apr 8 13:36:18 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Tue, 8 Apr 2025 13:36:18 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v3] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 09:42:16 GMT, Thomas Stuefe wrote: >> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: >> >> comment > > src/hotspot/share/utilities/vmError.cpp line 156: > >> 154: >> 155: const char * cmdend = cmd; >> 156: while (*cmdend != '\0' && *cmdend != ';' && *cmdend != '\n') cmdend++; > > What happens if I specify, eg in Bash: > > -XX:OnError=lengthy command \ > more command options > > ? They should get joined into one line by the shell, so we see them as one OnError arg: $ java ....args... -XX:OnError="/bin/echo ONE \ > TWO THREE" MyApp ...crash: # -XX:OnError="/bin/echo ONE TWO THREE" # Executing /bin/sh -c "/bin/echo ONE TWO THREE" ... ONE TWO THREE Aborted (core dumped) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24354#discussion_r2033206360 From stuefe at openjdk.org Tue Apr 8 13:45:22 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 8 Apr 2025 13:45:22 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v3] In-Reply-To: References: Message-ID: <81rIMmhWcnDfEsTsIFIc8ZqqJ2xswFJTBWkKXjxTNoQ=.727340f1-4a1e-4c1b-97ce-1423de76369c@github.com> On Mon, 7 Apr 2025 10:44:02 GMT, Kevin Walls wrote: >> We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. >> >> next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell. 
> > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > comment Looks good to me. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24354#pullrequestreview-2750035837 From stuefe at openjdk.org Tue Apr 8 13:45:22 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 8 Apr 2025 13:45:22 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v3] In-Reply-To: <4etdppK-GRNJOrJuDAvlhrpNhklx394qL4zxhxIGiaQ=.1a4ed35a-8dff-40e8-ad1a-ebeff7d319d8@github.com> References: <4etdppK-GRNJOrJuDAvlhrpNhklx394qL4zxhxIGiaQ=.1a4ed35a-8dff-40e8-ad1a-ebeff7d319d8@github.com> Message-ID: On Tue, 8 Apr 2025 13:24:32 GMT, Thomas Stuefe wrote: > > > Hmm. > > > May there not be customers that specify a verbatim "spelled out" shell script as input for OnError? This is a behavior change. > > > And if not (if the commands issued with OnError, separated by ;, are in turn real commands, programs or scripts): Those, before, were forked off as separate grandchilds to the same parent (the direct child), right? Whereas now we have a single parent for each grandchild process. But here, especially if OnError had been called as reaction to an OOM condition by a gigantic JVM, reusing that in-between shell may be preferable. Forking off a large process can be expensive. > > > (Obviously, its all undocumented, which is bad in itself). > > > All of these are questions - I may not know the full story. > > > > > > Hi Thomas! Not sure I understand the first line about behaviour change. The ; separator was causing new separate shells used sequentially, but distinct OnError= arguments were not (yes, IF somebody has discovered that this works). So with the change, everything gets a new shell fork/exec'd. This should be more consistent, less surprising. > > > I can't pretend the previous behaviour was to save memory! 
The posix_spawn usage hopefully means such big processes are more efficient. > > Yes, posix_spawn helps, but for one I am not sure how solid the implementation is on non-linux unices, e.g. AIX. I would not be surprised if it still copied the whole working set upfront. But even if not, there is still some overhead for spawning with posix_spawn, even with COW. E.g. you need to duplicate the page table set. > > When we write an error log, the working set of the JVM may be ridiculously large. You don't want to spend much time here, since - at this point - we have no cancellation logic (we are outside of VMError::report()), and you want to stop the JVM as fast as possible. Only then would outside processes notice the JVM is a goner, and e.g start a fresh JVM to serve user requests again. But I don't want to keep up this PR, if you think this is the way forward. There is a very simple workaround for the problem I described above, which is to group multiple scripts into a single umbrella script and call that with a single OnError argument. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2786483773 From kevinw at openjdk.org Tue Apr 8 13:54:17 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Tue, 8 Apr 2025 13:54:17 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v3] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 10:44:02 GMT, Kevin Walls wrote: >> We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. >> >> next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > comment Thanks Thomas! 
Yes I would think if people are really very sensitive to the restart time, then they just aren't using OnError. If they do, then I hope the fork time is dwarfed by the time spent writing the core (or they use OnError instead of writing a core). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2786512653 From stefank at openjdk.org Tue Apr 8 14:22:27 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 8 Apr 2025 14:22:27 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 15:27:15 GMT, Robert Toyonaga wrote: >> ### Update: >> After some discussion it was decided it's not necessary to expand the lock scope for reserve/commit. Instead, we are opting to add comments explaining the reasons for locking and the conditions which could lead to races and must be avoided. Some of the new tests can be kept because they are general enough to be useful outside of this context. >> >> ### Summary: >> This PR makes memory operations atomic with NMT accounting. >> >> ### The problem: >> In memory-related functions like `os::commit_memory` and `os::reserve_memory` the OS memory operations are currently done before acquiring the NMT mutex. And the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races. >> >> 1.1 Thread_1 releases range_A. >> 1.2 Thread_1 tells NMT "range_A has been released". >> >> 2.1 Thread_2 reserves (the now free) range_A. >> 2.2 Thread_2 tells NMT "range_A is reserved". >> >> Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS.
>> >> ### Solution: >> Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself. >> >> ### Other notes: >> I also simplified this pattern found in many places: >> >> if (MemTracker::enabled()) { >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_some_operation(addr, bytes); >> if (result != nullptr) { >> MemTracker::record_some_operation(addr, bytes); >> } >> } else { >> result = pd_unmap_memory(addr, bytes); >> } >> ``` >> To: >> >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_unmap_memory(addr, bytes); >> MemTracker::record_some_operation(addr, bytes); >> ``` >> This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`. >> >> I considered moving the locking and NMT accounting down into platform specific code: Ex. lock around { munmap() + MemTracker:... > > Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision: > > exclude file mapping tests on AIX. I think this looks good to me, but please seek feedback from others as well. I've added a couple of suggestions. None of them are required, but I think they would be nice to do. src/hotspot/share/runtime/os.cpp line 2206: > 2204: // when it is actually committed. The opposite scenario is not guarded against. pd_commit_memory and > 2205: // record_virtual_memory_commit do not happen atomically. We assume that there is some external synchronization > 2206: // that prevents a region from being uncommitted before it is finished being committed. 
It's not a requirement, but you get kudos from me if you keep comment lines below 80 columns. I typically don't like code to be limited to 80 columns, but comments tend to be nicer if they are. test/hotspot/gtest/runtime/test_os.cpp line 1123: > 1121: > 1122: char* base = os::reserve_memory(size, false, mtTest); > 1123: ASSERT_NE(base, (char*) nullptr); Suggestion: ASSERT_NOT_NULL(base); And the same in other places. test/hotspot/gtest/runtime/test_os.cpp line 1133: > 1131: } > 1132: > 1133: #if !defined(_AIX) Suggestion: #if !defined(_AIX) I suggest a blank line here because this ifdef spans multiple tests and not only the nearest test. Having a blank line makes it clearer that this is a large ifdef that is not only related to the test case that it is bunched up against. test/hotspot/gtest/runtime/test_os.cpp line 1145: > 1143: EXPECT_TRUE(result != nullptr); > 1144: > 1145: EXPECT_TRUE(strcmp(letters, result)==0); Suggestion: EXPECT_TRUE(strcmp(letters, result) == 0); but probably even better: Suggestion: EXPECT_EQ(strcmp(letters, result), 0); test/hotspot/gtest/runtime/test_os.cpp line 1184: > 1182: ::close(fd); > 1183: } > 1184: #endif Suggestion: #endif // !defined(_AIX) I suggest a blank line and a matching comment. I know some HotSpot devs tend to appreciate those comments. ------------- Marked as reviewed by stefank (Reviewer).
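The ordering problem described in the 8341491 PR (steps 1.1, 2.1, 2.2, 1.2) can be modelled in a few lines of C++. This is a toy under stated assumptions, not HotSpot code: `ToyVm` and its two sets are hypothetical stand-ins for the OS and for NMT, and a `std::mutex` stands in for `NmtVirtualMemoryLocker`. It illustrates the originally proposed fix of holding one lock across both the memory operation and its accounting; per the PR's update, the final patch documents the assumptions instead of widening the lock for reserve/commit.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>

// Toy model (hypothetical, not HotSpot code). "os" records what the OS has
// reserved; "nmt" records what the tracker believes is reserved.
struct ToyVm {
  std::mutex lock;            // stands in for NmtVirtualMemoryLocker
  std::set<std::size_t> os;   // reserved range start addresses, OS view
  std::set<std::size_t> nmt;  // reserved range start addresses, tracker view

  // Reserve and account under one lock: no other thread can release or
  // re-reserve the same range between the two steps.
  bool reserve(std::size_t addr) {
    std::lock_guard<std::mutex> guard(lock);
    if (!os.insert(addr).second) {
      return false;  // the OS refuses: range already reserved
    }
    nmt.insert(addr);
    return true;
  }

  // Release and account under the same lock, so the interleaving
  // (1.1) (2.1) (2.2) (1.2) cannot happen.
  void release(std::size_t addr) {
    std::lock_guard<std::mutex> guard(lock);
    assert(os.count(addr) == nmt.count(addr));  // views agree on entry
    os.erase(addr);
    nmt.erase(addr);
  }

  bool in_sync() {
    std::lock_guard<std::mutex> guard(lock);
    return os == nmt;
  }
};
```

Because `reserve` and `release` each hold the lock for both steps, another thread can never act on a state in which the OS view and the tracker view disagree.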
PR Review: https://git.openjdk.org/jdk/pull/24084#pullrequestreview-2750137709 PR Review Comment: https://git.openjdk.org/jdk/pull/24084#discussion_r2033287481 PR Review Comment: https://git.openjdk.org/jdk/pull/24084#discussion_r2033292443 PR Review Comment: https://git.openjdk.org/jdk/pull/24084#discussion_r2033303666 PR Review Comment: https://git.openjdk.org/jdk/pull/24084#discussion_r2033294266 PR Review Comment: https://git.openjdk.org/jdk/pull/24084#discussion_r2033307030 From stefank at openjdk.org Tue Apr 8 14:22:27 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 8 Apr 2025 14:22:27 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock [v3] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 14:11:21 GMT, Stefan Karlsson wrote: >> Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision: >> >> exclude file mapping tests on AIX. > > test/hotspot/gtest/runtime/test_os.cpp line 1145: > >> 1143: EXPECT_TRUE(result != nullptr); >> 1144: >> 1145: EXPECT_TRUE(strcmp(letters, result)==0); > > Suggestion: > > EXPECT_TRUE(strcmp(letters, result) == 0); > > but probably even better: > Suggestion: > > EXPECT_EQ(strcmp(letters, result), 0); There are more places like this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24084#discussion_r2033296961 From fjiang at openjdk.org Tue Apr 8 15:08:21 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 8 Apr 2025 15:08:21 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 14:23:52 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > On riscv, CMoveI/L already were implemented, but there are some gap: > 1. CMoveI/L does not support comparison with float/double, corresponding tests are not turn on either. > 2. Some optimization of C2 is not turned on, e.g. `Phi -> CMove -> min_max`. > 3. 
lack of some corresponding performance tests. > > Also there are some issue with current Zicond: > 1. UseZicond is turned on automatically by hwprobe, but jmh tests show that it's not always bring benefit, in some situation it even brings regression, the reason is the generated code by Zicond is much larger than branch version, in particular when it's in a loop and unrolled. > > This patch on riscv is to: > 1. add CMoveI/L comparing float/double, and corresponding tests, > 2. enable more C2 optimization, > 3. add more benchmark tests, > 4. turn off UseZicond by default. > > Thanks! > > ## Performance > > ### MinMax > test data > > Benchmark | Mode | Cnt | Score - master | Score - master+UseZbb | Score - -master+UseZicond | Score - master+UseZicond+UseZbb | Score - cmovei | Score - cmovei+UseZbb | Score - cmovei+UseZicond | Score - cmovei+UseZicond+UseZbb | Error | Units | Opt (master/cmovei) > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > o.o.b.vm.compiler.IfMinMax.testReductionInt | avgt | 40 | 17152.075 | 17216.592 | 17272.493 | 17296.89 | 17127.844 | 17036.605 | 17299.333 | 17250.566 | 73.179 | ns/op | 1.001 > o.o.b.vm.compiler.IfMinMax.testReductionLong | avgt | 40 | 19770.828 | 19967.578 | 20268.905 | 20166.165 | 20065.552 | 20059.095 | 20161.914 | 20151.295 | 131.428 | ns/op | 0.985 > o.o.b.vm.compiler.IfMinMax.testSingleInt | avgt | 40 | 114.734 | 114.402 | 114.887 | 114.384 | 114.4 | 110.631 | 112.162 | 110.915 | 0.333 | ns/op | 1.003 > o.o.b.vm.compiler.IfMinMax.testSingleLong | avgt | 40 | 121.53 | 121.711 | 120.91 | 121.665 | 121.309 | 120.57 | 118.639 | 119.373 | 0.451 | ns/op | 1.002 > o.o.b.vm.compiler.IfMinMax.testVectorInt | avgt | 40 | 60130.165 | 60062.303 | 61839.776 | 61895.194 | 15887.398 | 15924.502 | 15874.835 | 15667.936 | 101.94 | ns/op | 3.785 > o.o.b.vm.compiler.IfMinMax.testVectorLong | avgt | 40 | 63855.379 | 6309... 
src/hotspot/cpu/riscv/riscv.ad line 9979: > 9977: > 9978: format %{ > 9979: "CMove $dst, ($op1 $cop $op2), $dst, $src\t#@cmovI_cmpF\n\t" Should be `CMoveI` too? src/hotspot/cpu/riscv/riscv.ad line 9996: > 9994: > 9995: format %{ > 9996: "CMove $dst, ($op1 $cop $op2), $dst, $src\t#@cmovI_cmpD\n\t" Ditto. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24490#discussion_r2033403739 PR Review Comment: https://git.openjdk.org/jdk/pull/24490#discussion_r2033404091 From stuefe at openjdk.org Tue Apr 8 15:22:20 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 8 Apr 2025 15:22:20 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v3] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 13:51:08 GMT, Kevin Walls wrote: > Thanks Thomas! Yes I would think if people are really very sensitive to the restart time, then they just aren't using OnError. If they do, then I hope the fork time is dwarfed by the time spent writing the core (or they use OnError instead of writing a core). Typically these installations disable core writing and set ErrorLogTimeout to something really low. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2786810644 From sroy at openjdk.org Tue Apr 8 15:26:22 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Tue, 8 Apr 2025 15:26:22 GMT Subject: RFR: JDK-8331859 : [PPC64] Remove support for Power7 and older In-Reply-To: References: <-IDumV-vCJZ1UbAB91vfGAgZZ0nRJPuCP9EjZxOhExc=.07fd4fd9-03e5-4408-87a3-a6eabf1be724@github.com> Message-ID: <8a1c_K-kmDMyTb8_1raO_44IVOj1mz2WcBk98lgLb38=.233db15d-41ae-4b50-9cae-0c720664451c@github.com> On Mon, 7 Apr 2025 08:41:36 GMT, Martin Doerr wrote: >> Hi @TheRealMDoerr I removed the instructions mentioned in the issue. How can I determine which instructions were older ? Is there a file where it is mentioned specifically ? > > The Power ISA has a section "Appendix E. Power ISA Instruction Set Sorted by Opcode" at the end. 
We require Power8 which matches "Power ISA v2.07". All instructions from v2.07 or older are available. Hi @TheRealMDoerr some instructions have "PPC" written in version..what does this mean ? Also I do not see mfdscr in the appendix ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20262#discussion_r2033449189 From sroy at openjdk.org Tue Apr 8 16:07:16 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Tue, 8 Apr 2025 16:07:16 GMT Subject: RFR: JDK-8331859 : [PPC64] Remove support for Power7 and older In-Reply-To: <8a1c_K-kmDMyTb8_1raO_44IVOj1mz2WcBk98lgLb38=.233db15d-41ae-4b50-9cae-0c720664451c@github.com> References: <-IDumV-vCJZ1UbAB91vfGAgZZ0nRJPuCP9EjZxOhExc=.07fd4fd9-03e5-4408-87a3-a6eabf1be724@github.com> <8a1c_K-kmDMyTb8_1raO_44IVOj1mz2WcBk98lgLb38=.233db15d-41ae-4b50-9cae-0c720664451c@github.com> Message-ID: On Tue, 8 Apr 2025 15:23:20 GMT, Suchismith Roy wrote: >> The Power ISA has a section "Appendix E. Power ISA Instruction Set Sorted by Opcode" at the end. We require Power8 which matches "Power ISA v2.07". All instructions from v2.07 or older are available. > > Hi @TheRealMDoerr some instructions have "PPC" written in version..what does this mean ? > Also I do not see mfdscr in the appendix Also should has_vsx be removed too ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20262#discussion_r2033539066 From mdoerr at openjdk.org Tue Apr 8 16:08:32 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 8 Apr 2025 16:08:32 GMT Subject: RFR: 8351666: [PPC64] Make non-volatile VectorRegisters available for C2 register allocation [v4] In-Reply-To: References: Message-ID: > This PR makes the non-volatile VectorRegisters available for C2's register allocation. > > I had to implement the VectorRegisters properly (4 VM Regs) like on other platforms. The old version has run into assertions and looked strange. > > The non-volatile VectorRegisters are now saved when entering Java: call_stub and upcall_stubs. 
> I have rewritten the save and restore functions and used them for both. Then, I have removed code which has become dead. I only save and restore them if C2 uses the vector instructions (controlled by `SuperwordUseVSX`). > I have moved the non-volatile spill area out of the entry_frame_locals because it has a variable size, now. > > The stack area for all non-volatile registers has become larger than the 288 Bytes which are allowed to be used below the SP (specified by the ABI). Therefore, I had to rewrite the call_stub sequence significantly. We need to push the new frame before saving the registers, now. > > Saving and restoring the FP registers is not needed in the slow signature handler which also uses the save and restore code for non-volatile registers. > > On Power10, we use vector pair instructions since Commit 8. E.g. in the call stub: > > 0x000072c9483c07b4: stxvp vs52,-224(r2) > 0x000072c9483c07b8: stxvp vs54,-192(r2) > 0x000072c9483c07bc: stxvp vs56,-160(r2) > 0x000072c9483c07c0: stxvp vs58,-128(r2) > 0x000072c9483c07c4: stxvp vs60,-96(r2) > 0x000072c9483c07c8: stxvp vs62,-64(r2) > > > > 0x000072c9483c0914: lxvp vs52,-224(r2) > 0x000072c9483c0918: lxvp vs54,-192(r2) > 0x000072c9483c091c: lxvp vs56,-160(r2) > 0x000072c9483c0920: lxvp vs58,-128(r2) > 0x000072c9483c0924: lxvp vs60,-96(r2) > 0x000072c9483c0928: lxvp vs62,-64(r2) Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Update Copyright header. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23987/files - new: https://git.openjdk.org/jdk/pull/23987/files/05594c05..e69d7183 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23987&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23987&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23987.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23987/head:pull/23987 PR: https://git.openjdk.org/jdk/pull/23987 From mdoerr at openjdk.org Tue Apr 8 16:19:23 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 8 Apr 2025 16:19:23 GMT Subject: RFR: JDK-8331859 : [PPC64] Remove support for Power7 and older In-Reply-To: References: <-IDumV-vCJZ1UbAB91vfGAgZZ0nRJPuCP9EjZxOhExc=.07fd4fd9-03e5-4408-87a3-a6eabf1be724@github.com> <8a1c_K-kmDMyTb8_1raO_44IVOj1mz2WcBk98lgLb38=.233db15d-41ae-4b50-9cae-0c720664451c@github.com> Message-ID: On Tue, 8 Apr 2025 16:04:21 GMT, Suchismith Roy wrote: >> Hi @TheRealMDoerr some instructions have "PPC" written in version..what does this mean ? >> Also I do not see mfdscr in the appendix > > Also should has_vsx be removed too ? "PPC" is older than V2.07. "mfsdcr" is a special version of "mfspr". "has_vsx" is also given on Power8. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20262#discussion_r2033563655 From lmesnik at openjdk.org Tue Apr 8 16:31:20 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 8 Apr 2025 16:31:20 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances [v3] In-Reply-To: <8oHlvz98KPGFmzSttjpmKbjNSQkbF0rmLlE9Aqdgs1M=.4b96d172-54c6-4005-bcd2-1a5ef76198a0@github.com> References: <8oHlvz98KPGFmzSttjpmKbjNSQkbF0rmLlE9Aqdgs1M=.4b96d172-54c6-4005-bcd2-1a5ef76198a0@github.com> Message-ID: <6ZjBGGo_NRkC6ix1HW-wiCI9wyg1fNkEi8oprkp7U1M=.f527d3a8-6bfc-4dd8-8533-3aa5e4ef86d9@github.com> On Tue, 8 Apr 2025 05:46:09 GMT, Thomas Stuefe wrote: >> In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. >> >> For details, please see JBS issue text. >> >> ----------------------- >> >> Patch results: >> >> The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. 
Here the oop map size distribution over all JDK classes in the JDK image: >> >> Before: >> >> 5395 - non-static oop maps (0 entries) >> 9330 - non-static oop maps (1 entries) >> 1449 - non-static oop maps (2 entries) >> 274 - non-static oop maps (3 entries) >> 218 - non-static oop maps (4 entries) >> 75 - non-static oop maps (5 entries) >> 7 - non-static oop maps (6 entries) >> 4 - non-static oop maps (7 entries) >> >> >> Now: >> >> 5395 - non-static oop maps (0 entries) >> 10178 - non-static oop maps (1 entries) >> 933 - non-static oop maps (2 entries) >> 229 - non-static oop maps (3 entries) >> 16 - non-static oop maps (4 entries) >> 1 - non-static oop maps (5 entries) >> >> >> For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: >> >> Before: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'red' 'Z' @28 << derived class starts here, non-oops lead >> - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 >> - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 >> - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 >> - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 >> - non-static oop maps (2 entries): 16-24 32-44 >> >> Now: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'parent' 
'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oop... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - test fixes > - Merge branch 'master' into JDK-8353273-Reduce-average-number-of-oop-map-entries-in-instance-objects > - add regression test > - Reworked to use prior super klass layout reconstruction pass > - Merge branch 'master' into JDK-8353273-Reduce-average-number-of-oop-map-entries-in-instance-objects > - alternate-order > - print Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24330#pullrequestreview-2750646187 From shade at openjdk.org Tue Apr 8 17:59:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 8 Apr 2025 17:59:25 GMT Subject: RFR: 8354062: x86: Optimize stores of zero immediates with r12_heapbase In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 17:53:29 GMT, Aleksey Shipilev wrote: > X86 does not have zero register. Except that it does for Hotspot, when compressed oops are enabled and heap base is zero. C2 routinely uses `r12` as zero register then. It makes the code considerably more compact. We can do the same in `MacroAssembler`. This would target the stores of known zeroes, which are surprisingly frequent in C1, mostly for zeroing out various `JavaThread` slots, e.g. for exception handling. 
> > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` Sample code density improvements: $ for I in 1 2 3 4; do build/linux-x86_64-server-release/images/jdk/bin/java -XX:TieredStopAtLevel=${I} \ -Xcomp -XX:+CITime Hello 2>&1 | grep "Tier${I}" | cut -d' ' -f 3,23-; done # Before Tier1 nmethods_size: 668432 bytes; nmethods_code_size: 431960 bytes} Tier2 nmethods_size: 718144 bytes; nmethods_code_size: 467888 bytes} Tier3 nmethods_size: 1328424 bytes; nmethods_code_size: 1009728 bytes} Tier4 nmethods_size: 493704 bytes; nmethods_code_size: 337272 bytes} # After Tier1 nmethods_size: 664880 bytes; nmethods_code_size: 428408 bytes} ; -0.8% Tier2 nmethods_size: 714576 bytes; nmethods_code_size: 464320 bytes} ; -0.8% Tier3 nmethods_size: 1324888 bytes; nmethods_code_size: 1006192 bytes} ; -0.4% Tier4 nmethods_size: 493368 bytes; nmethods_code_size: 336936 bytes} ; -0.1% ------------- PR Comment: https://git.openjdk.org/jdk/pull/24519#issuecomment-2787243486 From shade at openjdk.org Tue Apr 8 17:59:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 8 Apr 2025 17:59:25 GMT Subject: RFR: 8354062: x86: Optimize stores of zero immediates with r12_heapbase Message-ID: X86 does not have zero register. Except that it does for Hotspot, when compressed oops are enabled and heap base is zero. C2 routinely uses `r12` as zero register then. It makes the code considerably more compact. We can do the same in `MacroAssembler`. This would target the stores of known zeroes, which are surprisingly frequent in C1, mostly for zeroing out various `JavaThread` slots, e.g. for exception handling. 
Additional testing: - [ ] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/24519/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24519&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354062 Stats: 20 lines in 2 files changed: 17 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24519/head:pull/24519 PR: https://git.openjdk.org/jdk/pull/24519 From cslucas at openjdk.org Tue Apr 8 18:10:17 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 8 Apr 2025 18:10:17 GMT Subject: RFR: 8353174: Clean up thread register handling after 32-bit x86 removal In-Reply-To: <1u8EcLMPtsHFmCPJMQ-Hgcfql0MWy88TpSDZjr9AqrQ=.79b04f21-d97a-4af9-b83b-533ebb1acb6c@github.com> References: <1u8EcLMPtsHFmCPJMQ-Hgcfql0MWy88TpSDZjr9AqrQ=.79b04f21-d97a-4af9-b83b-533ebb1acb6c@github.com> Message-ID: On Mon, 31 Mar 2025 10:19:57 GMT, Aleksey Shipilev wrote: > Various `MacroAssembler` methods have this code to support passing the thread register, and getting it if `noreg` is passed: > > > // determine java_thread register > if (!java_thread->is_valid()) { > #ifdef _LP64 > java_thread = r15_thread; > #else > java_thread = rdi; > get_thread(java_thread); > #endif // LP64 > } > > > This never happens after 32-bit x86 removal. x86_64 always uses r15_thread. We can clean those up. > > These are also the only major users of `MacroAssembler::get_thread` that we want to remove/rename to avoid falling into traps like [JDK-8353176](https://bugs.openjdk.org/browse/JDK-8353176). > > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` LGTM src/hotspot/cpu/x86/macroAssembler_x86.hpp line 293: > 291: > 292: void get_vm_result (Register oop_result); > 293: void get_vm_result_2(Register metadata_result); NIT: Can you please try to find a better suffix than "_2"? ------------- Marked as reviewed by cslucas (Author). 
PR Review: https://git.openjdk.org/jdk/pull/24323#pullrequestreview-2750950886 PR Review Comment: https://git.openjdk.org/jdk/pull/24323#discussion_r2033766442 From shade at openjdk.org Tue Apr 8 18:10:17 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 8 Apr 2025 18:10:17 GMT Subject: RFR: 8353174: Clean up thread register handling after 32-bit x86 removal In-Reply-To: References: <1u8EcLMPtsHFmCPJMQ-Hgcfql0MWy88TpSDZjr9AqrQ=.79b04f21-d97a-4af9-b83b-533ebb1acb6c@github.com> Message-ID: <2OYU9Q43aIcSx7xnZLGVo0Ssu_9IHftSh3g52QAkM2w=.b8d091d3-35f6-476d-8163-b40d5d652c8f@github.com> On Tue, 8 Apr 2025 18:04:11 GMT, Cesar Soares Lucas wrote: >> Various `MacroAssembler` methods have this code to support passing the thread register, and getting it if `noreg` is passed: >> >> >> // determine java_thread register >> if (!java_thread->is_valid()) { >> #ifdef _LP64 >> java_thread = r15_thread; >> #else >> java_thread = rdi; >> get_thread(java_thread); >> #endif // LP64 >> } >> >> >> This never happens after 32-bit x86 removal. x86_64 always uses r15_thread. We can clean those up. >> >> These are also the only major users of `MacroAssembler::get_thread` that we want to remove/rename to avoid falling into traps like [JDK-8353176](https://bugs.openjdk.org/browse/JDK-8353176). >> >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` > > src/hotspot/cpu/x86/macroAssembler_x86.hpp line 293: > >> 291: >> 292: void get_vm_result (Register oop_result); >> 293: void get_vm_result_2(Register metadata_result); > > NIT: Can you please try to find a better suffix than "_2"? It is a cross-platform mess. I think we want to rename it in one shot, in a separate PR? Want to take it? This `_2` is basically for metadata, while "`_1`" is for oops. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24323#discussion_r2033770810 From matsaave at openjdk.org Tue Apr 8 18:29:25 2025 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 8 Apr 2025 18:29:25 GMT Subject: RFR: 8349007: The jtreg test ResolvedMethodTableHash takes excessive time [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 20:45:50 GMT, Coleen Phillimore wrote: >> This is mostly a test change. ResolvedMethodTableHash.java is run /manual but still takes a long time and doesn't really verify that the hash code is any good. This adds logging, triggers concurrent work like other ConcurrentHashtables, and checks that a small table is not rehashed. >> Tested with tier1 (including test). > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix indent and hardcode 1001 loops. The change looks good! Edit: I had the same question as Leonid and didn't notice that you had already answered it. Thanks! ------------- Marked as reviewed by matsaave (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24383#pullrequestreview-2751039401 From shade at openjdk.org Tue Apr 8 18:44:21 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 8 Apr 2025 18:44:21 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag In-Reply-To: References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> Message-ID: <314QDfJV4auKGwpK0rkvupAG_iBr1icgZ6azVvWIJro=.d2f945d9-5a70-473e-b473-eb18e1153f05@github.com> On Thu, 3 Apr 2025 20:00:05 GMT, Vladimir Ivanov wrote: >> Noticed this when doing [JDK-8353558](https://bugs.openjdk.org/browse/JDK-8353558). We only check for CLWB feature flag for Intel platforms. But AMD APM (Rev. 3.36, March 2024) tells me there is a CLWB flag in CPUID Fn0000_0007_EBX_x0 leaf as well. It is in the same place as the flag for Intel.
>> >> Additional testing: >> - [x] Ad-hoc tests on Ryzen 5950X > > src/hotspot/cpu/x86/vm_version_x86.cpp line 3100: > >> 3098: if (ext_cpuid1_ecx.bits.sse4a != 0) >> 3099: result |= CPU_SSE4A; >> 3100: if (sef_cpuid7_ebx.bits.clwb != 0) > > I'm curious what's the rule here when it comes to vendor-specific features? > > From what I'm seeing in the sources, both AMD and ZX enumerate only `ext_cpuid1` features while for Intel it's a mix of `sef_cpuid7` and `ext_cpuid1`. > > So, I'm curious whether the code should be moved up and shared for all CPUs. Are you happy with this explanation, @iwanowww? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24385#discussion_r2033832750 From mullan at openjdk.org Tue Apr 8 19:15:24 2025 From: mullan at openjdk.org (Sean Mullan) Date: Tue, 8 Apr 2025 19:15:24 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8] In-Reply-To: References: <5WNrv1s7Bp7hLwSVGqoPw9ycCSHK0Zyka65DpAjnB2s=.31243a29-4fbb-4c21-b671-45470d043335@github.com> <5m9xiUkcb41c47vcLKS3kvsK9Jhh1y7PsNRHcffa8ug=.5785cdda-e50e-410a-a139-5554d70bfdff@github.com> Message-ID: On Fri, 4 Apr 2025 15:13:50 GMT, Volodymyr Paprotski wrote: > > > Done I think: https://bugs.openjdk.org/browse/JDK-8297970 > > > > > > Is this link correct? This issue was fixed in JDK 20. > > Sorry.. copy/paste, didn't notice.. https://bugs.openjdk.org/browse/JDK-8353670 (also ends in *70!) Looks good, but can you also say something about the approximate performance gain? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23719#issuecomment-2787432878 From shade at openjdk.org Tue Apr 8 19:21:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 8 Apr 2025 19:21:16 GMT Subject: RFR: 8354062: x86: Optimize stores of zero immediates with r12_heapbase In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 17:53:29 GMT, Aleksey Shipilev wrote: > X86 does not have a zero register.
Except that it does for Hotspot, when compressed oops are enabled and the heap base is zero. C2 routinely uses `r12` as a zero register then. It makes the code considerably more compact. We can do the same in `MacroAssembler`. This would target the stores of known zeroes, which are surprisingly frequent in C1, mostly for zeroing out various `JavaThread` slots, e.g. for exception handling. > > (Kept x86_32 code intact, in case we want to backport it later. I don't mind removing x86_32 parts either.) > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` Test failures. Back to draft. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24519#issuecomment-2787444064 From jiangli at openjdk.org Tue Apr 8 20:40:12 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 8 Apr 2025 20:40:12 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 [v2] In-Reply-To: References: <8py7v6gQd9nfucRK28h2enHWk21PevhMa74FYx3bRsI=.663fa24c-4218-426b-bda4-2e8d6ecb02aa@github.com> Message-ID: On Tue, 8 Apr 2025 03:50:14 GMT, SendaoYan wrote: >> src/hotspot/share/runtime/abstract_vm_version.cpp line 140: >> >>> 138: >>> 139: >>> 140: const char* Abstract_VM_Version::vm_info_string() { >> >> For future benefit, how about also adding a comment to explain why we avoid dynamic memory allocation for the vm_info_string here? > > I have added a comment, but my English is not very good. If you have a proper description, please let me know. Looks ok to me, thanks. Please update the contributor properly since the change is from https://github.com/openjdk/jdk/pull/24171/commits/baff6b166d130c9adeecfb9f2b418d86322d4826.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24299#discussion_r2033982904 From phh at openjdk.org Tue Apr 8 20:44:17 2025 From: phh at openjdk.org (Paul Hohensee) Date: Tue, 8 Apr 2025 20:44:17 GMT Subject: RFR: 8353593: MethodData "mileage_*" methods and fields aren't used and can be removed In-Reply-To: References: Message-ID: <1B2lwep9GL4SjFADouMGqw50_z25nSOHzkCAXCteMc8=.3b7e4d18-f68a-496c-b8a7-5ca1919eb8ce@github.com> On Thu, 3 Apr 2025 00:52:30 GMT, Cesar Soares Lucas wrote: > Please review this trivial patch to remove dead code from MethodData class. > Tested on Linux x86_64 with JTREG_TIER1. Thanks for the cleanup. ------------- Marked as reviewed by phh (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24399#pullrequestreview-2751338344 From duke at openjdk.org Tue Apr 8 20:57:17 2025 From: duke at openjdk.org (duke) Date: Tue, 8 Apr 2025 20:57:17 GMT Subject: RFR: 8353593: MethodData "mileage_*" methods and fields aren't used and can be removed In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 00:52:30 GMT, Cesar Soares Lucas wrote: > Please review this trivial patch to remove dead code from MethodData class. > Tested on Linux x86_64 with JTREG_TIER1. @JohnTortugo Your change (at version a92a8d6c937534be4362c0a1e07885b07e26b3d2) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24399#issuecomment-2787640578 From duke at openjdk.org Tue Apr 8 21:27:08 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 8 Apr 2025 21:27:08 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v14] In-Reply-To: References: Message-ID: <394Wf5RpbwUgE7zBaZBnwa2YAxQFwWDhF1VuaMPHdhE=.98ff29f7-b6a7-49eb-bdd6-8489568b24b7@github.com> > By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. 
Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Reacting to more comments from Sandhya. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23860/files - new: https://git.openjdk.org/jdk/pull/23860/files/e4ab10bb..0b0d0969 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23860&range=12-13 Stats: 11 lines in 1 file changed: 0 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23860.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23860/head:pull/23860 PR: https://git.openjdk.org/jdk/pull/23860 From duke at openjdk.org Tue Apr 8 21:29:26 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 8 Apr 2025 21:29:26 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v13] In-Reply-To: References: Message-ID: On Sat, 5 Apr 2025 00:27:05 GMT, Sandhya Viswanathan wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Reacting to comment by Sandhya. > > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 345: > >> 343: >> 344: store4Xmms(coeffs, 0, xmm0_3, _masm); >> 345: store4Xmms(coeffs, 4 * XMMBYTES, xmm4_7, _masm); > > This seems to be an unnecessary store. Thanks for catching that. Changed. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 370: > >> 368: loadPerm(xmm16_19, perms, nttL4PermsIdx, _masm); >> 369: loadPerm(xmm12_15, perms, nttL4PermsIdx + 64, _masm); >> 370: load4Xmms(xmm24_27, zetas, 4 * 512, _masm); // for level 3 > > The comment // for level3 is not relevant here and could be removed. Oops. Deleted the comment.
> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 802: > >> 800: __ evpbroadcastd(zero, scratch, Assembler::AVX_512bit); // 0 >> 801: __ addl(scratch, 1); >> 802: __ evpbroadcastd(one, scratch, Assembler::AVX_512bit); // 1 > > A better way to initialize (0, 1, -1) vectors is: > // load 0 into int vector > vpxor(zero, zero, zero, Assembler::AVX_512bit); > // load -1 into int vector > vpternlogd(minusOne, 0xff, minusOne, minusOne, Assembler::AVX_512bit); > // load 1 into int vector > vpsubd(one, zero, minusOne, Assembler::AVX_512bit); > > Where minusOne could be xmm31. > > A broadcast from r register to xmm register is more expensive. Changed. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 982: > >> 980: __ evporq(xmm19, k0, xmm19, xmm23, false, Assembler::AVX_512bit); >> 981: >> 982: __ evpsubd(xmm12, k0, zero, one, false, Assembler::AVX_512bit); // -1 > > The -1 initialization could be done outside the loop. Not really. All registers are used. > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1015: > >> 1013: __ addptr(lowPart, 4 * XMMBYTES); >> 1014: __ cmpl(len, 0); >> 1015: __ jcc(Assembler::notEqual, L_loop); > > It looks to me that subl and cmpl could be merged: > __ addptr(highPart, 4 * XMMBYTES); > __ addptr(lowPart, 4 * XMMBYTES); > __ subl(len, 4 * XMMBYTES); > __ jcc(Assembler::notEqual, L_loop); Changed. 
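To make the suggested constant-materialization sequence concrete, here is a small scalar model of the three instructions over 16 lanes of 32 bits (one 512-bit register). It is an illustrative sketch only: `Vec16`, `vpxor`, `vpternlogd` and `vpsubd` below are hypothetical C++ helpers named after the instructions, not HotSpot's `Assembler` API. The property it demonstrates is that a VPTERNLOGD immediate of 0xff is the truth table that outputs 1 for every input combination, so the result is all ones (-1 per lane) whatever the register held before, and 0 - (-1) then gives 1 without any general-purpose-register-to-vector broadcast.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a 512-bit register: 16 lanes of 32 bits.
using Vec16 = std::array<uint32_t, 16>;

// Models VPXOR: x ^ x == 0 in every lane, whatever x held before.
Vec16 vpxor(const Vec16& a, const Vec16& b) {
  Vec16 r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] ^ b[i];
  return r;
}

// Models VPTERNLOGD: for each bit position, the corresponding bits of the
// three sources form a 3-bit index that selects one bit of the immediate.
Vec16 vpternlogd(const Vec16& a, const Vec16& b, const Vec16& c, uint8_t imm8) {
  Vec16 r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    uint32_t out = 0;
    for (unsigned bit = 0; bit < 32; ++bit) {
      unsigned idx = (((a[i] >> bit) & 1u) << 2) |
                     (((b[i] >> bit) & 1u) << 1) |
                     ((c[i] >> bit) & 1u);
      out |= static_cast<uint32_t>((imm8 >> idx) & 1u) << bit;
    }
    r[i] = out;
  }
  return r;
}

// Models VPSUBD: lane-wise subtraction with two's-complement wraparound.
Vec16 vpsubd(const Vec16& a, const Vec16& b) {
  Vec16 r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] - b[i];
  return r;
}
```

Starting from arbitrary register contents `junk`, the sequence `zero = vpxor(junk, junk)`, `minus_one = vpternlogd(junk, junk, junk, 0xff)`, `one = vpsubd(zero, minus_one)` yields lanes of 0, 0xFFFFFFFF and 1 respectively, independent of `junk`.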
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2034057184 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2034057342 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2034057700 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2034057565 PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2034057463 From sviswanathan at openjdk.org Tue Apr 8 22:01:42 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 8 Apr 2025 22:01:42 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v14] In-Reply-To: <394Wf5RpbwUgE7zBaZBnwa2YAxQFwWDhF1VuaMPHdhE=.98ff29f7-b6a7-49eb-bdd6-8489568b24b7@github.com> References: <394Wf5RpbwUgE7zBaZBnwa2YAxQFwWDhF1VuaMPHdhE=.98ff29f7-b6a7-49eb-bdd6-8489568b24b7@github.com> Message-ID: <-W1vBCTLtPyOZNm6XhHQXT9spBbkAd4Z4rTn_LHH1Aw=.5beae719-ac8b-404a-a34c-deecfc97dd7e@github.com> On Tue, 8 Apr 2025 21:27:08 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Reacting to more comments from Sandhya. Overall very clean and nicely done PR. Thanks a lot for considering my inputs. ------------- Marked as reviewed by sviswanathan (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23860#pullrequestreview-2751503300 From kvn at openjdk.org Tue Apr 8 22:03:32 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 8 Apr 2025 22:03:32 GMT Subject: RFR: 8353174: Clean up thread register handling after 32-bit x86 removal In-Reply-To: <1u8EcLMPtsHFmCPJMQ-Hgcfql0MWy88TpSDZjr9AqrQ=.79b04f21-d97a-4af9-b83b-533ebb1acb6c@github.com> References: <1u8EcLMPtsHFmCPJMQ-Hgcfql0MWy88TpSDZjr9AqrQ=.79b04f21-d97a-4af9-b83b-533ebb1acb6c@github.com> Message-ID: On Mon, 31 Mar 2025 10:19:57 GMT, Aleksey Shipilev wrote: > Various `MacroAssembler` methods have this code to support passing the thread register, and getting it if `noreg` is passed: > > > // determine java_thread register > if (!java_thread->is_valid()) { > #ifdef _LP64 > java_thread = r15_thread; > #else > java_thread = rdi; > get_thread(java_thread); > #endif // LP64 > } > > > This never happens after 32-bit x86 removal. x86_64 always uses r15_thread. We can clean those up. > > These are also the only major users of `MacroAssembler::get_thread` that we want to remove/rename to avoid falling into traps like [JDK-8353176](https://bugs.openjdk.org/browse/JDK-8353176). > > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24323#pullrequestreview-2751506034 From vlivanov at openjdk.org Tue Apr 8 22:26:30 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 8 Apr 2025 22:26:30 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag In-Reply-To: <314QDfJV4auKGwpK0rkvupAG_iBr1icgZ6azVvWIJro=.d2f945d9-5a70-473e-b473-eb18e1153f05@github.com> References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> <314QDfJV4auKGwpK0rkvupAG_iBr1icgZ6azVvWIJro=.d2f945d9-5a70-473e-b473-eb18e1153f05@github.com> Message-ID: On Tue, 8 Apr 2025 18:41:21 GMT, Aleksey Shipilev wrote: >> src/hotspot/cpu/x86/vm_version_x86.cpp line 3100: >> >>> 3098: if (ext_cpuid1_ecx.bits.sse4a != 0) >>> 3099: result |= CPU_SSE4A; >>> 3100: if (sef_cpuid7_ebx.bits.clwb != 0) >> >> I'm curious what's the rule here when it comes to vendor-specific features? >> >> From what I'm seeing in the sources, both AMD and ZX enumerate only `ext_cpuid1` features while for Intel it's a mix of `sef_cpuid7` and `ext_cpuid1`. >> >> So, I'm curious whether the code should be moved up and shared for all CPUs. > > Are you happy with this explanation, @iwanowww? Well, not really. If it were like that, then all CPU sensing logic on x86 would have been vendor-specific. But it's not the case: among many features x86 CPUs may declare, just a few are treated as vendor-specific. I took a look at how it was handled before and many extensions Intel introduced were not guarded by `is_intel()` check in the first place. And there's even more to that: though `CPU_LZCNT` and `CPU_3DNOW_PREFETCH` are handled as vendor-specific, both of them are treated uniformly across all 3 cpu families. Can those be moved into vendor-agnostic part now? Overall, I'm more comfortable with moving the check rather than duplicating it in AMD-specific block. 
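Vladimir's suggestion amounts to decoding the leaf-7 feature bits once for every vendor and keeping only genuinely vendor-specific bits behind a vendor check. A minimal sketch under that assumption follows; the `FEAT_*` constants and the `Vendor` enum are hypothetical stand-ins for HotSpot's `CPU_*` flags and vendor checks. CLWB is reported in CPUID.(EAX=7,ECX=0):EBX bit 24 (the same position on Intel and, per the AMD APM cited in this thread, on AMD), and SSE4A is CPUID Fn8000_0001 ECX bit 6. Raw register values are passed in so the logic can be exercised without executing CPUID.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits for illustration; these are not HotSpot's CPU_* values.
enum : uint64_t {
  FEAT_CLWB  = uint64_t{1} << 0,
  FEAT_SSE4A = uint64_t{1} << 1,
};

enum class Vendor { Intel, AMD, Zhaoxin };

// Vendor-agnostic: CPUID.(EAX=7,ECX=0):EBX bit 24 reports CLWB.
uint64_t decode_leaf7_ebx(uint32_t ebx) {
  uint64_t f = 0;
  if ((ebx >> 24) & 1u) f |= FEAT_CLWB;
  return f;
}

// Shared decoding first, then only the bits treated as vendor-specific.
uint64_t cpu_features(Vendor vendor, uint32_t leaf7_ebx, uint32_t ext_leaf1_ecx) {
  uint64_t result = decode_leaf7_ebx(leaf7_ebx);
  if (vendor == Vendor::AMD && ((ext_leaf1_ecx >> 6) & 1u)) {
    result |= FEAT_SSE4A;  // SSE4A: CPUID Fn8000_0001 ECX bit 6 (AMD)
  }
  return result;
}
```

With this shape, a CLWB-capable CPU is detected identically regardless of vendor, and adding a vendor means touching only the vendor-specific tail.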
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24385#discussion_r2034114405 From dholmes at openjdk.org Tue Apr 8 22:27:34 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 8 Apr 2025 22:27:34 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v6] In-Reply-To: <7rvz3cdG5emi43h_mqEesMgzjwl0xQGBxZDOUuoOldI=.2cdf363d-db7b-4e94-955b-9c45bbcf9845@github.com> References: <7rvz3cdG5emi43h_mqEesMgzjwl0xQGBxZDOUuoOldI=.2cdf363d-db7b-4e94-955b-9c45bbcf9845@github.com> Message-ID: On Tue, 8 Apr 2025 13:09:10 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to built this image e.g. >> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > print_image_release_file use load_acquire LGTM. One final suggestion but pre-approved. src/hotspot/share/runtime/os.cpp line 1576: > 1574: if (ifrc != nullptr) { > 1575: st->print_cr("%s", ifrc); > 1576: } Do we want: } else { st->print_cr(""); } or some such message? ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24244#pullrequestreview-2751536372 PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2034114309 From vlivanov at openjdk.org Tue Apr 8 22:38:25 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 8 Apr 2025 22:38:25 GMT Subject: RFR: 8353174: Clean up thread register handling after 32-bit x86 removal In-Reply-To: <1u8EcLMPtsHFmCPJMQ-Hgcfql0MWy88TpSDZjr9AqrQ=.79b04f21-d97a-4af9-b83b-533ebb1acb6c@github.com> References: <1u8EcLMPtsHFmCPJMQ-Hgcfql0MWy88TpSDZjr9AqrQ=.79b04f21-d97a-4af9-b83b-533ebb1acb6c@github.com> Message-ID: On Mon, 31 Mar 2025 10:19:57 GMT, Aleksey Shipilev wrote: > Various `MacroAssembler` methods have this code to support passing the thread register, and getting it if `noreg` is passed: > > > // determine java_thread register > if (!java_thread->is_valid()) { > #ifdef _LP64 > java_thread = r15_thread; > #else > java_thread = rdi; > get_thread(java_thread); > #endif // LP64 > } > > > This never happens after 32-bit x86 removal. x86_64 always uses r15_thread. We can clean those up. > > These are also the only major users of `MacroAssembler::get_thread` that we want to remove/rename to avoid falling into traps like [JDK-8353176](https://bugs.openjdk.org/browse/JDK-8353176). > > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24323#pullrequestreview-2751551035 From syan at openjdk.org Wed Apr 9 02:06:31 2025 From: syan at openjdk.org (SendaoYan) Date: Wed, 9 Apr 2025 02:06:31 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 [v2] In-Reply-To: References: <8py7v6gQd9nfucRK28h2enHWk21PevhMa74FYx3bRsI=.663fa24c-4218-426b-bda4-2e8d6ecb02aa@github.com> Message-ID: On Tue, 8 Apr 2025 20:37:05 GMT, Jiangli Zhou wrote: >> I have added a comment, but my English is not very good. If you have a proper description, please let me know.
> > Looks ok to me, thanks. Please update the contributor properly since the change is from https://github.com/openjdk/jdk/pull/24171/commits/baff6b166d130c9adeecfb9f2b418d86322d4826. Okay, that was my original plan also. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24299#discussion_r2034323699 From dholmes at openjdk.org Wed Apr 9 02:40:30 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 9 Apr 2025 02:40:30 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 [v3] In-Reply-To: <9k911IJe4DJG2PKWmXuaAY5WJYBuwFxyPfzd422V5FU=.72584eaf-bdf7-4549-9d73-808ab6e96466@github.com> References: <9k911IJe4DJG2PKWmXuaAY5WJYBuwFxyPfzd422V5FU=.72584eaf-bdf7-4549-9d73-808ab6e96466@github.com> Message-ID: On Tue, 8 Apr 2025 03:55:03 GMT, SendaoYan wrote: >> Hi all, >> >> This PR will try to fix the memory leak introduced by JDK-8352184. It re-does [JDK-8352184](https://bugs.openjdk.org/browse/JDK-8352184) using the original, purely static uses of the various description strings, as @jianglizhou had proposed. >> >> Additional testing: >> >> - [x] jtreg tests (including tier1/2/3 etc.) on linux-x64 >> - [x] jtreg tests (including tier1/2/3 etc.) on linux-aarch64 >> - [x] full `java -version` tests, the test shell script is shown below. >> >> [JDK-8353189.sh.txt](https://github.com/user-attachments/files/19643535/JDK-8353189.sh.txt) > > SendaoYan has updated the pull request incrementally with one additional commit since the last revision: > > add a comment to explain why we avoid dynamic memory allocation for the vm_info_string This looks good to me. Sorry for the delay in getting it reviewed. Thanks for fixing this. ------------- Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24299#pullrequestreview-2751895602 From fyang at openjdk.org Wed Apr 9 04:02:29 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 9 Apr 2025 04:02:29 GMT Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user [v2] In-Reply-To: <1pa1FDH5Z2quR3fE7o4qfZKwRrz8nXHbMSirSyiqhTw=.9c37d2a9-5b93-40dd-8b5a-a5822030ef48@github.com> References: <1pa1FDH5Z2quR3fE7o4qfZKwRrz8nXHbMSirSyiqhTw=.9c37d2a9-5b93-40dd-8b5a-a5822030ef48@github.com> Message-ID: <9R7U8cL4aSOayHQzaXoTGx0nXSXqdkO4ZomONZnM0Ao=.c1c90e1c-3085-4656-911d-23c407cff74d@github.com> On Fri, 28 Mar 2025 06:53:15 GMT, Robbin Ehn wrote: > It's not some intermittent failure. The majority of them can't work as they use pstack, open core files, use PerfData, etc. and expect it to be rv64. But core files, pstack are in host arch as we are running qemu-user. I can remove tests which time out and only keep tests which simply can't work in a qemu-user environment in this PR. Seems good? Hi, That makes sense to me. And it doesn't seem to me to be a riscv-specific issue, but rather one with qemu-user. Maybe we should update the title and changes to reflect that? I sometimes see people testing with qemu for other CPU platforms as well like ppc, s390, etc. Guess they might be helped with this too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24229#issuecomment-2788216103 From stuefe at openjdk.org Wed Apr 9 05:12:04 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 9 Apr 2025 05:12:04 GMT Subject: Integrated: 8353273: Reduce number of oop map entries in instances In-Reply-To: References: Message-ID: <0yHDLc0RYvkE6t_MCwBJFg9A_pdZTUk88MSR3VDwTZ8=.cce7920f-8218-4421-a0c4-fb96bf2b7e8f@github.com> On Mon, 31 Mar 2025 14:02:03 GMT, Thomas Stuefe wrote: > In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. > > For details, please see JBS issue text.
> > ----------------------- > > Patch results: > > The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. Here the oop map size distribution over all JDK classes in the JDK image: > > Before: > > 5395 - non-static oop maps (0 entries) > 9330 - non-static oop maps (1 entries) > 1449 - non-static oop maps (2 entries) > 274 - non-static oop maps (3 entries) > 218 - non-static oop maps (4 entries) > 75 - non-static oop maps (5 entries) > 7 - non-static oop maps (6 entries) > 4 - non-static oop maps (7 entries) > > > Now: > > 5395 - non-static oop maps (0 entries) > 10178 - non-static oop maps (1 entries) > 933 - non-static oop maps (2 entries) > 229 - non-static oop maps (3 entries) > 16 - non-static oop maps (4 entries) > 1 - non-static oop maps (5 entries) > > > For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: > > Before: > > java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} > - ---- non-static fields (9 words): > - final 'hash' 'I' @12 > - final 'key' 'Ljava/lang/Object;' @16 > - volatile 'val' 'Ljava/lang/Object;' @20 > - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class > - 'red' 'Z' @28 << derived class starts here, non-oops lead > - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 > - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 > - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 > - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 > - non-static oop maps (2 entries): 16-24 32-44 > > Now: > > java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} > - ---- non-static fields (9 words): > - final 'hash' 'I' @12 > - final 'key' 'Ljava/lang/Object;' @16 > - volatile 'val' 'Ljava/lang/Object;' @20 > - volatile 'next' 
'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class > - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oops lead > - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 > - 'right' 'Ljava/util/concurrent/Concurre... This pull request has now been integrated. Changeset: 743d1c64 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/743d1c64c29118b15897b3c676919353ced467f5 Stats: 298 lines in 4 files changed: 283 ins; 1 del; 14 mod 8353273: Reduce number of oop map entries in instances Reviewed-by: lmesnik, fparain, jsjolen ------------- PR: https://git.openjdk.org/jdk/pull/24330 From stuefe at openjdk.org Wed Apr 9 05:12:03 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 9 Apr 2025 05:12:03 GMT Subject: RFR: 8353273: Reduce number of oop map entries in instances [v3] In-Reply-To: <8oHlvz98KPGFmzSttjpmKbjNSQkbF0rmLlE9Aqdgs1M=.4b96d172-54c6-4005-bcd2-1a5ef76198a0@github.com> References: <8oHlvz98KPGFmzSttjpmKbjNSQkbF0rmLlE9Aqdgs1M=.4b96d172-54c6-4005-bcd2-1a5ef76198a0@github.com> Message-ID: On Tue, 8 Apr 2025 05:46:09 GMT, Thomas Stuefe wrote: >> In preparation for planned GC performance improvements (KLUT), I would like to reduce the average number of oop map entries. >> >> For details, please see JBS issue text. >> >> ----------------------- >> >> Patch results: >> >> The patch brings a positive change of oop map size, reducing the likelihood of lengthy oop maps. 
Here the oop map size distribution over all JDK classes in the JDK image: >> >> Before: >> >> 5395 - non-static oop maps (0 entries) >> 9330 - non-static oop maps (1 entries) >> 1449 - non-static oop maps (2 entries) >> 274 - non-static oop maps (3 entries) >> 218 - non-static oop maps (4 entries) >> 75 - non-static oop maps (5 entries) >> 7 - non-static oop maps (6 entries) >> 4 - non-static oop maps (7 entries) >> >> >> Now: >> >> 5395 - non-static oop maps (0 entries) >> 10178 - non-static oop maps (1 entries) >> 933 - non-static oop maps (2 entries) >> 229 - non-static oop maps (3 entries) >> 16 - non-static oop maps (4 entries) >> 1 - non-static oop maps (5 entries) >> >> >> For example, `java.util.concurrent.ConcurrentHashMap$TreeNode` is changed from having 2 entries to having just one entry, which is nice for a class that may be instantiated a lot: >> >> Before: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000000d1dddc0} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'red' 'Z' @28 << derived class starts here, non-oops lead >> - 'parent' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @32 >> - 'left' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @36 >> - 'right' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @40 >> - 'prev' 'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @44 >> - non-static oop maps (2 entries): 16-24 32-44 >> >> Now: >> >> java.util.concurrent.ConcurrentHashMap$TreeNode {0x000000007e1de450} >> - ---- non-static fields (9 words): >> - final 'hash' 'I' @12 >> - final 'key' 'Ljava/lang/Object;' @16 >> - volatile 'val' 'Ljava/lang/Object;' @20 >> - volatile 'next' 'Ljava/util/concurrent/ConcurrentHashMap$Node;' @24 << last field of base class >> - 'parent' 
'Ljava/util/concurrent/ConcurrentHashMap$TreeNode;' @28 << class starts here, oop... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - test fixes > - Merge branch 'master' into JDK-8353273-Reduce-average-number-of-oop-map-entries-in-instance-objects > - add regression test > - Reworked to use prior super klass layout reconstruction pass > - Merge branch 'master' into JDK-8353273-Reduce-average-number-of-oop-map-entries-in-instance-objects > - alternate-order > - print Thank you all. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24330#issuecomment-2788294969 From thartmann at openjdk.org Wed Apr 9 05:12:37 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 9 Apr 2025 05:12:37 GMT Subject: RFR: 8353593: MethodData "mileage_*" methods and fields aren't used and can be removed In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 00:52:30 GMT, Cesar Soares Lucas wrote: > Please review this trivial patch to remove dead code from MethodData class. > Tested on Linux x86_64 with JTREG_TIER1. Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24399#pullrequestreview-2752057606 From cslucas at openjdk.org Wed Apr 9 05:12:37 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 9 Apr 2025 05:12:37 GMT Subject: Integrated: 8353593: MethodData "mileage_*" methods and fields aren't used and can be removed In-Reply-To: References: Message-ID: <9XVVDGHX29hTRJBC4M2SRu_Uq4dxSH8zo2JOFso9yyM=.438fb058-c9f3-4612-8ab9-4afdd65f22d9@github.com> On Thu, 3 Apr 2025 00:52:30 GMT, Cesar Soares Lucas wrote: > Please review this trivial patch to remove dead code from MethodData class. > Tested on Linux x86_64 with JTREG_TIER1. This pull request has now been integrated. 
Changeset: 473251db Author: Cesar Soares Lucas Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/473251dbb308016ccda6c88fd36bd10c81e65865 Stats: 15 lines in 2 files changed: 0 ins; 13 del; 2 mod 8353593: MethodData "mileage_*" methods and fields aren't used and can be removed Reviewed-by: phh, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24399 From kbarrett at openjdk.org Wed Apr 9 06:22:20 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 9 Apr 2025 06:22:20 GMT Subject: RFR: 8324686: Remove redefinition of NULL for MSVC Message-ID: Please review this change that removes the redefinition of NULL in globalDefinitions_visCPP.hpp. That redefinition was to support the use of NULL in a varargs context, because of the size difference for int vs a pointer. However, we no longer have any direct uses of NULL in HotSpot, and have a test that ensures there is no backsliding. There may be indirect uses of NULL via third-party libraries. Such uses could have been in the scope of the removed redefinition. But those uses must have been correct even without the redefinition, else they would be incorrect for non-HotSpot users. 
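The varargs concern can be illustrated with a null-pointer-terminated argument list. This is a generic C++ sketch, not HotSpot code, and `concat` is a hypothetical helper: because `va_arg` reads each argument as a pointer, the terminator must be passed as a pointer too. In C++, MSVC's `NULL` expands to a plain `0`, which is passed through `...` as an `int` and is narrower than a pointer on 64-bit targets; `nullptr` is instead converted to `void*` when passed through `...`. The most portable terminator is a null pointer of exactly the type the callee reads, as below.

```cpp
#include <cassert>
#include <cstdarg>
#include <string>

// Hypothetical helper: concatenates C strings until a null pointer is seen.
// The callee reads every variadic argument as `const char*`, so the
// terminator must be a null pointer of that type, not an integer 0.
std::string concat(const char* first, ...) {
  std::string out;
  va_list ap;
  va_start(ap, first);
  for (const char* s = first; s != nullptr; s = va_arg(ap, const char*)) {
    out += s;
  }
  va_end(ap);
  return out;
}
```

Writing `concat("Hot", "Spot", 0)` instead would pass a 4-byte `int` where the callee reads an 8-byte pointer on typical 64-bit targets, which is undefined behavior; that int-vs-pointer size difference is the motivation the PR description gives for the old redefinition.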
Testing: mach5 tier1-3, GHA sanity tests ------------- Commit messages: - remove VS redef of NULL Changes: https://git.openjdk.org/jdk/pull/24537/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24537&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324686 Stats: 21 lines in 2 files changed: 0 ins; 20 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24537.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24537/head:pull/24537 PR: https://git.openjdk.org/jdk/pull/24537 From mbaesken at openjdk.org Wed Apr 9 06:30:29 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 9 Apr 2025 06:30:29 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v7] In-Reply-To: References: Message-ID: > The release file of the JDK image contains useful info, for example the SOURCE used to build this image, e.g. > SOURCE=".:git:21af8c7e7405" > Also the MODULES list is probably useful to have. > Add this info (or the complete content of the release file) to the hs_err files.
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: fix assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24244/files - new: https://git.openjdk.org/jdk/pull/24244/files/ec6b7f19..9bbd6933 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24244/head:pull/24244 PR: https://git.openjdk.org/jdk/pull/24244 From rehn at openjdk.org Wed Apr 9 06:34:30 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 9 Apr 2025 06:34:30 GMT Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user [v2] In-Reply-To: <9R7U8cL4aSOayHQzaXoTGx0nXSXqdkO4ZomONZnM0Ao=.c1c90e1c-3085-4656-911d-23c407cff74d@github.com> References: <1pa1FDH5Z2quR3fE7o4qfZKwRrz8nXHbMSirSyiqhTw=.9c37d2a9-5b93-40dd-8b5a-a5822030ef48@github.com> <9R7U8cL4aSOayHQzaXoTGx0nXSXqdkO4ZomONZnM0Ao=.c1c90e1c-3085-4656-911d-23c407cff74d@github.com> Message-ID: On Wed, 9 Apr 2025 03:57:10 GMT, Fei Yang wrote: > > It's not some intermittent failure. The majority of them can't work as they use pstack, open core files, use PerfData, etc. and expect it to be rv64. But core files, pstack are in host arch as we are running qemu-user. I can remove tests which time out and only keep tests which simply can't work in a qemu-user environment in this PR. Seems good? > > Hi, That makes sense to me. And it doesn't seem to me to be a riscv-specific issue, but rather one with qemu-user. Maybe we should update the title and changes to reflect that? I sometimes see people testing with qemu for other CPU platforms as well like ppc, s390, etc. Guess they might be helped with this too. Hey, thanks for considering. The default qemu /proc/cpu does not contain any information about this being qemu.
And there is no standard way to find this out AFAIK. Some platforms have a target-specific /proc/cpu and put qemu in there, but it has no standard format. The whole proc -> uarch string -> jvm cpu string -> jtreg require chain is qemu/linux-user/riscv specific. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24229#issuecomment-2788434073 From fyang at openjdk.org Wed Apr 9 06:57:42 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 9 Apr 2025 06:57:42 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 10:29:38 GMT, Hamlin Li wrote: > > Maybe we should check UseZicond and only enable UseCMoveUnconditionally & ConditionalMoveLimit conditionally? > > Not sure what you mean here. Sorry for not being clear enough. I am suggesting this: `if (UseZicond) { FLAG_SET_DEFAULT(ConditionalMoveLimit, 3); }` Without the `Zicond` extension, conditional moves composed by C2 are simply emulated with regular conditional branches on riscv, which I think is not good in terms of performance. >> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 461: >> >>> 459: FLAG_SET_DEFAULT(UseZicond, false); >>> 460: warning("UseZicond is turned off automatically. Turn it on with -XX:+UseZicond explicitly."); >>> 461: } >> >> Does this mean `UseZicond` could not be enabled on the command line? And I witnessed quite a few warnings when doing a native build. If `UseZicond` causes regressions in some cases, is it more reasonable to not auto-enable it through hwprobe [1]? Or only enable it for debug builds like https://github.com/openjdk/jdk/pull/24478 does? >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp#L228 > > This is to avoid enabling Zicond automatically, but users can still turn it on manually if they want to try it or make sure it brings a benefit on the specific hardware.
> Currently it's based on the bananapi results, so maybe in the future we should adjust the default value of UseZicond. > I'm fine with either default value. I just witnessed a couple of warnings (`UseZicond is turned off automatically. Turn it on with -XX:+UseZicond explicitly.`) when doing a native build on my P550 SBC, which is not equipped with the `Zicond` extension. I don't think that is expected? And I agree that it might be better to keep this option disabled by default and let users decide whether to enable it based on their use cases. But what I see is that `UseZicond` will be auto-enabled through hwprobe [1] on my BPI-F3, so in my previous comment I was suggesting not to do that. Makes sense? [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp#L228 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24490#discussion_r2034572731 PR Review Comment: https://git.openjdk.org/jdk/pull/24490#discussion_r2034595744 From mbaesken at openjdk.org Wed Apr 9 07:02:12 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 9 Apr 2025 07:02:12 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v8] In-Reply-To: References: Message-ID: > The release file of the JDK image contains useful info, for example the SOURCE used to build this image, e.g. > SOURCE=".:git:21af8c7e7405" > Also the MODULES list is probably useful to have. > Add this info (or the complete content of the release file) to the hs_err files.
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: print some output in case release file has not been read ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24244/files - new: https://git.openjdk.org/jdk/pull/24244/files/9bbd6933..0b6c6142 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24244&range=06-07 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24244/head:pull/24244 PR: https://git.openjdk.org/jdk/pull/24244 From mbaesken at openjdk.org Wed Apr 9 07:02:13 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 9 Apr 2025 07:02:13 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v6] In-Reply-To: References: <7rvz3cdG5emi43h_mqEesMgzjwl0xQGBxZDOUuoOldI=.2cdf363d-db7b-4e94-955b-9c45bbcf9845@github.com> Message-ID: On Tue, 8 Apr 2025 22:23:38 GMT, David Holmes wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> print_image_release_file use load_acquire > > src/hotspot/share/runtime/os.cpp line 1576: > >> 1574: if (ifrc != nullptr) { >> 1575: st->print_cr("%s", ifrc); >> 1576: } > > Do we want: > > } else { > st->print_cr(""); > } > > or some such message? Sure, why not ! I added the suggested output. 
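The `load_acquire` commit and the `else` branch David requested together form a publish/consume pattern. One thread fills in the release-file text and publishes a pointer to it. The error reporter then sees either `nullptr` or a fully written string, and prints a fallback in the `nullptr` case. The minimal sketch below uses `std::atomic` in place of HotSpot's own atomics. Every name is hypothetical, including the placeholder message, because the actual message text did not survive in this archive.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical stand-in for the cached release-file contents.
// The writer stores the pointer with release semantics only after the
// text is complete; the reader loads it with acquire semantics, so it
// observes either nullptr or the fully written text.
static std::atomic<const char*> g_release_file_content{nullptr};

void publish_release_file(const char* content) {
  g_release_file_content.store(content, std::memory_order_release);
}

// Returns the text that would go into the error report.
std::string describe_release_file() {
  const char* content = g_release_file_content.load(std::memory_order_acquire);
  if (content != nullptr) {
    return std::string(content);
  }
  // Placeholder fallback; the wording used by the real patch is not known here.
  return "<release file not read>";
}
```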
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24244#discussion_r2034611566 From shade at openjdk.org Wed Apr 9 07:31:44 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 07:31:44 GMT Subject: RFR: 8353174: Clean up thread register handling after 32-bit x86 removal In-Reply-To: <1u8EcLMPtsHFmCPJMQ-Hgcfql0MWy88TpSDZjr9AqrQ=.79b04f21-d97a-4af9-b83b-533ebb1acb6c@github.com> References: <1u8EcLMPtsHFmCPJMQ-Hgcfql0MWy88TpSDZjr9AqrQ=.79b04f21-d97a-4af9-b83b-533ebb1acb6c@github.com> Message-ID: On Mon, 31 Mar 2025 10:19:57 GMT, Aleksey Shipilev wrote: > Various `MacroAssembler` methods have this code to support passing the thread register, and getting it if `noreg` is passed: > > > // determine java_thread register > if (!java_thread->is_valid()) { > #ifdef _LP64 > java_thread = r15_thread; > #else > java_thread = rdi; > get_thread(java_thread); > #endif // LP64 > } > > > This never happens after 32-bit x86 removal. x86_64 always uses r15_thread. We can clean those up. > > These are also the only major users of `MacroAssembler::get_thread` that we want to remove/rename to avoid falling into traps like [JDK-8353176](https://bugs.openjdk.org/browse/JDK-8353176). > > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` Thanks all! I have re-merged with master locally, and there are no evident problems. So I am integrating. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24323#issuecomment-2788624154 From shade at openjdk.org Wed Apr 9 07:31:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 07:31:45 GMT Subject: Integrated: 8353174: Clean up thread register handling after 32-bit x86 removal In-Reply-To: <1u8EcLMPtsHFmCPJMQ-Hgcfql0MWy88TpSDZjr9AqrQ=.79b04f21-d97a-4af9-b83b-533ebb1acb6c@github.com> References: <1u8EcLMPtsHFmCPJMQ-Hgcfql0MWy88TpSDZjr9AqrQ=.79b04f21-d97a-4af9-b83b-533ebb1acb6c@github.com> Message-ID: On Mon, 31 Mar 2025 10:19:57 GMT, Aleksey Shipilev wrote: > Various `MacroAssembler` methods have this code to support passing the thread register, and getting it if `noreg` is passed: > > > // determine java_thread register > if (!java_thread->is_valid()) { > #ifdef _LP64 > java_thread = r15_thread; > #else > java_thread = rdi; > get_thread(java_thread); > #endif // LP64 > } > > > This never happens after 32-bit x86 removal. x86_64 always uses r15_thread. We can clean those up. > > These are also the only major users of `MacroAssembler::get_thread` that we want to remove/rename to avoid falling into traps like [JDK-8353176](https://bugs.openjdk.org/browse/JDK-8353176). > > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. 
Changeset: 6df34c36 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/6df34c361e0d1b6fe90ca97c1aaa56e57a86d12c Stats: 233 lines in 15 files changed: 16 ins; 98 del; 119 mod 8353174: Clean up thread register handling after 32-bit x86 removal Reviewed-by: cslucas, kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/24323 From fyang at openjdk.org Wed Apr 9 07:35:40 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 9 Apr 2025 07:35:40 GMT Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user [v2] In-Reply-To: References: <1pa1FDH5Z2quR3fE7o4qfZKwRrz8nXHbMSirSyiqhTw=.9c37d2a9-5b93-40dd-8b5a-a5822030ef48@github.com> <9R7U8cL4aSOayHQzaXoTGx0nXSXqdkO4ZomONZnM0Ao=.c1c90e1c-3085-4656-911d-23c407cff74d@github.com> Message-ID: On Wed, 9 Apr 2025 06:31:55 GMT, Robbin Ehn wrote: > > > It's not some intermittently failure. The majority of them can't work as they use pstack, open core files, use PerfData, etc.. and expected it to be rv64. But core files, pstack are in host arch as we are running qemu-user. I can remove tests which timeouts and only keep test which simply can't work in qemu-user environment in this PR. Seems good? > > > > > > Hi, That make sense to me. And it doesn't seem to me to be riscv-specific issue, but rather one with qemu-user. Maybe we should update the title and changes to reflect that? I sometimes see people testing with qemu for other CPU platforms as well like ppc, s390, etc. Guess they might be helped with this too. > > Hey, thanks for considering. The default qemu /proc/cpu do not contain any information about this being qemu. And there is no standard way to find this out AFIAK. Some platforms have target specific /proc/cpu and put qemu in there, but it have no standard format. The whole proc -> uarch string -> jvm cpu string -> jtreg require is qemu/linux-user/riscv specific. Ah, I see. But I guess it won't bite us if we can't parse `qemu` in /proc/cpuinfo? I am not familiar with how qemu-user works. 
Can I expect this to work at least for some other CPUs supported by the JVM? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24229#issuecomment-2788626578 From shade at openjdk.org Wed Apr 9 07:55:38 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 07:55:38 GMT Subject: RFR: 8324686: Remove redefinition of NULL for MSVC In-Reply-To: References: Message-ID: <5dJylnk5OQrxHxiPNmj722EwNs4OlzuW3GX2YLD_ThA=.94dd40d3-0181-45a0-a675-0ef6f3c61f33@github.com> On Wed, 9 Apr 2025 06:16:18 GMT, Kim Barrett wrote: > Please review this change that removes the redefinition of NULL in > globalDefinitions_visCPP.hpp. That redefinition was to support the use of NULL > in a varargs context, because of the size difference for int vs a pointer. > However, we no longer have any direct uses of NULL in HotSpot, and have a test > that ensures there is no backsliding. > > There may be indirect uses of NULL via third-party libraries. Such uses could > have been in the scope of the removed redefinition. But those uses must have > been correct even without the redefinition, else they would be incorrect for > non-HotSpot users. > > Testing: mach5 tier1-3, GHA sanity tests Looks fine. So, just to be extra clear, this would only affect Hotspot, not JDK. There are no interesting hits for `NULL`-s right now in Hotspot code. There are still lots of `NULL`-s in JDK native code. ------------- Marked as reviewed by shade (Reviewer). 
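Kim's description mentions the size mismatch between an integer `0` and a pointer when `NULL` is passed through `...`. The sketch below shows why `nullptr` avoids the problem. The standard makes `std::nullptr_t` the same size as `void*`, and a `nullptr` argument to a variadic function is converted to `void*`. `concat_until_null` is a hypothetical sentinel-terminated function in the style of `execl`. Its parameters are `char*` so that reading the `void*` sentinel with `va_arg(ap, char*)` stays within the language rules.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <string>

// Guaranteed by the standard: nullptr_t has the size of a pointer,
// unlike a plain integer 0, which is what a bare NULL may expand to.
static_assert(sizeof(std::nullptr_t) == sizeof(void*), "nullptr_t is pointer-sized");

// Hypothetical helper: concatenates strings until a null-pointer sentinel.
// A nullptr passed through "..." arrives as a void*, i.e. a full
// pointer-sized value, so reading it back as char* is well defined.
std::string concat_until_null(char* first, ...) {
  std::string result;
  va_list ap;
  va_start(ap, first);
  for (char* s = first; s != nullptr; s = va_arg(ap, char*)) {
    result += s;
  }
  va_end(ap);
  return result;
}
```

Suppose a bare `NULL` that expands to an `int` `0` were passed as the sentinel on a 64-bit target. Then `va_arg(ap, char*)` would read a pointer-sized value from an int-sized argument, which is undefined behavior. That is the hazard the removed MSVC redefinition worked around.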
PR Review: https://git.openjdk.org/jdk/pull/24537#pullrequestreview-2752459435 From rehn at openjdk.org Wed Apr 9 08:09:42 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 9 Apr 2025 08:09:42 GMT Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user [v2] In-Reply-To: References: <1pa1FDH5Z2quR3fE7o4qfZKwRrz8nXHbMSirSyiqhTw=.9c37d2a9-5b93-40dd-8b5a-a5822030ef48@github.com> <9R7U8cL4aSOayHQzaXoTGx0nXSXqdkO4ZomONZnM0Ao=.c1c90e1c-3085-4656-911d-23c407cff74d@github.com> Message-ID: On Wed, 9 Apr 2025 07:29:01 GMT, Fei Yang wrote: > > > > It's not some intermittently failure. The majority of them can't work as they use pstack, open core files, use PerfData, etc.. and expected it to be rv64. But core files, pstack are in host arch as we are running qemu-user. I can remove tests which timeouts and only keep test which simply can't work in qemu-user environment in this PR. Seems good? > > > > > > > > > Hi, That make sense to me. And it doesn't seem to me to be riscv-specific issue, but rather one with qemu-user. Maybe we should update the title and changes to reflect that? I sometimes see people testing with qemu for other CPU platforms as well like ppc, s390, etc. Guess they might be helped with this too. > > > > > > Hey, thanks for considering. The default qemu /proc/cpu do not contain any information about this being qemu. And there is no standard way to find this out AFIAK. Some platforms have target specific /proc/cpu and put qemu in there, but it have no standard format. The whole proc -> uarch string -> jvm cpu string -> jtreg require is qemu/linux-user/riscv specific. > > Ah, I see. But I guess it won't bite us if we can't parse `qemu` in /proc/cpuinfo? I am not familiar with how qemu-user works. Can I expect this to work at least for some other CPUs supported by the JVM? 
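The reply that follows hinges on a single field. Under qemu-user, the emulated `/proc/cpuinfo` reports `uarch` as `qemu`. On real hardware the field has some other value or is missing entirely, and a missing field is treated as "not known to be emulated". Below is a rough sketch of that check over the raw text, assuming the `key : value` layout of Linux `/proc/cpuinfo`. `cpuinfo_reports_qemu_uarch` is hypothetical, and the actual HotSpot parsing and jtreg `@requires` plumbing are not shown.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Trim spaces and tabs from both ends of a string.
static std::string trim(const std::string& s) {
  const auto b = s.find_first_not_of(" \t");
  if (b == std::string::npos) return "";
  const auto e = s.find_last_not_of(" \t");
  return s.substr(b, e - b + 1);
}

// Hypothetical check: true only if some "uarch : qemu" line is present.
// A missing uarch field yields false, so tests would still run.
bool cpuinfo_reports_qemu_uarch(const std::string& cpuinfo) {
  std::istringstream in(cpuinfo);
  std::string line;
  while (std::getline(in, line)) {
    const auto colon = line.find(':');
    if (colon == std::string::npos) continue;
    if (trim(line.substr(0, colon)) == "uarch" &&
        trim(line.substr(colon + 1)) == "qemu") {
      return true;
    }
  }
  return false;
}
```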
qemu-user, "uarch: qemu" in cpuinfo: `[0.084s][info ][os,cpu] CPU: total 28 (initial active 28) qemu rv64 rvi rvm rva rvf rvd rvc rvv zba zbb zbs zfh zfhmin zvbc zvfh zicond` Hence we know this is qemu-user (only qemu-user sets uarch to qemu on riscv). `/proc/cpuinfo` does not contain uarch: `[0.053s][info ][os,cpu] CPU: total 8 (initial active 8) rv64 rvi rvm rva rvf rvd rvc zba zbb zbs zfh zfhmin zvfh zicond` Here we have no clue whether this is emulated or real hardware, so the tests will be executed. Tests are only excluded if we know it's qemu-user. Does that answer your question? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24229#issuecomment-2788717608 From jsjolen at openjdk.org Wed Apr 9 08:11:34 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 9 Apr 2025 08:11:34 GMT Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 12:02:58 GMT, Radim Vansa wrote: >> On the reproducer https://bugs.openjdk.org/secure/attachment/113985/CCC.java my local testing shows these numbers: >> >> ### JDK-17 >> >> $ hyperfine -w 5 -r 10 '/path/to/jdk-17/bin/java -cp /tmp/ CCC' >> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp/ CCC >> Time (mean ± σ): 32.5 ms ± 0.9 ms [User: 27.5 ms, System: 10.6 ms] >> Range (min … max): 31.1 ms … 33.7 ms 10 runs >> >> ### JDK-25 before the change applied >> >> $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' >> Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC >> Time (mean ± σ): 101.6 ms ± 1.5 ms [User: 96.8 ms, System: 14.6 ms] >> Range (min … max): 99.0 ms …
104.5 ms 10 runs >> >> ### JDK-25 with this patch >> >> $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' >> Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC >> Time (mean ± σ): 75.8 ms ± 1.2 ms [User: 69.8 ms, System: 16.0 ms] >> Range (min … max): 73.8 ms … 78.2 ms 10 runs > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Fix compilation error in assertion Hi, The idea behind the change is good, but I think that we can clean up the code. See the comments for how that can be done. src/hotspot/share/oops/instanceKlass.cpp line 1940: > 1938: // In DebugInfo nonstatic fields are sorted by offset. > 1939: GrowableArray > fields_sorted; > 1940: int i = 0; Would you mind also cleaning up this usage of `i`? Seems like it can be removed and `fields_sorted.length()` can be used instead. src/hotspot/share/oops/instanceKlass.cpp line 1944: > 1942: if (!fs.access_flags().is_static()) { > 1943: fd = fs.field_descriptor(); > 1944: Tuple f(fs.offset(), fs.index(), fs.to_FieldInfo()); `FieldInfo` contains the `offset`, so that's not necessary. Besides, the `index` is now only used for an `assert` in the `reinitialize` code. Why not get rid of the index argument and the assert? In other words: Get rid of the `Tuple` changes and have `fields_sorted` be a `GrowableArray` of `FieldInfo` only.
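Johan's two suggestions can be sketched with standard containers: drop the separate counter `i`, and sort plain `FieldInfo` values because they already carry the offset. `FieldInfo` here is a hypothetical two-member stand-in for HotSpot's much richer class. `std::vector` and `std::sort` stand in for `GrowableArray` and its sort.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for HotSpot's FieldInfo. The real
// class carries much more, but it does include the field offset.
struct FieldInfo {
  int offset;
  bool is_static;
};

// Collect the non-static fields and order them by offset, storing only
// FieldInfo values instead of (offset, index, FieldInfo) tuples.
std::vector<FieldInfo> nonstatic_fields_by_offset(const std::vector<FieldInfo>& all) {
  std::vector<FieldInfo> sorted;
  for (const FieldInfo& fi : all) {
    if (!fi.is_static) {
      sorted.push_back(fi);
    }
  }
  std::sort(sorted.begin(), sorted.end(),
            [](const FieldInfo& a, const FieldInfo& b) { return a.offset < b.offset; });
  return sorted;
}
```

The number of collected fields is simply `sorted.size()`. That is the role `fields_sorted.length()` would play in place of `i`.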
------------- PR Review: https://git.openjdk.org/jdk/pull/24290#pullrequestreview-2752454462 PR Review Comment: https://git.openjdk.org/jdk/pull/24290#discussion_r2034713775 PR Review Comment: https://git.openjdk.org/jdk/pull/24290#discussion_r2034738604 From kbarrett at openjdk.org Wed Apr 9 08:17:37 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 9 Apr 2025 08:17:37 GMT Subject: RFR: 8324686: Remove redefinition of NULL for MSVC In-Reply-To: <5dJylnk5OQrxHxiPNmj722EwNs4OlzuW3GX2YLD_ThA=.94dd40d3-0181-45a0-a675-0ef6f3c61f33@github.com> References: <5dJylnk5OQrxHxiPNmj722EwNs4OlzuW3GX2YLD_ThA=.94dd40d3-0181-45a0-a675-0ef6f3c61f33@github.com> Message-ID: On Wed, 9 Apr 2025 07:52:44 GMT, Aleksey Shipilev wrote: > Looks fine. So, just to be extra clear, this would only affect Hotspot, not JDK. There are no interesting hits for `NULL`-s right now in Hotspot code. There are still lots of `NULL`-s in JDK native code. That's correct. And there's a test to keep it that way: https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/sources/TestNoNULL.java ------------- PR Comment: https://git.openjdk.org/jdk/pull/24537#issuecomment-2788740007 From sspitsyn at openjdk.org Wed Apr 9 08:20:34 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 9 Apr 2025 08:20:34 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls Message-ID: As noted in [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088), JVMTI `GetThreadGroupChildren` does an upcall to Java. This results in a `ClassPrepare` event the first time it does this, and these events can cause problems (deadlocks) for the debugger or debug agent. [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088) was fixed to get rid of class loading during the Java upcall from `GetThreadGroupChildren`. However, other events can be generated as well. It is safer to disable all JVMTI events during debugger-related upcalls originating from JVMTI.
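One way to picture "disable all JVMTI events during JVMTI-initiated upcalls" is as a scoped, per-thread flag that event posting consults. The sketch below is a simplified model, not the patch. `UpcallEventMark`, `t_in_jvmti_upcall` and `post_event` are hypothetical names. The model silently drops events, whereas the description says the real change also asserts that no `ClassPrepare` event is posted during an upcall.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-thread flag: while set, event posting is suppressed.
thread_local bool t_in_jvmti_upcall = false;

// RAII guard marking the current thread as inside a JVMTI-initiated
// Java upcall; the previous state is restored on scope exit.
class UpcallEventMark {
  bool _saved;
 public:
  UpcallEventMark() : _saved(t_in_jvmti_upcall) { t_in_jvmti_upcall = true; }
  ~UpcallEventMark() { t_in_jvmti_upcall = _saved; }
  UpcallEventMark(const UpcallEventMark&) = delete;
  UpcallEventMark& operator=(const UpcallEventMark&) = delete;
};

std::vector<std::string> posted_events;  // stand-in for agent callbacks

void post_event(const std::string& name) {
  if (t_in_jvmti_upcall) {
    return;  // drop events generated by the upcall itself
  }
  posted_events.push_back(name);
}

// Models GetThreadGroupChildren calling into Java, which may load
// and prepare a class as a side effect.
void get_thread_group_children_upcall() {
  UpcallEventMark mark;
  post_event("ClassPrepare");  // suppressed: never reaches the agent
}
```

Saving and restoring the previous value, rather than unconditionally clearing it, keeps nested upcalls correct.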
The `ClassPrepare` events are important for the debug agent, so an assert was added to `ClassPrepare` event generation to make sure there are no attempts to post this event during upcalls. Some specific implementation details can be added to the first PR comment. Testing: - Verified with the test `jdk/com/sun/jdi/EarlyThreadGroupChildrenTest.java` that was added with the fix of [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088): - the assert described above fires if the fix of JDK-8352088 is removed - the test passes if both the fix of JDK-8352088 and the assert are removed - Ran mach5 tiers 1-6 ------------- Commit messages: - 8352773: JVMTI should disable events during java upcalls Changes: https://git.openjdk.org/jdk/pull/24539/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24539&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352773 Stats: 32 lines in 6 files changed: 30 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24539/head:pull/24539 PR: https://git.openjdk.org/jdk/pull/24539 From shade at openjdk.org Wed Apr 9 08:31:44 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 08:31:44 GMT Subject: RFR: 8351157: Clean up x86 GC barriers after 32-bit x86 removal [v3] In-Reply-To: References: Message-ID: > Assembler GC barriers have quite a bit of coding to support 32-bit x86. As 32-bit x86 is removed, we can clean up those parts. > > We can eliminate `!LP64` blocks quite easily. We can also prune passing around `thread` argument, and just trust that `r15_thread` is always available. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains five commits: - Merge branch 'master' into JDK-8351157-x86-gc-barriers - Merge branch 'master' into JDK-8351157-x86-gc-barriers - Also do tlab_allocate - Rely on R15 to be a thread register - Work ------------- Changes: https://git.openjdk.org/jdk/pull/24253/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24253&range=02 Stats: 543 lines in 20 files changed: 1 ins; 426 del; 116 mod Patch: https://git.openjdk.org/jdk/pull/24253.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24253/head:pull/24253 PR: https://git.openjdk.org/jdk/pull/24253 From shade at openjdk.org Wed Apr 9 08:31:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 08:31:45 GMT Subject: RFR: 8351157: Clean up x86 GC barriers after 32-bit x86 removal [v2] In-Reply-To: <6aXRsWRRGrrJdkmNcZHPw8JBD5piGr6UrmjOdnHjlMY=.3dde2c28-bdfc-4eb1-8d1d-7a4c85d3234f@github.com> References: <6aXRsWRRGrrJdkmNcZHPw8JBD5piGr6UrmjOdnHjlMY=.3dde2c28-bdfc-4eb1-8d1d-7a4c85d3234f@github.com> Message-ID: <44v659C-wJrB9RUCDMFObzkSMYE6zEdyx5oRzK7axHI=.71443b71-c44c-4661-8ec3-f34f15d2ffe2@github.com> On Thu, 27 Mar 2025 12:31:21 GMT, Aleksey Shipilev wrote: >> Assembler GC barriers have quite a bit of coding to support 32-bit x86. As 32-bit x86 is removed, we can clean up those parts. >> >> We can eliminate `!LP64` blocks quite easily. We can also prune passing around `thread` argument, and just trust that `r15_thread` is always available. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: > > - Merge branch 'master' into JDK-8351157-x86-gc-barriers > - Also do tlab_allocate > - Rely on R15 to be a thread register > - Work Friendly reminder :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24253#issuecomment-2788776594 From alanb at openjdk.org Wed Apr 9 08:41:24 2025 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 9 Apr 2025 08:41:24 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:14:04 GMT, Serguei Spitsyn wrote: > As noted in [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088), JVMTI `GetThreadGroupChildren` does an upcall to java. This results in a`ClassPrepare` event the first time it does this, and these events can cause problems (deadlocks) for the debugger or debug agent. The [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088) was fixed to get rid of class loading during Java upcall from `GetThreadGroupChildren`. However, some other events can be generated as well. It is more safe to disable all JVMTI events during debugger-related upcalls originated by JVMTI. > The `ClassPrepare` events are important for the debug agent. So, an assert was added into `ClassPrepare` event generation to make sure there are no attempts to post this event during upcalls. > Some specific implementation details can be added to the first PR comment. > > Testing: > - Verified with the test `jdk/com/sun/jdi/EarlyThreadGroupChildrenTest.java` that was added with the fix of [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088): > - the assert described above is fired if the fix of JDK-8352088 is removed > - the test is passed without if the fix of JDK-8352088 is removed and the assert is removed > - Ran mach5 tiers 1-6 If this goes ahead then it allows for a discussion about changing JVMTI InterruptThread to invoke Thread.interrupt when the target is a platform thread. 
As you know, there is a long-standing issue here where threads blocked on interruptible channels are not awakened by JVMTI InterruptThread. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24539#issuecomment-2788806930 From mli at openjdk.org Wed Apr 9 08:51:36 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 9 Apr 2025 08:51:36 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 06:41:51 GMT, Fei Yang wrote: > Sorry for not being clear enough. I am suggesting this: `if (UseZicond) { FLAG_SET_DEFAULT(ConditionalMoveLimit, 3); }` I think this depends on whether we should enable ConditionalMoveLimit based on `UseZicond`. So I'll leave it until we have a decision about the following discussion. > Without the Zicond extension, conditional moves composed by C2 are simply emulated with regular conditional branches on riscv, which I think is not good in terms of performance. Yes, when Zicond is not supported (or turned off), C2 uses the alternative path, which is `branch + mv`. When Zicond was introduced (https://github.com/openjdk/jdk/pull/22386), the rationale behind it was that branches bring regressions, but I think that assumed the code size stays the same. In the cmove case, however, the size increases, and in particular when C2 unrolls a loop it can increase the code size a lot, which is not good for the cache. And the jmh test results show that the branch version is better than this patch with Zicond and than master with/without Zicond in most of the test cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24490#discussion_r2034819606 From mli at openjdk.org Wed Apr 9 09:04:41 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 9 Apr 2025 09:04:41 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help review this patch? > On riscv, CMoveI/L were already implemented, but there are some gaps: > 1.
CMoveI/L does not support comparison with float/double, and the corresponding tests are not turned on either. > 2. Some C2 optimizations are not turned on, e.g. `Phi -> CMove -> min_max`. > 3. Some corresponding performance tests are missing. > > Also there are some issues with the current Zicond support: > 1. UseZicond is turned on automatically by hwprobe, but jmh tests show that it does not always bring a benefit; in some situations it even causes a regression, because the code generated with Zicond is much larger than the branch version, in particular when it's in a loop that gets unrolled. > > This patch on riscv: > 1. adds CMoveI/L comparing float/double, and the corresponding tests, > 2. enables more C2 optimizations, > 3. adds more benchmark tests, > 4. turns off UseZicond by default. > > Thanks! > > ## Performance > > ### MinMax > test data >
> Benchmark | Mode | Cnt | Score - master | Score - master+UseZbb | Score - master+UseZicond | Score - master+UseZicond+UseZbb | Score - cmovei | Score - cmovei+UseZbb | Score - cmovei+UseZicond | Score - cmovei+UseZicond+UseZbb | Error | Units | Opt (master/cmovei)
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> o.o.b.vm.compiler.IfMinMax.testReductionInt | avgt | 40 | 17152.075 | 17216.592 | 17272.493 | 17296.89 | 17127.844 | 17036.605 | 17299.333 | 17250.566 | 73.179 | ns/op | 1.001
> o.o.b.vm.compiler.IfMinMax.testReductionLong | avgt | 40 | 19770.828 | 19967.578 | 20268.905 | 20166.165 | 20065.552 | 20059.095 | 20161.914 | 20151.295 | 131.428 | ns/op | 0.985
> o.o.b.vm.compiler.IfMinMax.testSingleInt | avgt | 40 | 114.734 | 114.402 | 114.887 | 114.384 | 114.4 | 110.631 | 112.162 | 110.915 | 0.333 | ns/op | 1.003
> o.o.b.vm.compiler.IfMinMax.testSingleLong | avgt | 40 | 121.53 | 121.711 | 120.91 | 121.665 | 121.309 | 120.57 | 118.639 | 119.373 | 0.451 | ns/op | 1.002
> o.o.b.vm.compiler.IfMinMax.testVectorInt | avgt | 40 | 60130.165 | 60062.303 | 61839.776 | 61895.194 | 15887.398 | 15924.502 | 15874.835 | 15667.936 | 101.94 | ns/op | 3.785
> o.o.b.vm.compiler.IfMinMax.testVectorLong | avgt | 40 | 63855.379 | 6309...
Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - typo - Merge branch 'master' into cmoveil-v1 - turn off flag Zicond by default - remove - initial commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24490/files - new: https://git.openjdk.org/jdk/pull/24490/files/0f013e7c..10f9adb0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24490&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24490&range=00-01 Stats: 30103 lines in 808 files changed: 20231 ins; 7672 del; 2200 mod Patch: https://git.openjdk.org/jdk/pull/24490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24490/head:pull/24490 PR: https://git.openjdk.org/jdk/pull/24490 From mli at openjdk.org Wed Apr 9 09:04:42 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 9 Apr 2025 09:04:42 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L [v2] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 15:02:27 GMT, Feilong Jiang wrote: >> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - typo >> - Merge branch 'master' into cmoveil-v1 >> - turn off flag Zicond by default >> - remove >> - initial commit > > src/hotspot/cpu/riscv/riscv.ad line 9979: > >> 9977: >> 9978: format %{ >> 9979: "CMove $dst, ($op1 $cop $op2), $dst, $src\t#@cmovI_cmpF\n\t" > > Should be `CMoveI` too? Yes, fixed.
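The code-size argument in the PR description becomes concrete once the two Zicond instructions are modelled. With `czero.eqz` and `czero.nez`, a select `dst = cond ? a : b` is commonly built from three instructions: two `czero` instructions plus an `or`, after the comparison that produces `cond`. The non-Zicond fallback is instead a conditional branch around a single `mv`. The C++ below models the instruction semantics only. It is not necessarily the sequence C2 emits in this patch.

```cpp
#include <cassert>
#include <cstdint>

// Semantics of the RISC-V Zicond instructions:
//   czero.eqz rd, rs1, rs2 : rd = (rs2 == 0) ? 0 : rs1
//   czero.nez rd, rs1, rs2 : rd = (rs2 != 0) ? 0 : rs1
inline uint64_t czero_eqz(uint64_t rs1, uint64_t rs2) { return rs2 == 0 ? 0 : rs1; }
inline uint64_t czero_nez(uint64_t rs1, uint64_t rs2) { return rs2 != 0 ? 0 : rs1; }

// Branch-free select dst = cond ? a : b, as the usual three-instruction
// Zicond idiom: exactly one of t0/t1 is non-zero-masked, then OR them.
inline uint64_t cmove_zicond(uint64_t cond, uint64_t a, uint64_t b) {
  uint64_t t0 = czero_eqz(a, cond);  // a if cond != 0, else 0
  uint64_t t1 = czero_nez(b, cond);  // b if cond == 0, else 0
  return t0 | t1;                    // or t0, t0, t1
}
```

When C2 unrolls a loop containing such a select, the extra instructions are replicated per iteration. That fits the description's point that the Zicond form can cost more in code size than a well-predicted branch.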
Thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24490#discussion_r2034842102 From shade at openjdk.org Wed Apr 9 09:34:27 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 09:34:27 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag In-Reply-To: References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> <314QDfJV4auKGwpK0rkvupAG_iBr1icgZ6azVvWIJro=.d2f945d9-5a70-473e-b473-eb18e1153f05@github.com> Message-ID: On Tue, 8 Apr 2025 22:23:47 GMT, Vladimir Ivanov wrote: >> Are you happy with this explanation, @iwanowww? > > Well, not really. If it were like that, then all CPU sensing logic on x86 would have been vendor-specific. But it's not the case: among many features x86 CPUs may declare, just a few are treated as vendor-specific. > > I took a look at how it was handled before and many extensions Intel introduced were not guarded by `is_intel()` check in the first place. > > And there's even more to that: though `CPU_LZCNT` and `CPU_3DNOW_PREFETCH` are handled as vendor-specific, both of them are treated uniformly across all 3 cpu families. Can those be moved into vendor-agnostic part now? > > Overall, I'm more comfortable with moving the check rather than duplicating it in AMD-specific block. I would agree on moving the `CPU_CLWB` check to the common block if we only had Intel and AMD for x86 support. But there is also ZX, and I cannot find any docs for that implementation, so I pessimistically presume that we cannot trust that the CPUID bit for `CLWB` is in the same place on that platform. So in my mind, checking `CLWB` for Intel and AMD specifically is safer. As a compromise, we can move `CLWB` to the common block but predicate it with `!is_zx()`, since we don't know about it. I think `CPU_SERIALIZE` would be another flag like this. I agree that `CPU_LZCNT` and `CPU_3DNOW_PREFETCH` can now be moved to the common block.
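Aleksey's compromise can be written as a pure function over an already-fetched CPUID result: check CLWB in the vendor-agnostic block, but skip ZX. On Intel, CLWB is reported in CPUID leaf 07H, sub-leaf 0, EBX bit 24, and the AMD APM cited in this thread places it in the same bit (`Fn0000_0007_EBX_x0`). `CpuVendor` and `supports_clwb` are hypothetical names. HotSpot's `VM_Version` code is organized differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical vendor enumeration for the sketch.
enum class CpuVendor { Intel, AMD, ZX, Other };

// CLWB: CPUID.(EAX=07H, ECX=0):EBX bit 24.
constexpr uint32_t kLeaf7EbxClwbBit = 1u << 24;

// Check the bit for every vendor except ZX, whose layout is unconfirmed.
bool supports_clwb(CpuVendor vendor, uint32_t leaf7_ebx) {
  if (vendor == CpuVendor::ZX) {
    return false;  // pessimistic: no documentation found for this vendor
  }
  return (leaf7_ebx & kLeaf7EbxClwbBit) != 0;
}
```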
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24385#discussion_r2034932771 From kbarrett at openjdk.org Wed Apr 9 09:42:42 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 9 Apr 2025 09:42:42 GMT Subject: RFR: 8351157: Clean up x86 GC barriers after 32-bit x86 removal [v3] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:31:44 GMT, Aleksey Shipilev wrote: >> Assembler GC barriers have quite a bit of coding to support 32-bit x86. As 32-bit x86 is removed, we can clean up those parts. >> >> We can eliminate `!LP64` blocks quite easily. We can also prune passing around `thread` argument, and just trust that `r15_thread` is always available. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8351157-x86-gc-barriers > - Merge branch 'master' into JDK-8351157-x86-gc-barriers > - Also do tlab_allocate > - Rely on R15 to be a thread register > - Work Still looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24253#pullrequestreview-2752823687 From kevinw at openjdk.org Wed Apr 9 09:45:43 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 9 Apr 2025 09:45:43 GMT Subject: RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v3] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 10:44:02 GMT, Kevin Walls wrote: >> We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. >> >> next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell. 
> > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > comment Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2789040171 From shade at openjdk.org Wed Apr 9 09:49:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 09:49:01 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag [v2] In-Reply-To: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> Message-ID: <1UeVxSPjBdbSxxkJDo8YdTXAfIweSrH__GF-ytGwpzw=.63f3ca29-119a-4814-90a2-08518c9e9ca2@github.com> > Noticed this when doing [JDK-8353558](https://bugs.openjdk.org/browse/JDK-8353558). We only check for CLWB feature flag for Intel platforms. But AMD APM (Rev. 3.36, March 2024) tells me there is a CLWB flag in CPUID Fn0000_0007_EBX_x0 leaf as well. It is in the same place as the flag for Intel. > > Additional testing: > - [x] Ad-hoc tests on Ryzen 5950X Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - More feature flag commonning - Merge branch 'master' into JDK-8353572-amd-clwb - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24385/files - new: https://git.openjdk.org/jdk/pull/24385/files/eae1c04f..c9ec23c3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24385&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24385&range=00-01 Stats: 22473 lines in 760 files changed: 15305 ins; 5220 del; 1948 mod Patch: https://git.openjdk.org/jdk/pull/24385.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24385/head:pull/24385 PR: https://git.openjdk.org/jdk/pull/24385 From shade at openjdk.org Wed Apr 9 09:49:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 09:49:01 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag [v2] In-Reply-To: References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> <314QDfJV4auKGwpK0rkvupAG_iBr1icgZ6azVvWIJro=.d2f945d9-5a70-473e-b473-eb18e1153f05@github.com> Message-ID: On Wed, 9 Apr 2025 09:31:44 GMT, Aleksey Shipilev wrote: >> Well, not really. If it were like that, then all CPU sensing logic on x86 would have been vendor-specific. But it's not the case: among many features x86 CPUs may declare, just a few are treated as vendor-specific. >> >> I took a look at how it was handled before and many extensions Intel introduced were not guarded by `is_intel()` check in the first place. >> >> And there's even more to that: though `CPU_LZCNT` and `CPU_3DNOW_PREFETCH` are handled as vendor-specific, both of them are treated uniformly across all 3 cpu families. Can those be moved into vendor-agnostic part now? >> >> Overall, I'm more comfortable with moving the check rather than duplicating it in AMD-specific block. > > I would agree on moving `CPU_CLWB` check to common block, if we only had Intel and AMD for x86 support. 
But there is also ZX, and I cannot find any docs for that implementation, so I presume pessimistically that we cannot trust the CPUID bit for `CLWB` is in the same place for that platform. > > So in my mind checking `CLWB` for Intel and AMD specifically is safer. As the compromise, we can move `CLWB` to common block, but still distrust it when `is_zx()`, since we don't know about it. > > I agree that `CPU_LZCNT`, `CPU_3DNOW_PREFETCH` can now be moved to common block. See new commit, does that look better? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24385#discussion_r2034971990 From kevinw at openjdk.org Wed Apr 9 09:49:35 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 9 Apr 2025 09:49:35 GMT Subject: Integrated: 8353439: Shell grouping of -XX:OnError= commands is surprising In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 10:59:16 GMT, Kevin Walls wrote: > We should be consistent, and run all OnError items in a new shell. Currently the ; separator causes a new shell, but multiple -XX:OnError= options are grouped into the same shell. > > next_OnError_command() decides on where a new command starts. It should recognise newlines, and all commands will get their own shell. This pull request has now been integrated. 
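The splitting rule described in this change says that every `;`-separated or newline-separated item runs in its own shell. A small standalone function can illustrate it. This is a hedged sketch, not the real `next_OnError_command()`: `split_on_error_commands` is a hypothetical name, and it assumes that multiple `-XX:OnError=` options reach the VM joined by newlines.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split an OnError value into separate commands.
// Assumes multiple -XX:OnError= options were concatenated with '\n',
// and that ';' also separates commands within a single option.
std::vector<std::string> split_on_error_commands(const std::string& value) {
  std::vector<std::string> commands;
  std::string current;
  for (char c : value) {
    if (c == ';' || c == '\n') {
      if (!current.empty()) commands.push_back(current);  // one shell per item
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  if (!current.empty()) commands.push_back(current);
  return commands;
}
```

Treating both separators the same way is what makes the grouping consistent: each item, however it was specified, would be handed to a fresh shell.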
Changeset: cd9fa3f7 Author: Kevin Walls URL: https://git.openjdk.org/jdk/commit/cd9fa3f7aa0324c575943deebb41f4f7ff4f73d3 Stats: 34 lines in 2 files changed: 29 ins; 0 del; 5 mod 8353439: Shell grouping of -XX:OnError= commands is surprising Reviewed-by: dholmes, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/24354 From sspitsyn at openjdk.org Wed Apr 9 09:56:29 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 9 Apr 2025 09:56:29 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:39:14 GMT, Alan Bateman wrote: > If this goes ahead then it allows for a discussion about changing JVMTI InterruptThread to invoke Thread.interrupt when the target is a platform thread. As you know, there is a long standing issue here where threads blocked on interruptible channels not being awakened by JVMTI InterruptThread. Yes, this fix already includes an update for interrupts. I'll add a comment with the fix details tomorrow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24539#issuecomment-2789083462 From ayang at openjdk.org Wed Apr 9 10:36:44 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 9 Apr 2025 10:36:44 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 08:10:34 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP.
The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). 
>> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: > > - * missing file from merge > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq > - * make young gen length revising independent of refinement thread > * use a service task > * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update > - * fix IR code generation tests that change due to barrier cost changes > - * factor out card table and refinement table merging into a single > method > - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 > - * obsolete G1UpdateBufferSize > > G1UpdateBufferSize has previously been used to size the refinement > buffers and impose a minimum limit on the number of cards per thread > that need to be pending before refinement starts. > > The former function is now obsolete with the removal of the dirty > card queues, the latter functionality has been taken over by the new > diagnostic option `G1PerThreadPendingCardThreshold`. > > I prefer to make this a diagnostic option is better than a product option > because it is something that is only necessary for some test cases to > produce some otherwise unwanted behavior (continuous refinement). > > CSR is pending. > - ... 
and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 170: > 168: } > 169: return result; > 170: } I see in `G1ConcurrentRefineThread::do_refinement`: // The yielding may have completed the task, check. if (!state.is_in_progress()) { I wonder if it's simpler to use `is_in_progress` consistently to detect whether we should restart sweep, instead of `_sweep_start_epoch`. src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 349: > 347: } > 348: > 349: bool has_sweep_rt_work = is_in_progress() && _state == State::SweepRT; Why `is_in_progress()`? src/hotspot/share/gc/g1/g1ConcurrentRefineStats.hpp line 79: > 77: > 78: void inc_cards_scanned(size_t increment = 1) { _cards_scanned += increment; } > 79: void inc_cards_clean(size_t increment = 1) { _cards_clean += increment; } The sole caller always passes an arg, so no need for default-arg-value. src/hotspot/share/gc/g1/g1ConcurrentRefineStats.hpp line 87: > 85: void add_atomic(G1ConcurrentRefineStats* other); > 86: > 87: G1ConcurrentRefineStats& operator+=(const G1ConcurrentRefineStats& other); Seems that these operators are not used after this PR. src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 83: > 81: break; > 82: } > 83: case G1RemSet::HasRefToOld : break; // Nothing special to do. Why doesn't it call `inc_cards_clean_again` in this case? The card is cleared also. (In fact, I don't get why this needs to be a separate case from `NoInteresting`.) src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 156: > 154: > 155: _refine_stats.inc_cards_scanned(claim.size()); > 156: _refine_stats.inc_cards_clean(claim.size() - scanned); I feel these two "scanned" mean sth diff; the local var should probably be sth like `num_dirty_cards`.
src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 207: > 205: > 206: if (!interrupted_by_gc) { > 207: state.add_yield_duration(G1CollectedHeap::heap()->safepoint_duration() - synchronize_duration_at_sweep_start); I think this is recorded to later calculate actual refine-time, i.e. sweep-time - yield-time. However, why can't yield-duration be recorded in this refine-control-thread directly -- accumulation of `jlong yield_duration = os::elapsed_counter() - yield_start`. I feel that is easier to reason about than going through g1heap. src/hotspot/share/gc/g1/g1ReviseYoungListTargetLengthTask.cpp line 75: > 73: { > 74: MutexLocker x(G1ReviseYoungLength_lock, Mutex::_no_safepoint_check_flag); > 75: G1Policy* p = g1h->policy(); Can probably use the existing `policy`. src/hotspot/share/gc/g1/g1ReviseYoungListTargetLengthTask.cpp line 88: > 86: } > 87: > 88: G1ReviseYoungLengthTargetLengthTask::G1ReviseYoungLengthTargetLengthTask(const char* name) : I wonder if the class name can be shortened a bit, sth like `G1ReviseYoungLengthTask`.
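The pre-change post-write barrier from the quoted PR description can be modelled in plain C++. The model makes the filter order and the queueing step explicit. This is a single-threaded sketch under stated assumptions: heap addresses are byte offsets into a simulated heap, and offset 0 stands for null. The region size, card size, card values and queue capacity are illustrative rather than HotSpot's actual constants. `OldStyleBarrier` and its members are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative constants, not HotSpot's actual values.
constexpr unsigned kLogRegionSize = 20;  // 1 MiB regions
constexpr unsigned kLogCardSize = 9;     // 512-byte cards
constexpr uint8_t kClean = 0xff, kDirty = 0x00, kYoung = 0x02;
constexpr size_t kQueueCapacity = 256;

struct OldStyleBarrier {
  std::vector<uint8_t> cards;                   // one byte per card
  std::vector<size_t> local_queue;              // thread-local dirty card queue
  std::vector<std::vector<size_t>> global_set;  // stands in for the shared queue set

  explicit OldStyleBarrier(size_t heap_bytes)
      : cards(heap_bytes >> kLogCardSize, kClean) {}

  // Post-write barrier for "*field = new_val"; 0 encodes null.
  void post_write(uintptr_t field, uintptr_t new_val) {
    if (((field ^ new_val) >> kLogRegionSize) == 0) return;  // same region
    if (new_val == 0) return;                                 // null value
    size_t card = field >> kLogCardSize;
    if (cards[card] == kYoung) return;                        // young gen card
    std::atomic_thread_fence(std::memory_order_seq_cst);      // StoreLoad
    if (cards[card] == kDirty) return;                        // already dirty
    cards[card] = kDirty;
    local_queue.push_back(card);                              // card tracking
    if (local_queue.size() < kQueueCapacity) return;
    global_set.push_back(std::move(local_queue));             // "runtime call"
    local_queue.clear();
  }
};
```

Every store that passes the first three filters pays for the fence, a second card check, the card store and a queue push. The PR proposes removing the fence and the queue.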
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033251162 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033222407 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033929489 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033975054 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033934399 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2033910496 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2032008908 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2029855278 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2029855435 From rcastanedalo at openjdk.org Wed Apr 9 12:03:49 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 9 Apr 2025 12:03:49 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> References: Message-ID: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> On Fri, 4 Apr 2025 08:10:34 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier.
>> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... 
> > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: > > - * missing file from merge > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq > - * make young gen length revising independent of refinement thread > * use a service task > * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update > - * fix IR code generation tests that change due to barrier cost changes > - * factor out card table and refinement table merging into a single > method > - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 > - * obsolete G1UpdateBufferSize > > G1UpdateBufferSize has previously been used to size the refinement > buffers and impose a minimum limit on the number of cards per thread > that need to be pending before refinement starts. > > The former function is now obsolete with the removal of the dirty > card queues, the latter functionality has been taken over by the new > diagnostic option `G1PerThreadPendingCardThreshold`. > > I prefer to make this a diagnostic option is better than a product option > because it is something that is only necessary for some test cases to > produce some otherwise unwanted behavior (continuous refinement). > > CSR is pending. > - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f Hi Thomas, great simplification and encouraging results! I reviewed the compiler-related parts of the changeset, including x64 and aarch64 changes. src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 246: > 244: __ cbz(new_val, done); > 245: } > 246: // Storing region crossing non-null, is card young? 
Suggestion: // Storing region crossing non-null. src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 101: > 99: } > 100: > 101: void G1BarrierSetAssembler::gen_write_ref_array_post_barrier(MacroAssembler* masm, DecoratorSet decorators, Have you measured the performance impact of inlining this assembly code instead of resorting to a runtime call as done before? Is it worth the maintenance cost (for every platform), risk of introducing bugs, etc.? src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 145: > 143: > 144: __ bind(is_clean_card); > 145: // Card was clean. Dirty card and go to next.. This code seems unreachable if `!UseCondCardMark`, meaning we only dirty cards here if `UseCondCardMark` is enabled. Is that intentional? src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 319: > 317: const Register thread, > 318: const Register tmp1, > 319: const Register tmp2, Since `tmp2` is not needed in the x64 post-barrier, I suggest not passing it around for this platform, for simplicity and also to make optimization opportunities more visible in the future. Here is my suggestion: https://github.com/robcasloz/jdk/commit/855ec8df4a641f8c491c5c09acea3ee434b7e230, feel free to merge if you agree. src/hotspot/share/gc/g1/c1/g1BarrierSetC1.cpp line 38: > 36: #include "c1/c1_LIRAssembler.hpp" > 37: #include "c1/c1_MacroAssembler.hpp" > 38: #endif // COMPILER1 I suggest removing the conditional compilation directives and grouping these includes together with the above `c1` ones. src/hotspot/share/gc/g1/c1/g1BarrierSetC1.cpp line 147: > 145: state->do_input(_thread); > 146: > 147: // Use temp registers to ensure these they use different registers. Suggestion: // Use temps to enforce different registers. src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 307: > 305: + 6 // same region check: Uncompress (new_val) oop, xor, shr, (cmp), jmp > 306: + 4 // new_val is null check > 307: + 4; // card not clean check. 
It probably does not affect the unrolling heuristics too much, but you may want to make the last cost component conditional on `UseCondCardMark`. src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 396: > 394: bool needs_liveness_data(const MachNode* mach) const { > 395: return G1BarrierStubC2::needs_pre_barrier(mach) || > 396: G1BarrierStubC2::needs_post_barrier(mach); Suggestion: // Liveness data is only required to compute registers that must be // preserved across the runtime call in the pre-barrier stub. return G1BarrierStubC2::needs_pre_barrier(mach); src/hotspot/share/gc/g1/g1BarrierSet.hpp line 56: > 54: // > 55: // The refinement threads mark cards in the current collection set specially on the > 56: // card table - this is fine wrt to synchronization with the mutator, because at Suggestion: // card table - this is fine wrt synchronization with the mutator, because at test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java line 521: > 519: phase = CompilePhase.FINAL_CODE) > 520: @IR(counts = {IRNode.COUNTED_LOOP, "2"}, > 521: phase = CompilePhase.FINAL_CODE) I suggest removing this extra IR check to avoid over-specifying the expected loop shape. For example, running this test with loop unrolling disabled (`-XX:LoopUnrollLimit=0`) would now fail because only one counted loop would be found. ------------- Changes requested by rcastanedalo (Reviewer).
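The reduced barrier that the C2 cost comment accounts for can be sketched in the same way: a same-region check, a null check and a card-not-clean check. This is an assumption-laden, single-threaded model, not G1's generated code. `CardTableBarrier`, the constants and the `card_writes` counter are hypothetical. The card-not-clean filter is applied only when the hypothetical `use_cond_card_mark` field is set, which mirrors the suggestion to make that cost component conditional on `UseCondCardMark`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative constants, not HotSpot's actual values.
constexpr unsigned kLogRegionSize = 20;  // 1 MiB regions
constexpr unsigned kLogCardSize = 9;     // 512-byte cards
constexpr uint8_t kClean = 0xff, kDirty = 0x00;

struct CardTableBarrier {
  std::vector<uint8_t> cards;
  bool use_cond_card_mark;
  size_t card_writes = 0;  // counts stores to the card table

  CardTableBarrier(size_t heap_bytes, bool cond_card_mark)
      : cards(heap_bytes >> kLogCardSize, kClean),
        use_cond_card_mark(cond_card_mark) {}

  // Post-write barrier for "*field = new_val"; 0 encodes null.
  void post_write(uintptr_t field, uintptr_t new_val) {
    if (((field ^ new_val) >> kLogRegionSize) == 0) return;  // same region
    if (new_val == 0) return;                                 // null value
    uint8_t& card = cards[field >> kLogCardSize];
    if (use_cond_card_mark && card != kClean) return;         // card not clean
    card = kDirty;                                            // no fence, no queue
    ++card_writes;
  }
};
```

With `use_cond_card_mark` set, repeated stores into one card write the table only once. The cost is an extra card load on every store that reaches the filter.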
PR Review: https://git.openjdk.org/jdk/pull/23739#pullrequestreview-2753154117 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035174209 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035175921 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035177738 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035183250 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035186980 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035192666 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035210464 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035196251 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035198219 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035201056 From tschatzl at openjdk.org Wed Apr 9 12:41:40 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Apr 2025 12:41:40 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> References: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> Message-ID: On Wed, 9 Apr 2025 11:35:26 GMT, Roberto Castañeda Lozano wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 39 commits: >> >> - * missing file from merge >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq >> - * make young gen length revising independent of refinement thread >> * use a service task >> * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update >> - * fix IR code generation tests that change due to barrier cost changes >> - * factor out card table and refinement table merging into a single >> method >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 >> - * obsolete G1UpdateBufferSize >> >> G1UpdateBufferSize has previously been used to size the refinement >> buffers and impose a minimum limit on the number of cards per thread >> that need to be pending before refinement starts. >> >> The former function is now obsolete with the removal of the dirty >> card queues, the latter functionality has been taken over by the new >> diagnostic option `G1PerThreadPendingCardThreshold`. >> >> I prefer to make this a diagnostic option is better than a product option >> because it is something that is only necessary for some test cases to >> produce some otherwise unwanted behavior (continuous refinement). >> >> CSR is pending. >> - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 145: > >> 143: >> 144: __ bind(is_clean_card); >> 145: // Card was clean. Dirty card and go to next.. > > This code seems unreachable if `!UseCondCardMark`, meaning we only dirty cards here if `UseCondCardMark` is enabled. Is that intentional? Great find! 
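Roberto's observation is that the card-dirtying path in the array post-barrier was only reachable with `UseCondCardMark`. Assuming the intent is that every covered card gets dirtied in both modes, with `UseCondCardMark` only adding a skip for cards that are already non-clean, the control flow can be sketched as follows. This is not the x86 assembler routine. `dirty_array_cards` is a hypothetical name, and 8-byte heap oops, 512-byte cards and the card values are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative constants, not HotSpot's actual values.
constexpr unsigned kLogCardSize = 9;  // 512-byte cards
constexpr size_t kHeapOopSize = 8;    // assumes uncompressed oops
constexpr uint8_t kClean = 0xff, kDirty = 0x00;

// Dirty all cards covering the elements [start, start + count * kHeapOopSize).
void dirty_array_cards(std::vector<uint8_t>& cards, uintptr_t start,
                       size_t count, bool use_cond_card_mark) {
  if (count == 0) return;
  uintptr_t last = start + count * kHeapOopSize - 1;  // last byte written
  for (size_t c = start >> kLogCardSize; c <= (last >> kLogCardSize); ++c) {
    if (use_cond_card_mark && cards[c] != kClean) continue;  // skip non-clean card
    cards[c] = kDirty;  // reached with and without conditional card marking
  }
}
```

The only behavioral difference between the two modes is the skip. The dirtying store itself must not depend on the flag.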
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035280909 From tschatzl at openjdk.org Wed Apr 9 12:50:42 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Apr 2025 12:50:42 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> References: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> Message-ID: On Wed, 9 Apr 2025 11:34:09 GMT, Roberto Castañeda Lozano wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: >> >> - * missing file from merge >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq >> - * make young gen length revising independent of refinement thread >> * use a service task >> * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update >> - * fix IR code generation tests that change due to barrier cost changes >> - * factor out card table and refinement table merging into a single >> method >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 >> - * obsolete G1UpdateBufferSize >> >> G1UpdateBufferSize has previously been used to size the refinement >> buffers and impose a minimum limit on the number of cards per thread >> that need to be pending before refinement starts.
>> >> The former function is now obsolete with the removal of the dirty >> card queues, the latter functionality has been taken over by the new >> diagnostic option `G1PerThreadPendingCardThreshold`. >> >> I prefer to make this a diagnostic option is better than a product option >> because it is something that is only necessary for some test cases to >> produce some otherwise unwanted behavior (continuous refinement). >> >> CSR is pending. >> - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 101: > >> 99: } >> 100: >> 101: void G1BarrierSetAssembler::gen_write_ref_array_post_barrier(MacroAssembler* masm, DecoratorSet decorators, > > Have you measured the performance impact of inlining this assembly code instead of resorting to a runtime call as done before? Is it worth the maintenance cost (for every platform), risk of introducing bugs, etc.? I remember significant impact in some microbenchmark. It's also inlined in Parallel GC. I do not consider it a big issue wrt maintenance - these things never really change, and the method is small and contained. I will try to redo numbers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035298557 From duke at openjdk.org Wed Apr 9 13:24:33 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Wed, 9 Apr 2025 13:24:33 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock [v4] In-Reply-To: References: Message-ID: <5MzHrfRc-pu8J5vQdmeeuk4HxP7uumhhVfELVs_VRwU=.4d86b907-a8e8-4bbd-98cb-892491b5c2ad@github.com> > ### Update: > After some discussion it was decided it's not necessary to expand the lock scope for reserve/commit. Instead, we are opting to add comments explaining the reasons for locking and the conditions to avoid which could lead to races. Some of the new tests can be kept because they are general enough to be useful outside of this context.
> > ### Summary: > This PR makes memory operations atomic with NMT accounting. > > ### The problem: > In memory related functions like `os::commit_memory` and `os::reserve_memory` the OS memory operations are currently done before acquiring the NMT mutex. And the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races. > > 1.1 Thread_1 releases range_A. > 1.2 Thread_1 tells NMT "range_A has been released". > > 2.1 Thread_2 reserves (the now free) range_A. > 2.2 Thread_2 tells NMT "range_A is reserved". > > Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid subsequent calls (release range_A, followed by map range_A). But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS. > > ### Solution: > Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself. > > ### Other notes: > I also simplified this pattern found in many places: > > if (MemTracker::enabled()) { > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > if (result != nullptr) { > MemTracker::record_some_operation(addr, bytes); > } > } else { > result = pd_some_operation(addr, bytes); > } > ``` > To: > > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > MemTracker::record_some_operation(addr, bytes); > ``` > This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`.
> > I considered moving the locking and NMT accounting down into platform specific code: Ex. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section... Robert Toyonaga has updated the pull request incrementally with four additional commits since the last revision: - Update test/hotspot/gtest/runtime/test_os.cpp Co-authored-by: Stefan Karlsson - Update test/hotspot/gtest/runtime/test_os.cpp Co-authored-by: Stefan Karlsson - Update test/hotspot/gtest/runtime/test_os.cpp Co-authored-by: Stefan Karlsson - Update test/hotspot/gtest/runtime/test_os.cpp Co-authored-by: Stefan Karlsson ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24084/files - new: https://git.openjdk.org/jdk/pull/24084/files/5c23a76a..813a1e49 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24084&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24084&range=02-03 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24084.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24084/head:pull/24084 PR: https://git.openjdk.org/jdk/pull/24084 From duke at openjdk.org Wed Apr 9 13:24:35 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Wed, 9 Apr 2025 13:24:35 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock [v3] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 14:08:23 GMT, Stefan Karlsson wrote: >> Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision: >> >> exclude file mapping tests on AIX. > > src/hotspot/share/runtime/os.cpp line 2206: > >> 2204: // when it is actually committed. The opposite scenario is not guarded against. pd_commit_memory and >> 2205: // record_virtual_memory_commit do not happen atomically. 
We assume that there is some external synchronization >> 2206: // that prevents a region from being uncommitted before it is finished being committed. > > It's not a requirement, but you get kudos from me if you keep comment lines below 80 columns. I typically don't like code to be limited to 80 columns, but comments tend to be nicer if they are. OK, I'll try to shorten these comments a bit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24084#discussion_r2035360006 From duke at openjdk.org Wed Apr 9 13:43:01 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Wed, 9 Apr 2025 13:43:01 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock [v5] In-Reply-To: References: Message-ID: > ### Update: > After some discussion, it was decided that it's not necessary to expand the lock scope for reserve/commit. Instead, we are opting to add comments explaining the reasons for locking and the conditions to avoid that could lead to races. Some of the new tests can be kept because they are general enough to be useful outside of this context. > > ### Summary: > This PR makes memory operations atomic with NMT accounting. > > ### The problem: > In memory-related functions like `os::commit_memory` and `os::reserve_memory`, the OS memory operations are currently done before acquiring the NMT mutex, and the virtual memory accounting is done later in `MemTracker`, after the lock has been acquired. Doing the memory operations outside of the lock scope can lead to races. > > 1.1 Thread_1 releases range_A. > 1.2 Thread_1 tells NMT "range_A has been released". > > 2.1 Thread_2 reserves (the now free) range_A. > 2.2 Thread_2 tells NMT "range_A is reserved". > > Since the sequence (1.1) (1.2) is not atomic, if Thread_2 begins operating after (1.1), we can have (1.1) (2.1) (2.2) (1.2). The OS sees two valid consecutive calls (release range_A, followed by map range_A).
But NMT sees "reserve range_A", "release range_A" and is now out of sync with the OS. > > ### Solution: > Where memory operations such as reserve, commit, or release virtual memory happen, I've expanded the scope of `NmtVirtualMemoryLocker` to protect both the NMT accounting and the memory operation itself. > > ### Other notes: > I also simplified this pattern found in many places: > > ``` > if (MemTracker::enabled()) { > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > if (result != nullptr) { > MemTracker::record_some_operation(addr, bytes); > } > } else { > result = pd_some_operation(addr, bytes); > } > ``` > To: > > ``` > MemTracker::NmtVirtualMemoryLocker nvml; > result = pd_some_operation(addr, bytes); > MemTracker::record_some_operation(addr, bytes); > ``` > This is possible because `NmtVirtualMemoryLocker` now checks `MemTracker::enabled()`. `MemTracker::record_some_operation` already checks `MemTracker::enabled()` and checks against nullptr. This refactoring previously wasn't possible because `ThreadCritical` was used before https://github.com/openjdk/jdk/pull/22745 introduced `NmtVirtualMemoryLocker`. > > I considered moving the locking and NMT accounting down into platform-specific code, e.g. lock around { munmap() + MemTracker::record }. The hope was that this would help reduce the size of the critical section...
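To make the race in steps 1.1 to 2.2 concrete, here is a minimal, self-contained C++ model of the widened lock scope described in the Solution section. `FakeOs`, `FakeTracker`, `TrackedMemory` and `churn` are hypothetical stand-ins, not HotSpot's `os::` or `MemTracker` APIs. The sketch only shows that when a single mutex covers both the OS operation and the accounting update, an interleaving like (1.1) (2.1) (2.2) (1.2) cannot occur, so the tracker never drifts from the "OS" state.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>

// Hypothetical stand-in for the OS side (pd_reserve/pd_release).
struct FakeOs {
  std::set<std::size_t> reserved;  // base addresses currently reserved
  bool reserve(std::size_t base) { return reserved.insert(base).second; }
  bool release(std::size_t base) { return reserved.erase(base) == 1; }
};

// Hypothetical stand-in for NMT's virtual memory accounting.
struct FakeTracker {
  std::set<std::size_t> reserved;
  void record_reserve(std::size_t base) { reserved.insert(base); }
  void record_release(std::size_t base) { reserved.erase(base); }
};

class TrackedMemory {
 public:
  // The lock covers both the OS operation and the accounting, so no other
  // thread can slip its own reserve/release in between the two steps.
  bool reserve(std::size_t base) {
    std::lock_guard<std::mutex> guard(lock_);
    bool ok = os_.reserve(base);
    if (ok) tracker_.record_reserve(base);
    return ok;
  }
  bool release(std::size_t base) {
    std::lock_guard<std::mutex> guard(lock_);
    bool ok = os_.release(base);
    if (ok) tracker_.record_release(base);
    return ok;
  }
  // True when the tracker agrees with the (fake) OS.
  bool in_sync() {
    std::lock_guard<std::mutex> guard(lock_);
    return os_.reserved == tracker_.reserved;
  }

 private:
  std::mutex lock_;
  FakeOs os_;
  FakeTracker tracker_;
};

// Two threads repeatedly release and re-reserve the same range,
// mimicking Thread_1 and Thread_2 from the description.
void churn(TrackedMemory& mem, std::size_t base, int iterations) {
  auto worker = [&mem, base, iterations] {
    for (int i = 0; i < iterations; i++) {
      mem.release(base);
      mem.reserve(base);
    }
  };
  std::thread t1(worker);
  std::thread t2(worker);
  t1.join();
  t2.join();
}
```

Moving `record_release`/`record_reserve` outside the `lock_guard` scope would reintroduce the window in which the other thread can act between the OS call and the accounting. The later revision of this PR (v5) chose to keep the narrower lock scope and document these constraints instead, so this sketch illustrates only the original proposal.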
Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision: improve tests and comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24084/files - new: https://git.openjdk.org/jdk/pull/24084/files/813a1e49..7b7263b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24084&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24084&range=03-04 Stats: 23 lines in 2 files changed: 1 ins; 8 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/24084.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24084/head:pull/24084 PR: https://git.openjdk.org/jdk/pull/24084 From jsikstro at openjdk.org Wed Apr 9 13:56:53 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Wed, 9 Apr 2025 13:56:53 GMT Subject: RFR: 8350441: ZGC: Overhaul Page Allocation Message-ID: > Note that any reference to pages from here on out refers to the concept of a heap region in ZGC, not pages in the operating system (OS), unless stated otherwise. # Background This PR addresses fragmentation by introducing a Mapped Cache that replaces the Page Cache in ZGC. The largest limitation of the Page Cache is that it is constrained by the abstraction of what a page is. The proposed Mapped Cache removes this limitation by decoupling memory from pages, allowing it to merge and split memory in ways that the Page Cache is not suited for. To facilitate the transition, much of the Page Allocator has been redesigned to work with the Mapped Cache. In addition to fighting fragmentation, the new approach improves NUMA support and simplifies memory unmapping. Combined, these changes lay the foundation for even more improvements in ZGC, like replacing multi-mapped memory with anonymous memory. # Why a Mapped Cache? The main benefit of the Mapped Cache is that adjacent virtual memory ranges in the cache can be merged to create larger ranges, enabling larger allocation requests to succeed more easily.
Most notably, it allows allocations to succeed more often without "harvesting" smaller, discontiguous ranges. Harvesting negatively impacts both fragmentation and latency, as it requires remapping memory into a new contiguous virtual address range. Fragmentation becomes especially problematic in long-running programs and in environments with limited address space, where finding large contiguous regions can be difficult and may lead to premature out-of-memory errors (OOMEs). The Mapped Cache uses a self-balancing binary search tree to store memory ranges. Since the ranges are unused while inside the cache, the tree can use this memory to store metadata about itself, referred to as intrusive storage. This approach eliminates the need for dynamic memory allocation (e.g., malloc), which could otherwise introduce latency overhead. # Fragmentation Currently, ZGC has multiple strategies for dealing with fragmentation. In some edge cases, these strategies are not as efficient as we would like. By addressing fragmentation differently with the Mapped Cache, ZGC is in a better position to avoid edge cases, which are bad even if they occur only once. This is especially impactful for programs running with a large heap. ## Virtual Memory Shuffling In addition to the Mapped Cache, we have made some adjustments in how ZGC deals with virtual memory. When harvesting memory, which needs to be remapped, new contiguous virtual memory must first be claimed. We have now added a feature in which the harvested memory can be reused to improve the likelihood of finding a contiguous range. Additionally, we have redesigned the defragmentation policy so that Large pages are always defragmented upon being freed. When freed, they are broken down and remapped into lower address space, in the hope of "filling holes" and creating more contiguous ranges.
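The merging behaviour described under "Why a Mapped Cache?" can be sketched in a few lines of C++. This is a hypothetical illustration, not ZGC's code: `RangeCache` uses `std::map` (which allocates its own nodes) in place of the intrusive red-black tree the PR describes, and it uses simple first-fit removal. It shows how inserting a range that bridges two cached neighbours yields one larger range that can satisfy a request the fragmented cache could not, without harvesting.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical cache of free, mapped virtual memory ranges (start -> size)
// that coalesces adjacent ranges on insert.
class RangeCache {
 public:
  void insert(std::size_t start, std::size_t size) {
    assert(size > 0);
    auto next = ranges_.lower_bound(start);
    // Merge with the range that ends exactly where this one begins.
    if (next != ranges_.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second == start) {
        start = prev->first;
        size += prev->second;
        ranges_.erase(prev);  // `next` stays valid after erasing `prev`
      }
    }
    // Merge with the range that begins exactly where this one ends.
    if (next != ranges_.end() && start + size == next->first) {
      size += next->second;
      ranges_.erase(next);
    }
    ranges_[start] = size;
  }

  // First-fit removal of `size` contiguous bytes; any remainder of a larger
  // range stays in the cache. An empty result means a real allocator would
  // have to fall back to harvesting discontiguous ranges.
  std::optional<std::size_t> remove_contiguous(std::size_t size) {
    for (auto it = ranges_.begin(); it != ranges_.end(); ++it) {
      if (it->second >= size) {
        std::size_t start = it->first;
        std::size_t rest = it->second - size;
        ranges_.erase(it);
        if (rest > 0) ranges_[start + size] = rest;
        return start;
      }
    }
    return std::nullopt;
  }

  std::size_t range_count() const { return ranges_.size(); }

 private:
  std::map<std::size_t, std::size_t> ranges_;
};
```

For example, with 2 MB granules cached at offsets 0 and 4 MB, a 4 MB request fails; once the granule at 2 MB is returned, the three coalesce into a single 6 MB range and the request succeeds from offset 0.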
# NUMA and Partitions In the current policy, ZGC interleaves memory across all NUMA nodes with a granularity of ZGranuleSize (2 MB), which is the same size as a Small page. As a result, Small pages will end up on a single, preferably local, NUMA node, whilst larger allocations will (likely) end up on multiple NUMA nodes. In the new design, the policy is to prefer allocating *all* allocation sizes on the local NUMA node whenever possible. Consequently, ZGC may be able to extract better performance from NUMA systems. To support local NUMA allocations, the Page Allocator, and in turn the Java heap, has been split up into what we refer to as Partitions. A partition keeps track of its own heap size and Mapped Cache, allowing it to only handle memory that is associated with its own share of the heap. The number of partitions is currently the same as the number of NUMA nodes. On non-NUMA systems, only a single partition is used. The introduction of partitions also establishes a foundation for more fine-grained control over the heap, paving the way for future enhancements, both further NUMA improvements and new features, such as Thread-Local GC. # Defragmentation (Unmapping Memory) Up until now, ZGC has unmapped memory asynchronously in a separate thread. The benefit of this is that other threads do not need to take a latency hit when unmapping memory. Asynchronous unmapping is mainly relied upon when harvesting, especially from a mutator thread, where synchronous unmapping could lead to unwanted latency. With the introduction of the Mapped Cache, and by moving defragmentation away from mutator threads to the GC, asynchronous unmapping is no longer necessary to meet our latency goals. Instead, memory is now unmapped synchronously. The number of times memory is defragmented for page allocations has been reduced significantly. In fact, memory for Small pages never needs to be defragmented at all.
For Large pages, memory defragmentation has little effect on the total latency, as they are costly to allocate anyway. For Medium pages, we have plans for future enhancements where memory is defragmented even less, or not at all. For clarity: with the removal of asynchronous unmapping, we have removed the ZUnmapper thread and ZUnmap JFR event. # Multi-Mapped Memory Asynchronous unmapping has so far been possible because ZGC is backed by shared memory (on Linux), which allows memory to be multi-mapped. This is an artifact from non-generational ZGC, which used multi-mapping in its core design (see [this](https://wiki.openjdk.org/display/zgc/Pointer+Metadata+using+Multi-Mapped+memory) resource for more info). A goal we have in ZGC is to move from shared memory to anonymous memory. There are multiple benefits to anonymous memory, one of them being easier configuration for Transparent Huge Pages (OS pages). Anonymous memory doesn't support multi-mapping, so the asynchronous unmapping feature would have blocked such a move. However, with the removal of asynchronous unmapping, we are now better prepared for transitioning to anonymous memory. # Additional Notes This RFE comes with our own implementation of a red-black tree for the Mapped Cache. Another red-black tree was recently introduced by C. Norrbin in [JDK-8345314](https://bugs.openjdk.org/browse/JDK-8345314) (and enhanced in [JDK-8349211](https://bugs.openjdk.org/browse/JDK-8349211)). Our goal is to initially integrate with our implementation, but to remove our implementation in favor of Norrbin's tree in a future RFE. The reason we have our own tree implementation is that Norrbin's tree was not finished during the time we were developing and testing this RFE. Some new additions have been made to keep the current functionality in the Serviceability Agent (SA).
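The local-first policy from the "NUMA and Partitions" section can be modelled as a small C++ sketch. `Partition` and `PartitionedHeap` are hypothetical names, not ZGC types. The sketch assumes one partition per NUMA node and that each allocation is satisfied entirely by a single partition, with other nodes tried only when the local one lacks capacity; ZGC's actual fallback and capacity accounting may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical partition: tracks only its own share of the heap.
struct Partition {
  std::size_t capacity;
  std::size_t used = 0;

  bool try_claim(std::size_t bytes) {
    if (capacity - used < bytes) return false;  // invariant: used <= capacity
    used += bytes;
    return true;
  }
};

class PartitionedHeap {
 public:
  PartitionedHeap(std::size_t nodes, std::size_t per_node)
      : partitions_(nodes, Partition{per_node}) {}

  // Prefer the caller's local node; fall back to the other nodes in order.
  // Returns the node that satisfied the request, or nothing if none could.
  std::optional<std::size_t> allocate(std::size_t local_node, std::size_t bytes) {
    assert(local_node < partitions_.size());
    if (partitions_[local_node].try_claim(bytes)) return local_node;
    for (std::size_t i = 0; i < partitions_.size(); i++) {
      if (i != local_node && partitions_[i].try_claim(bytes)) return i;
    }
    return std::nullopt;
  }

 private:
  std::vector<Partition> partitions_;
};
```

On a non-NUMA system the heap would simply be constructed with `nodes == 1`, matching the single-partition case described above.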
# Testing * Oracle's tiers 1-8 * We have added a small set of new tests, both gtests and jtreg tests, to test new functionality # Performance * Improvements in tail latency in SPECjbb2015. * Improvements when using small OS pages in combination with NUMA. * Small increase in the time it takes to run a GC. This is because some work has been moved from mutator threads to only be done in GC threads. This should not affect the total run-time of a program as the total work remains the same, but mutator latency is improved. * Other suitable benchmarks show no significant improvements or regressions. ------------- Commit messages: - Whitespace fix in zunittest.hpp - Copyright years - 8350441: ZGC: Overhaul Page Allocation Changes: https://git.openjdk.org/jdk/pull/24547/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24547&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350441 Stats: 12052 lines in 118 files changed: 7936 ins; 3218 del; 898 mod Patch: https://git.openjdk.org/jdk/pull/24547.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24547/head:pull/24547 PR: https://git.openjdk.org/jdk/pull/24547 From rvansa at openjdk.org Wed Apr 9 14:35:36 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Wed, 9 Apr 2025 14:35:36 GMT Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:05:18 GMT, Johan Sjölen wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix compilation error in assertion > > src/hotspot/share/oops/instanceKlass.cpp line 1944: > >> 1942: if (!fs.access_flags().is_static()) { >> 1943: fd = fs.field_descriptor(); >> 1944: Tuple f(fs.offset(), fs.index(), fs.to_FieldInfo()); > > `FieldInfo` contains the `offset`, so that's not necessary. Besides, the `index` is now only used for an `assert` in the `reinitialize` code.
Why not get rid of the index argument and the assert? > > In other words: Get rid of the `Tuple` changes and have `fields_sorted` be a `GrowableArray` of `FieldInfo` only. The suggestions make sense; I'll apply them. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24290#discussion_r2035511849 From tschatzl at openjdk.org Wed Apr 9 14:38:46 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 9 Apr 2025 14:38:46 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 19:59:09 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 39 commits: >> >> - * missing file from merge >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq >> - * make young gen length revising independent of refinement thread >> * use a service task >> * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update >> - * fix IR code generation tests that change due to barrier cost changes >> - * factor out card table and refinement table merging into a single >> method >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 >> - * obsolete G1UpdateBufferSize >> >> G1UpdateBufferSize has previously been used to size the refinement >> buffers and impose a minimum limit on the number of cards per thread >> that need to be pending before refinement starts.
>> >> The former function is now obsolete with the removal of the dirty >> card queues; the latter functionality has been taken over by the new >> diagnostic option `G1PerThreadPendingCardThreshold`. >> >> I think making this a diagnostic option is better than a product option >> because it is something that is only necessary for some test cases to >> produce some otherwise unwanted behavior (continuous refinement). >> >> CSR is pending. >> - ... and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f > > src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 83: > >> 81: break; >> 82: } >> 83: case G1RemSet::HasRefToOld : break; // Nothing special to do. > > Why doesn't it call `inc_cards_clean_again` in this case? The card is cleared also. (In fact, I don't get why this needs to be a separate case from `NoInteresting`.) "NoInteresting" means that the card contains no interesting reference at all. "HasRefToOld" means that there has been an interesting reference in the card. The distinction between these groups of cards seems interesting to me. E.g. out of X non-clean cards, there were A with a reference to the collection set, B that were already marked as containing a reference to the collection set, C not having any interesting reference any more (transitioned from clean -> dirty -> clean, and cleared by the mutator), D being non-parsable, and E having references to old (and no other references). I could add a separate counter for these types of cards too; they can be inferred from the total number of scanned cards minus the others, though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2035512686 From shade at openjdk.org Wed Apr 9 15:13:08 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 15:13:08 GMT Subject: RFR: 8351152: x86: Remove code blocks that handle UseSSE < 2 Message-ID: 32-bit x86 was the platform that supported `UseSSE < 2`. 64-bit x86 baselines on `UseSSE >= 2`.
After the 32-bit x86 code is gone, we can remove all code blocks that are there to support `UseSSE < 2`. Additional testing: - [x] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Also 24-bit removals - Touchups - Fix Changes: https://git.openjdk.org/jdk/pull/24484/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24484&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351152 Stats: 643 lines in 16 files changed: 34 ins; 363 del; 246 mod Patch: https://git.openjdk.org/jdk/pull/24484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24484/head:pull/24484 PR: https://git.openjdk.org/jdk/pull/24484 From ihse at openjdk.org Wed Apr 9 15:23:17 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 9 Apr 2025 15:23:17 GMT Subject: RFR: 8354213: Restore pointless unicode characters to ASCII Message-ID: I believe the source code of the JDK should be in US-ASCII if possible, and only employ extended characters if that is strictly necessary for the code to work. In my attempt to figure out which non-ASCII files use an encoding other than UTF-8 (see [JDK-8301971](https://bugs.openjdk.org/browse/JDK-8301971)), I discovered a handful of files that use Unicode characters for no good reason, when normal ASCII characters could have been used (and have been used everywhere else in the code base in similar contexts).
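Finding such characters is easy to automate. Below is a minimal C++ sketch of a scanner that reports every non-ASCII code point together with its line number. `find_non_ascii` is a hypothetical helper, not the tooling used for this PR; it assumes well-formed UTF-8 input and does not validate continuation bytes. Reporting the decoded code point makes invisible characters such as a zero-width space (U+200B) and look-alikes such as Cyrillic Es (U+0421) easy to spot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct NonAsciiHit {
  std::size_t line;         // 1-based line number
  std::uint32_t code_point; // decoded Unicode code point
};

// Assumes `text` is valid UTF-8; malformed sequences are not diagnosed.
std::vector<NonAsciiHit> find_non_ascii(const std::string& text) {
  std::vector<NonAsciiHit> hits;
  std::size_t line = 1;
  for (std::size_t i = 0; i < text.size();) {
    unsigned char c = static_cast<unsigned char>(text[i]);
    if (c == '\n') { line++; i++; continue; }
    if (c < 0x80) { i++; continue; }  // plain ASCII
    // The leading byte determines the sequence length (2, 3 or 4 bytes).
    std::size_t len = (c >= 0xF0) ? 4 : (c >= 0xE0) ? 3 : 2;
    // Keep the payload bits of the leading byte (0x1F, 0x0F or 0x07).
    std::uint32_t cp = c & (0x7F >> len);
    for (std::size_t k = 1; k < len && i + k < text.size(); k++) {
      // Each continuation byte contributes its low 6 bits.
      cp = (cp << 6) | (static_cast<unsigned char>(text[i + k]) & 0x3F);
    }
    hits.push_back({line, cp});
    i += len;
  }
  return hits;
}
```

Fed the contents of a source file, the result lists each offending character by line, which is enough to decide whether it can be replaced by its ASCII equivalent.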
------------- Commit messages: - 8354213: Restore pointless unicode characters to ASCII Changes: https://git.openjdk.org/jdk/pull/24552/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24552&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354213 Stats: 25 lines in 15 files changed: 0 ins; 0 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/24552.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24552/head:pull/24552 PR: https://git.openjdk.org/jdk/pull/24552 From iklam at openjdk.org Wed Apr 9 15:24:53 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 9 Apr 2025 15:24:53 GMT Subject: RFR: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 [v6] In-Reply-To: References: Message-ID: > Since [JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426) (Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file), the AOT cache no longer contains classes intended for custom class loaders (these are called "unregistered classes" in CDS terminology). > > The fix is simple -- we already remember the set of unregistered classes in the AOT configuration file. We just need to add them into the final AOT cache (see changes in finalImageRecipes.cpp). Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 - Fixed (1) size/crc was not set so the SimpleCusty class was not loaded from cache; (2) cp->resolved_reference_length() was not set correctly - Avoid duplicated unregistered classes that have the same name - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 - Merge branch 'master' into 8351319-support-for-custom-loaders-missing-since-jdk-8348426 - 8351319: AOT cache support for custom class loaders broken since JDK-8348426 ------------- Changes: https://git.openjdk.org/jdk/pull/23926/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23926&range=05 Stats: 184 lines in 13 files changed: 151 ins; 0 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/23926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23926/head:pull/23926 PR: https://git.openjdk.org/jdk/pull/23926 From ihse at openjdk.org Wed Apr 9 15:39:16 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 9 Apr 2025 15:39:16 GMT Subject: RFR: 8354213: Restore pointless unicode characters to ASCII [v2] In-Reply-To: References: Message-ID: > I believe the source code of the JDK should be in US-ASCII if possible, and only employ extended characters if that is strictly necessary for the code to work. > > In my attempt to figure out which non-ASCII files use an encoding other than UTF-8 (see [JDK-8301971](https://bugs.openjdk.org/browse/JDK-8301971)), I discovered a handful of files that use Unicode characters for no good reason, when normal ASCII characters could have been used (and have been used everywhere else in the code base in similar contexts).
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Also fix pointless unicode characters for tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24552/files - new: https://git.openjdk.org/jdk/pull/24552/files/4197daa9..284b278d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24552&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24552&range=00-01 Stats: 20 lines in 7 files changed: 0 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/24552.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24552/head:pull/24552 PR: https://git.openjdk.org/jdk/pull/24552 From iklam at openjdk.org Wed Apr 9 15:46:16 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 9 Apr 2025 15:46:16 GMT Subject: RFR: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 [v7] In-Reply-To: References: Message-ID: > Since [JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426) (Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file), the AOT cache no longer contains classes intended for custom class loaders (these are called "unregistered classes" in CDS terminology). > > The fix is simple -- we already remember the set of unregistered classes in the AOT configuration file. We just need to add them into the final AOT cache (see changes in finalImageRecipes.cpp). 
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Fixed merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23926/files - new: https://git.openjdk.org/jdk/pull/23926/files/17dcf9c0..a2723bfb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23926&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23926&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23926/head:pull/23926 PR: https://git.openjdk.org/jdk/pull/23926 From ihse at openjdk.org Wed Apr 9 15:53:51 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 9 Apr 2025 15:53:51 GMT Subject: RFR: 8354213: Restore pointless unicode characters to ASCII [v3] In-Reply-To: References: Message-ID: > I believe the source code of the JDK should be in US-ASCII if possible, and only employ extended characters if that is strictly necessary for the code to work. > > In my attempt to figure out which non-ascii files are another encoding than utf-8 (see [JDK-8301971](https://bugs.openjdk.org/browse/JDK-8301971)), I discovered a handful of files that use unicode characters for no good reason, when normal ASCII characters could have been used (and have been used everywhere else in the code base in similar contexts). Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Oops. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24552/files - new: https://git.openjdk.org/jdk/pull/24552/files/284b278d..56808d8c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24552&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24552&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24552.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24552/head:pull/24552 PR: https://git.openjdk.org/jdk/pull/24552 From shade at openjdk.org Wed Apr 9 16:06:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 16:06:45 GMT Subject: RFR: 8351152: x86: Remove code blocks that handle UseSSE < 2 [v2] In-Reply-To: References: Message-ID: > 32-bit x86 was the platform that supported `UseSSE < 2`. 64-bit x86 baselines on `UseSSE >= 2`: https://github.com/openjdk/jdk/blob/567c6885a377e5641deef9cd3498f79c5346cd6a/src/hotspot/cpu/x86/vm_version_x86.cpp#L895-L902 > > After 32-bit x86 code is gone, we can remove all code blocks that are there to support `UseSSE < 2`. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Also purge vestigial calls to VMVersion::supports_sse{2} ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24484/files - new: https://git.openjdk.org/jdk/pull/24484/files/04f50944..7f505137 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24484&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24484&range=00-01 Stats: 179 lines in 1 file changed: 0 ins; 179 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24484/head:pull/24484 PR: https://git.openjdk.org/jdk/pull/24484 From kvn at openjdk.org Wed Apr 9 16:06:45 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 9 Apr 2025 16:06:45 GMT Subject: RFR: 8351152: x86: Remove code blocks that handle UseSSE < 2 [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 16:03:26 GMT, Aleksey Shipilev wrote: >> 32-bit x86 was the platform that supported `UseSSE < 2`. 64-bit x86 baselines on `UseSSE >= 2`: https://github.com/openjdk/jdk/blob/567c6885a377e5641deef9cd3498f79c5346cd6a/src/hotspot/cpu/x86/vm_version_x86.cpp#L895-L902 >> >> After 32-bit x86 code is gone, we can remove all code blocks that are there to support `UseSSE < 2`. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Also purge vestigial calls to VMVersion::supports_sse{2} Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24484#pullrequestreview-2754003015 From duke at openjdk.org Wed Apr 9 16:23:45 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Wed, 9 Apr 2025 16:23:45 GMT Subject: RFR: 8341491: Reserve and commit memory operations should be protected by NMT lock In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 15:29:55 GMT, Stefan Karlsson wrote: >> OK, should I update this PR to do the following things: >> - Add comments explaining the asymmetrical locking and warning against patterns that lead to races >> - Swap the order of `NmtVirtualMemoryLocker` and release/uncommit >> - Fail fatally if release/uncommit does not complete. >> >> Or does it make more sense to do that in a different issue/PR? >> >> Also, do we want to keep the new tests and the refactorings (see below)? >> >> if (MemTracker::enabled()) { >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_some_operation(addr, bytes); >> if (result != nullptr) { >> MemTracker::record_some_operation(addr, bytes); >> } >> } else { >> result = pd_some_operation(addr, bytes); >> } >> >> To: >> >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_some_operation(addr, bytes); >> MemTracker::record_some_operation(addr, bytes); > >> OK, should I update this PR to do the following things: >> >> * Add comments explaining the asymmetrical locking and warning against patterns that lead to races > > Sounds like a good idea. > >> >> * Swap the order of `NmtVirtualMemoryLocker` and release/uncommit > > I wonder if this should be done as a new RFE after the change below. It might need a bit of investigation to make sure that the reasoning around this is correct. > >> >> * Fail fatally if release/uncommit does not complete. > > I think this would be a good, separate RFE to be done before we try to swap the order. > >> >> >> Or does it make more sense to do that in a different issue/PR? >> >> Also, do we want to keep the new tests and the refactorings (see below)?
>> >> ``` >> if (MemTracker::enabled()) { >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_some_operation(addr, bytes); >> if (result != nullptr) { >> MemTracker::record_some_operation(addr, bytes); >> } >> } else { >> result = pd_some_operation(addr, bytes); >> } >> ``` >> >> To: >> >> ``` >> MemTracker::NmtVirtualMemoryLocker nvml; >> result = pd_some_operation(addr, bytes); >> MemTracker::record_some_operation(addr, bytes); >> ``` > > My thinking is that after you have done (2) above, you will not need to expose the NMT lock at this level. The code would be: > > MemTracker::record_some_operation(addr, bytes); // Lock confined inside this > > pd_unmap_memory(addr, bytes); > > > So, I would hold off on this cleanup until we know more about (2). Thank you @stefank for the feedback. I've applied your suggestions. @tstuefe, when you have time, can you please have another look at this? Based on the discussion above, I've reverted the changes to the locking scope in favor of new comments explaining the asymmetrical locking and warning against patterns that lead to races. The new tests that are still relevant are kept. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2790269355 From ccheung at openjdk.org Wed Apr 9 16:51:25 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 9 Apr 2025 16:51:25 GMT Subject: RFR: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 [v7] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 15:46:16 GMT, Ioi Lam wrote: >> Since [JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426) (Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file), the AOT cache no longer contains classes intended for custom class loaders (these are called "unregistered classes" in CDS terminology). >> >> The fix is simple -- we already remember the set of unregistered classes in the AOT configuration file.
We just need to add them into the final AOT cache (see changes in finalImageRecipes.cpp). > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Fixed merge Updates look good. ------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23926#pullrequestreview-2754131919 From duke at openjdk.org Wed Apr 9 17:12:47 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 9 Apr 2025 17:12:47 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v14] In-Reply-To: <-W1vBCTLtPyOZNm6XhHQXT9spBbkAd4Z4rTn_LHH1Aw=.5beae719-ac8b-404a-a34c-deecfc97dd7e@github.com> References: <394Wf5RpbwUgE7zBaZBnwa2YAxQFwWDhF1VuaMPHdhE=.98ff29f7-b6a7-49eb-bdd6-8489568b24b7@github.com> <-W1vBCTLtPyOZNm6XhHQXT9spBbkAd4Z4rTn_LHH1Aw=.5beae719-ac8b-404a-a34c-deecfc97dd7e@github.com> Message-ID: On Tue, 8 Apr 2025 21:58:57 GMT, Sandhya Viswanathan wrote: > Overall very clean and nicely done PR. Thanks a lot for considering my inputs. That is in no small part thanks to the reviewers, especially to Volodymyr! @lmesnik, @jatin-bhateja, @sviswa7 would one of you /sponsor me with the integration? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23860#issuecomment-2790417248 From naoto at openjdk.org Wed Apr 9 17:24:43 2025 From: naoto at openjdk.org (Naoto Sato) Date: Wed, 9 Apr 2025 17:24:43 GMT Subject: RFR: 8354213: Restore pointless unicode characters to ASCII [v3] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 15:53:51 GMT, Magnus Ihse Bursie wrote: >> I believe the source code of the JDK should be in US-ASCII if possible, and only employ extended characters if that is strictly necessary for the code to work. 
>> >> In my attempt to figure out which non-ascii files are another encoding than utf-8 (see [JDK-8301971](https://bugs.openjdk.org/browse/JDK-8301971)), I discovered a handful of files that use unicode characters for no good reason, when normal ASCII characters could have been used (and have been used everywhere else in the code base in similar contexts). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Oops. Looks good to me. I was just expecting the usual suspects, such as, apostrophe/hyphen-minus variations in comments, but never expected zero-width space, or Cyrillic "C" in place for ascii "C" in the code! ------------- Marked as reviewed by naoto (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24552#pullrequestreview-2754211738 From jiangli at openjdk.org Wed Apr 9 17:33:32 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 9 Apr 2025 17:33:32 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 [v3] In-Reply-To: <9k911IJe4DJG2PKWmXuaAY5WJYBuwFxyPfzd422V5FU=.72584eaf-bdf7-4549-9d73-808ab6e96466@github.com> References: <9k911IJe4DJG2PKWmXuaAY5WJYBuwFxyPfzd422V5FU=.72584eaf-bdf7-4549-9d73-808ab6e96466@github.com> Message-ID: On Tue, 8 Apr 2025 03:55:03 GMT, SendaoYan wrote: >> Hi all, >> >> This PR will try to fix memory leak after JDK-8352184. which re-do [JDK-8352184](https://bugs.openjdk.org/browse/JDK-8352184) using the original, purely static uses of the various description strings, as @jianglizhou had proposed. >> >> Additional testing: >> >> - [x] jtreg tests(include tier1/2/3 etc.) on linux-x64 >> - [x] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 >> - [x] full `java -version` tests, the test shell script show as below. 
>> >> [JDK-8353189.sh.txt](https://github.com/user-attachments/files/19643535/JDK-8353189.sh.txt) > > SendaoYan has updated the pull request incrementally with one additional commit since the last revision: > > add a comment to explain why we avoid dynamic memory allocation for the vm_info_string Marked as reviewed by jiangli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24299#pullrequestreview-2754230246 From jiangli at openjdk.org Wed Apr 9 17:33:32 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 9 Apr 2025 17:33:32 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 [v2] In-Reply-To: References: <8py7v6gQd9nfucRK28h2enHWk21PevhMa74FYx3bRsI=.663fa24c-4218-426b-bda4-2e8d6ecb02aa@github.com> Message-ID: On Wed, 9 Apr 2025 02:03:39 GMT, SendaoYan wrote: >> Looks ok to me, thanks. Please update the contributor properly since the change is from https://github.com/openjdk/jdk/pull/24171/commits/baff6b166d130c9adeecfb9f2b418d86322d4826. > > Okey, that was my original plan also. > > By the way, do you mind add David Holmes as contributor also, because he do lots of investigation on this issue? Sounds good to me! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24299#discussion_r2035822718 From vlivanov at openjdk.org Wed Apr 9 17:40:39 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 9 Apr 2025 17:40:39 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag [v2] In-Reply-To: <1UeVxSPjBdbSxxkJDo8YdTXAfIweSrH__GF-ytGwpzw=.63f3ca29-119a-4814-90a2-08518c9e9ca2@github.com> References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> <1UeVxSPjBdbSxxkJDo8YdTXAfIweSrH__GF-ytGwpzw=.63f3ca29-119a-4814-90a2-08518c9e9ca2@github.com> Message-ID: On Wed, 9 Apr 2025 09:49:01 GMT, Aleksey Shipilev wrote: >> Noticed this when doing [JDK-8353558](https://bugs.openjdk.org/browse/JDK-8353558). 
We only check for CLWB feature flag for Intel platforms. But AMD APM (Rev. 3.36, March 2024) tells me there is a CLWB flag in CPUID Fn0000_0007_EBX_x0 leaf as well. It is in the same place as the flag for Intel. >> >> Additional testing: >> - [x] Ad-hoc tests on Ryzen 5950X > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - More feature flag commonning > - Merge branch 'master' into JDK-8353572-amd-clwb > - Fix src/hotspot/cpu/x86/vm_version_x86.cpp line 3118: > 3116: // We do not know if these are supported by ZX, > 3117: // so we cannot trust common CPUID bit for it. > 3118: result &= ~CPU_CLWB; I'd prefer to completely drop this adjustment, but if you do want to keep it, I'd add an assert (and/or a warning?) to fire when CLWB bit is present on ZX CPUs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24385#discussion_r2035832615 From coleenp at openjdk.org Wed Apr 9 17:45:32 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 9 Apr 2025 17:45:32 GMT Subject: RFR: 8349007: The jtreg test ResolvedMethodTableHash takes excessive time [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 20:45:50 GMT, Coleen Phillimore wrote: >> This is mostly test change. ResolvedMethodTableHash.java is run /manual but still takes a long time and doesn't really verify that the hash code is any good. This add logging, triggers concurrent work like other ConcurrentHashtables, and checks that a small table is not rehashed. >> Tested with tier1 (including test). > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix indent and hardcode 1001 loops. Thank you for the code reviews Leonid and Matias.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24383#issuecomment-2790490566 From coleenp at openjdk.org Wed Apr 9 17:45:33 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 9 Apr 2025 17:45:33 GMT Subject: Integrated: 8349007: The jtreg test ResolvedMethodTableHash takes excessive time In-Reply-To: References: Message-ID: <4iRSxOo17EjgmznyYwEpVFVXt8ZbmBuu1LfrOFU6pck=.852071ba-3192-4d2f-bbf6-41212b0f2095@github.com> On Wed, 2 Apr 2025 17:28:12 GMT, Coleen Phillimore wrote: > This is mostly test change. ResolvedMethodTableHash.java is run /manual but still takes a long time and doesn't really verify that the hash code is any good. This add logging, triggers concurrent work like other ConcurrentHashtables, and checks that a small table is not rehashed. > Tested with tier1 (including test). This pull request has now been integrated. Changeset: 6352ee1a Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/6352ee1a6e55e428db0eca97ecf8125770dc4a08 Stats: 80 lines in 2 files changed: 29 ins; 2 del; 49 mod 8349007: The jtreg test ResolvedMethodTableHash takes excessive time Reviewed-by: lmesnik, matsaave ------------- PR: https://git.openjdk.org/jdk/pull/24383 From erikj at openjdk.org Wed Apr 9 17:57:43 2025 From: erikj at openjdk.org (Erik Joelsson) Date: Wed, 9 Apr 2025 17:57:43 GMT Subject: RFR: 8354213: Restore pointless unicode characters to ASCII [v3] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 15:53:51 GMT, Magnus Ihse Bursie wrote: >> I believe the source code of the JDK should be in US-ASCII if possible, and only employ extended characters if that is strictly necessary for the code to work. 
>> >> In my attempt to figure out which non-ascii files are another encoding than utf-8 (see [JDK-8301971](https://bugs.openjdk.org/browse/JDK-8301971)), I discovered a handful of files that use unicode characters for no good reason, when normal ASCII characters could have been used (and have been used everywhere else in the code base in similar contexts). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Oops. Marked as reviewed by erikj (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24552#pullrequestreview-2754290885 From iris at openjdk.org Wed Apr 9 18:15:34 2025 From: iris at openjdk.org (Iris Clark) Date: Wed, 9 Apr 2025 18:15:34 GMT Subject: RFR: 8354213: Restore pointless unicode characters to ASCII [v3] In-Reply-To: References: Message-ID: <5_CH1LUp-76OBgk1X2EX8515DRsyt9Z2kbnSoR29RCA=.b9e48d3d-bf08-4ac1-99fe-3e86d57e1903@github.com> On Wed, 9 Apr 2025 15:53:51 GMT, Magnus Ihse Bursie wrote: >> I believe the source code of the JDK should be in US-ASCII if possible, and only employ extended characters if that is strictly necessary for the code to work. >> >> In my attempt to figure out which non-ascii files are another encoding than utf-8 (see [JDK-8301971](https://bugs.openjdk.org/browse/JDK-8301971)), I discovered a handful of files that use unicode characters for no good reason, when normal ASCII characters could have been used (and have been used everywhere else in the code base in similar contexts). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Oops. Thanks for fixing! ------------- Marked as reviewed by iris (Reviewer). 
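The characters discussed in this thread, for example a zero-width space (U+200B) or a Cyrillic capital Es (U+0421) standing in for an ASCII "C", can be located mechanically, because UTF-8 encodes every non-ASCII code point using only bytes >= 0x80. The following is a minimal C++17 sketch under that assumption; the helper names are hypothetical, and this is not the tooling that was actually used for JDK-8354213.

```cpp
#include <cstddef>
#include <string_view>

// True for a byte that starts a non-ASCII UTF-8 sequence: any byte >= 0x80
// that is not a continuation byte (bit pattern 10xxxxxx).
constexpr bool is_non_ascii_lead(unsigned char b) {
  return b >= 0x80 && (b & 0xC0) != 0x80;
}

// Number of non-ASCII characters in the text, assuming it is valid UTF-8.
constexpr std::size_t count_non_ascii(std::string_view text) {
  std::size_t n = 0;
  for (char c : text) {
    if (is_non_ascii_lead(static_cast<unsigned char>(c))) {
      ++n;
    }
  }
  return n;
}

// Byte offset of the first non-ASCII byte, or npos if the text is pure ASCII.
constexpr std::size_t first_non_ascii(std::string_view text) {
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (static_cast<unsigned char>(text[i]) >= 0x80) {
      return i;
    }
  }
  return std::string_view::npos;
}
```

Counting only lead bytes reports one hit per character rather than one per byte; for malformed input the count may be off, which is acceptable for a quick scan.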
PR Review: https://git.openjdk.org/jdk/pull/24552#pullrequestreview-2754330944 From shade at openjdk.org Wed Apr 9 18:34:33 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 18:34:33 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag [v2] In-Reply-To: References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> <1UeVxSPjBdbSxxkJDo8YdTXAfIweSrH__GF-ytGwpzw=.63f3ca29-119a-4814-90a2-08518c9e9ca2@github.com> Message-ID: On Wed, 9 Apr 2025 17:37:33 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - More feature flag commonning >> - Merge branch 'master' into JDK-8353572-amd-clwb >> - Fix > > src/hotspot/cpu/x86/vm_version_x86.cpp line 3118: > >> 3116: // We do not know if these are supported by ZX, >> 3117: // so we cannot trust common CPUID bit for it. >> 3118: result &= ~CPU_CLWB; > > I'd prefer to completely drop this adjustment, but if you do want to keep it, I'd add an assert (and/or a warning?) to fire when CLWB bit is present on ZX CPUs. Added assert. Whoever maintains ZX would need to fix that code if CLWB is actually supported. 
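The vendor handling discussed above can be sketched in a few lines. This assumes the raw EBX value of CPUID leaf 7, subleaf 0 has already been read (with GCC or Clang, for example, via `__get_cpuid_count(7, 0, ...)` from `<cpuid.h>`). `Vendor`, `kFeatureClwb`, and `clwb_feature` are hypothetical names, not HotSpot's `VM_Version` code. The sketch decodes bit 24 the same way for Intel and AMD, and masks it off for ZX, analogous to the quoted `result &= ~CPU_CLWB;` adjustment.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not HotSpot's VM_Version code.
enum class Vendor { Intel, AMD, Zhaoxin };

// CPUID.(EAX=07H, ECX=0):EBX[24] is the CLWB bit on both Intel and AMD.
constexpr std::uint32_t kLeaf7EbxClwb = std::uint32_t{1} << 24;

// Hypothetical bit in a caller-maintained feature mask.
constexpr std::uint64_t kFeatureClwb = std::uint64_t{1} << 0;

constexpr std::uint64_t clwb_feature(Vendor vendor, std::uint32_t leaf7_ebx) {
  std::uint64_t features = 0;
  // Common decoding: the bit sits in the same place for Intel and AMD.
  if ((leaf7_ebx & kLeaf7EbxClwb) != 0) {
    features |= kFeatureClwb;
  }
  // Support on ZX is unknown, so the common bit is not trusted there.
  if (vendor == Vendor::Zhaoxin) {
    features &= ~kFeatureClwb;
  }
  return features;
}
```

Doing the shared decoding first and applying vendor exceptions afterwards keeps each exception to a single masking line.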
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24385#discussion_r2035908038 From shade at openjdk.org Wed Apr 9 18:34:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Apr 2025 18:34:30 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag [v3] In-Reply-To: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> Message-ID: > Noticed this when doing [JDK-8353558](https://bugs.openjdk.org/browse/JDK-8353558). We only check for CLWB feature flag for Intel platforms. But AMD APM (Rev. 3.36?March 2024) tells me there is a CLWB flag in CPUID Fn0000_0007_EBX_x0 leaf as well. It is in the same place as the flag for Intel. > > Additional testing: > - [x] Ad-hoc tests on Ryzen 5950X Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Add assert - Merge branch 'master' into JDK-8353572-amd-clwb - More feature flag commonning - Merge branch 'master' into JDK-8353572-amd-clwb - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24385/files - new: https://git.openjdk.org/jdk/pull/24385/files/c9ec23c3..2879eaf7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24385&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24385&range=01-02 Stats: 2035 lines in 59 files changed: 638 ins; 848 del; 549 mod Patch: https://git.openjdk.org/jdk/pull/24385.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24385/head:pull/24385 PR: https://git.openjdk.org/jdk/pull/24385 From sviswanathan at openjdk.org Wed Apr 9 18:42:45 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 9 Apr 2025 18:42:45 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v14] In-Reply-To: References: <394Wf5RpbwUgE7zBaZBnwa2YAxQFwWDhF1VuaMPHdhE=.98ff29f7-b6a7-49eb-bdd6-8489568b24b7@github.com> <-W1vBCTLtPyOZNm6XhHQXT9spBbkAd4Z4rTn_LHH1Aw=.5beae719-ac8b-404a-a34c-deecfc97dd7e@github.com> Message-ID: On Wed, 9 Apr 2025 17:09:09 GMT, Ferenc Rakoczi wrote: >> Overall very clean and nicely done PR. Thanks a lot for considering my inputs. > >> Overall very clean and nicely done PR. Thanks a lot for considering my inputs. > > That is in no small part thanks to the reviewers, especially to Volodymyr! > @lmesnik, @jatin-bhateja, @sviswa7 would one of you /sponsor me with the integration? @ferakocz Once you do /integrate, I will be honored to sponsor your PR. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23860#issuecomment-2790618572 From wkemper at openjdk.org Wed Apr 9 18:47:45 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 9 Apr 2025 18:47:45 GMT Subject: RFR: 8351157: Clean up x86 GC barriers after 32-bit x86 removal [v3] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:31:44 GMT, Aleksey Shipilev wrote: >> Assembler GC barriers have quite a bit of coding to support 32-bit x86. As 32-bit x86 is removed, we can clean up those parts. >> >> We can eliminate `!LP64` blocks quite easily. We can also prune passing around `thread` argument, and just trust that `r15_thread` is always available. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8351157-x86-gc-barriers > - Merge branch 'master' into JDK-8351157-x86-gc-barriers > - Also do tlab_allocate > - Rely on R15 to be a thread register > - Work This is a nice simplification. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24253#pullrequestreview-2754423734 From vlivanov at openjdk.org Wed Apr 9 18:56:51 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 9 Apr 2025 18:56:51 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag [v3] In-Reply-To: References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> Message-ID: <-C2sK1OKqDweiOtMX9arvuAlK2U1o_-9blepRaRQMA8=.28612389-eaf8-4c04-9bed-a5f78d0f0429@github.com> On Wed, 9 Apr 2025 18:34:30 GMT, Aleksey Shipilev wrote: >> Noticed this when doing [JDK-8353558](https://bugs.openjdk.org/browse/JDK-8353558). We only check for CLWB feature flag for Intel platforms. But AMD APM (Rev. 
3.36?March 2024) tells me there is a CLWB flag in CPUID Fn0000_0007_EBX_x0 leaf as well. It is in the same place as the flag for Intel. >> >> Additional testing: >> - [x] Ad-hoc tests on Ryzen 5950X > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Add assert > - Merge branch 'master' into JDK-8353572-amd-clwb > - More feature flag commonning > - Merge branch 'master' into JDK-8353572-amd-clwb > - Fix Thanks. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24385#pullrequestreview-2754450832 From coleenp at openjdk.org Wed Apr 9 19:18:47 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 9 Apr 2025 19:18:47 GMT Subject: RFR: 8354180: Clean up uses of ObjectMonitor caches Message-ID: This is mostly changes from @xmas92 as explained to me plus small cleanup to a read_caches function. Tested with tier1-4. ------------- Commit messages: - Add back weird friend, going to deal with that later. - Revert complete_monitor_locking_C. - Conditionally update OMCache in CacheSetter, also Axel's improvement. - Axel comments and improvements and weird friend. - Axel's comments and new assert. - Print raw owner for ObjectMonitor - Some cache setter cleanups. will revert sharedRuntime that was just an experiment. 
Changes: https://git.openjdk.org/jdk/pull/24545/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24545&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354180 Stats: 59 lines in 6 files changed: 26 ins; 18 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24545.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24545/head:pull/24545 PR: https://git.openjdk.org/jdk/pull/24545 From coleenp at openjdk.org Wed Apr 9 19:24:25 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 9 Apr 2025 19:24:25 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:14:04 GMT, Serguei Spitsyn wrote: > As noted in [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088), JVMTI `GetThreadGroupChildren` does an upcall to java. This results in a`ClassPrepare` event the first time it does this, and these events can cause problems (deadlocks) for the debugger or debug agent. The [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088) was fixed to get rid of class loading during Java upcall from `GetThreadGroupChildren`. However, some other events can be generated as well. It is more safe to disable all JVMTI events during debugger-related upcalls originated by JVMTI. > The `ClassPrepare` events are important for the debug agent. So, an assert was added into `ClassPrepare` event generation to make sure there are no attempts to post this event during upcalls. > Some specific implementation details can be added to the first PR comment. 
> > Testing: > - Verified with the test `jdk/com/sun/jdi/EarlyThreadGroupChildrenTest.java` that was added with the fix of [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088): > - the assert described above is fired if the fix of JDK-8352088 is removed > - the test is passed without if the fix of JDK-8352088 is removed and the assert is removed > - Ran mach5 tiers 1-6 So this is another case where you have to ignore JVMTI event like in VTMS transitions? It looks like a good way to fix this in general. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24539#pullrequestreview-2754548627 From duke at openjdk.org Wed Apr 9 19:33:37 2025 From: duke at openjdk.org (duke) Date: Wed, 9 Apr 2025 19:33:37 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v14] In-Reply-To: <394Wf5RpbwUgE7zBaZBnwa2YAxQFwWDhF1VuaMPHdhE=.98ff29f7-b6a7-49eb-bdd6-8489568b24b7@github.com> References: <394Wf5RpbwUgE7zBaZBnwa2YAxQFwWDhF1VuaMPHdhE=.98ff29f7-b6a7-49eb-bdd6-8489568b24b7@github.com> Message-ID: On Tue, 8 Apr 2025 21:27:08 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Reacting to mor comments from Sandhya. @ferakocz Your change (at version 0b0d0969d6ac629bf2ca997d2286c4d28f91c1b9) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23860#issuecomment-2790791121 From duke at openjdk.org Wed Apr 9 19:33:35 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 9 Apr 2025 19:33:35 GMT Subject: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v14] In-Reply-To: References: <394Wf5RpbwUgE7zBaZBnwa2YAxQFwWDhF1VuaMPHdhE=.98ff29f7-b6a7-49eb-bdd6-8489568b24b7@github.com> <-W1vBCTLtPyOZNm6XhHQXT9spBbkAd4Z4rTn_LHH1Aw=.5beae719-ac8b-404a-a34c-deecfc97dd7e@github.com> Message-ID: On Wed, 9 Apr 2025 17:09:09 GMT, Ferenc Rakoczi wrote: >> Overall very clean and nicely done PR. Thanks a lot for considering my inputs. > >> Overall very clean and nicely done PR. Thanks a lot for considering my inputs. > > That is in no small part thanks to the reviewers, especially to Volodymyr! > @lmesnik, @jatin-bhateja, @sviswa7 would one of you /sponsor me with the integration? > @ferakocz Once you do /integrate, I will be honored to sponsor your PR. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23860#issuecomment-2790788483 From lmesnik at openjdk.org Wed Apr 9 19:41:28 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 9 Apr 2025 19:41:28 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:14:04 GMT, Serguei Spitsyn wrote: > As noted in [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088), JVMTI `GetThreadGroupChildren` does an upcall to java. This results in a`ClassPrepare` event the first time it does this, and these events can cause problems (deadlocks) for the debugger or debug agent. The [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088) was fixed to get rid of class loading during Java upcall from `GetThreadGroupChildren`. However, some other events can be generated as well. It is more safe to disable all JVMTI events during debugger-related upcalls originated by JVMTI. 
> The `ClassPrepare` events are important for the debug agent. So, an assert was added into `ClassPrepare` event generation to make sure there are no attempts to post this event during upcalls. > Some specific implementation details can be added to the first PR comment. > > Testing: > - Verified with the test `jdk/com/sun/jdi/EarlyThreadGroupChildrenTest.java` that was added with the fix of [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088): > - the assert described above is fired if the fix of JDK-8352088 is removed > - the test is passed without if the fix of JDK-8352088 is removed and the assert is removed > - Ran mach5 tiers 1-6 Looks good. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24539#pullrequestreview-2754601842 From ihse at openjdk.org Wed Apr 9 20:16:35 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 9 Apr 2025 20:16:35 GMT Subject: RFR: 8354213: Restore pointless unicode characters to ASCII [v3] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 17:22:04 GMT, Naoto Sato wrote: > never expected zero-width space, or Cyrillic "C" in place for ascii "C" in the code! Yes, these got some extra bonus points. :-) I'm also curious what the almost-comma is supposed to be, but never bothered to look it up. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24552#issuecomment-2790892094 From ihse at openjdk.org Wed Apr 9 20:16:36 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 9 Apr 2025 20:16:36 GMT Subject: Integrated: 8354213: Restore pointless unicode characters to ASCII In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 15:17:50 GMT, Magnus Ihse Bursie wrote: > I believe the source code of the JDK should be in US-ASCII if possible, and only employ extended characters if that is strictly necessary for the code to work. 
> > In my attempt to figure out which non-ascii files are another encoding than utf-8 (see [JDK-8301971](https://bugs.openjdk.org/browse/JDK-8301971)), I discovered a handful of files that use unicode characters for no good reason, when normal ASCII characters could have been used (and have been used everywhere else in the code base in similar contexts). This pull request has now been integrated. Changeset: 4a242e3a Author: Magnus Ihse Bursie URL: https://git.openjdk.org/jdk/commit/4a242e3a65f13c41c699d42b100ba2b252d7faaa Stats: 45 lines in 22 files changed: 0 ins; 0 del; 45 mod 8354213: Restore pointless unicode characters to ASCII Reviewed-by: naoto, erikj, iris ------------- PR: https://git.openjdk.org/jdk/pull/24552 From gziemski at openjdk.org Wed Apr 9 20:35:32 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 9 Apr 2025 20:35:32 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v9] In-Reply-To: References: Message-ID: <8KEGhJ5sXoeeT2ezqvyG-uYWlXUzBGSHD_RLwjAH8LI=.89670a1f-2e4a-4c88-8329-3261d462cae0@github.com> On Mon, 7 Apr 2025 13:30:44 GMT, Gerard Ziemski wrote: >> This is a follow-up to #21843. Here we are focusing on removing the mem tag paremeter with default value of mtNone, to force everyone to provide mem tag, if known. >> >> I tried to fill in tag, when I was pretty certain that I had the right type. >> >> At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > small last feedback from Stefan Thank you Stefan for providing the values of mem_tags and your feedback. Do you want to be a co-author on this PR? 
@jdksjolen @afshin-zafari ping ------------- PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2790934209 PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2790935906 From cjplummer at openjdk.org Wed Apr 9 20:45:32 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 9 Apr 2025 20:45:32 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:14:04 GMT, Serguei Spitsyn wrote: > As noted in [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088), JVMTI `GetThreadGroupChildren` does an upcall to java. This results in a`ClassPrepare` event the first time it does this, and these events can cause problems (deadlocks) for the debugger or debug agent. The [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088) was fixed to get rid of class loading during Java upcall from `GetThreadGroupChildren`. However, some other events can be generated as well. It is more safe to disable all JVMTI events during debugger-related upcalls originated by JVMTI. > The `ClassPrepare` events are important for the debug agent. So, an assert was added into `ClassPrepare` event generation to make sure there are no attempts to post this event during upcalls. > Some specific implementation details can be added to the first PR comment. > > Testing: > - Verified with the test `jdk/com/sun/jdi/EarlyThreadGroupChildrenTest.java` that was added with the fix of [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088): > - the assert described above is fired if the fix of JDK-8352088 is removed > - the test is passed without if the fix of JDK-8352088 is removed and the assert is removed > - Ran mach5 tiers 1-6 src/hotspot/share/prims/jvmtiEnvBase.cpp line 867: > 865: // This call collects the strong and weak groups > 866: JavaThread* THREAD = current_thread; > 867: JvmtiJavaUpcallMark jjum(current_thread); Add comment like you did above for the JvmtiEnv::InterruptThread case. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24539#discussion_r2036109049 From rrich at openjdk.org Wed Apr 9 20:52:34 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 9 Apr 2025 20:52:34 GMT Subject: RFR: 8351666: [PPC64] Make non-volatile VectorRegisters available for C2 register allocation [v4] In-Reply-To: References: Message-ID: <66CxpgDkdMjhIrTcz59yakF1YhB4CE-Uw711KWbUM40=.756e61e1-6e6a-4a1f-b660-dc5e0f18d052@github.com> On Tue, 8 Apr 2025 16:08:32 GMT, Martin Doerr wrote: >> This PR makes the non-volatile VectorRegisters available for C2's register allocation. >> >> I had to implement the VectorRegisters properly (4 VM Regs) like on other platforms. The old version has run into assertions and looked strange. >> >> The non-volatile VectorRegisters are now saved when entering Java: call_stub and upcall_stubs. >> I have rewritten the save and restore functions and used them for both. Then, I have removed code which has become dead. I only save and restore them if C2 uses the vector instructions (controlled by `SuperwordUseVSX`). >> I have moved the non-volatile spill area out of the entry_frame_locals because it has a variable size, now. >> >> The stack area for all non-volatile registers has become larger than the 288 Bytes which are allowed to be used below the SP (specified by the ABI). Therefore, I had to rewrite the call_stub sequence significantly. We need to push the new frame before saving the registers, now. >> >> Saving and restoring the FP registers is not needed in the slow signature handler which also uses the save and restore code for non-volatile registers. >> >> On Power10, we use vector pair instructions since Commit 8. E.g. 
in the call stub: >> >> 0x000072c9483c07b4: stxvp vs52,-224(r2) >> 0x000072c9483c07b8: stxvp vs54,-192(r2) >> 0x000072c9483c07bc: stxvp vs56,-160(r2) >> 0x000072c9483c07c0: stxvp vs58,-128(r2) >> 0x000072c9483c07c4: stxvp vs60,-96(r2) >> 0x000072c9483c07c8: stxvp vs62,-64(r2) >> >> >> >> 0x000072c9483c0914: lxvp vs52,-224(r2) >> 0x000072c9483c0918: lxvp vs54,-192(r2) >> 0x000072c9483c091c: lxvp vs56,-160(r2) >> 0x000072c9483c0920: lxvp vs58,-128(r2) >> 0x000072c9483c0924: lxvp vs60,-96(r2) >> 0x000072c9483c0928: lxvp vs62,-64(r2) > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright header. Hi Martin, I do have a couple of questions and comments. Still very nice! Cheers, Richard. src/hotspot/cpu/ppc/ppc.ad line 261: > 259: // ---------------------------- > 260: // 1st 32 VSRs are aliases for the FPRs which are already defined above. > 261: reg_def VSR0 (SOC, SOC, Op_VecX, 0, VMRegImpl::Bad()); I wonder how the old reg_defs worked, e.g. when allocating spill slots. Do you know? BTW: You might use vector pair load/stores in `MachSpillCopyNode::implementation()` too. src/hotspot/cpu/ppc/ppc.ad line 1934: > 1932: if (reg < 136+256) return rc_vs; > 1933: > 1934: assert(OptoReg::is_stack(reg), "what else is it?"); Maybe add ```c++ assert(_last_Mach_Reg == 398, "hardcoded register indices need to be updated from enum MachRegisterNumbers"); src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 141: > 139: __ rldicr(r_frame_size, r_frame_size, 3, 63 - 4); > 140: > 141: // this is the pure space for arguments Suggestion: // this is the pure space for arguments (excluding alignment padding) src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 150: > 148: __ push_frame(r_frame_size, R0); > 149: > 150: // Save non-volatiles GPRs to ENTRY_FRAME (not yet pushed, but it's safe). Frame's already pushed. Suggestion: // Save non-volatiles registers to ENTRY_FRAME. 
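The size argument in the PR description, that the non-volatile save area no longer fits into the 288 bytes usable below SP, can be checked with a few compile-time constants. This is a back-of-the-envelope sketch. It assumes the full 64-bit ELFv2 non-volatile set: r14..r31 and f14..f31 at 8 bytes each, plus v20..v31 at 16 bytes each. Vector register n is VSR 32+n, so v20..v31 are vs52..vs63, matching the `stxvp`/`lxvp` listing. The constant names are illustrative only.

```cpp
#include <cstddef>

// Register sizes in bytes.
constexpr std::size_t kGprSize = 8;
constexpr std::size_t kFprSize = 8;
constexpr std::size_t kVrSize = 16;

// Assumed ELFv2 non-volatile register counts.
constexpr std::size_t kNonVolatileGprs = 31 - 14 + 1;  // r14..r31
constexpr std::size_t kNonVolatileFprs = 31 - 14 + 1;  // f14..f31
constexpr std::size_t kNonVolatileVrs = 31 - 20 + 1;   // v20..v31 (vs52..vs63)

// Bytes the ABI allows a function to use below SP without a frame.
constexpr std::size_t kProtectedZone = 288;

constexpr std::size_t kGprFprBytes =
    kNonVolatileGprs * kGprSize + kNonVolatileFprs * kFprSize;
constexpr std::size_t kVrBytes = kNonVolatileVrs * kVrSize;

// stxvp/lxvp move a pair of VSRs (32 bytes) per instruction.
constexpr std::size_t kPairedStores = kNonVolatileVrs / 2;

static_assert(kGprFprBytes == kProtectedZone, "GPRs + FPRs exactly fill the zone");
static_assert(kVrBytes == 192, "12 VRs x 16 bytes");
static_assert(kGprFprBytes + kVrBytes > kProtectedZone, "VRs no longer fit below SP");
static_assert(kPairedStores == 6, "six paired stores, offsets -224..-64");
```

The six 32-byte stores at -224, -192, ..., -64 cover exactly the 192 bytes computed for the vector registers.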
src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 220: > 218: BLOCK_COMMENT("Call frame manager or native entry."); > 219: // Call frame manager or native entry. > 220: assert_different_registers(r_arg_entry, r_top_of_arguments_addr, r_arg_method, r_arg_thread); Since you're at it: please adjust the 2 comment lines above: there's no frame manager. We're about to call the interpreter or native entry. Also L222 (just remove "on entry ..."), L245. L265 src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 40: > 38: static void preserve_callee_saved_registers(MacroAssembler* _masm, const ABIDescriptor& abi, int reg_save_area_offset) { > 39: __ block_comment("{ preserve_callee_saved_regs "); > 40: __ save_nonvolatile_registers(R1_SP, reg_save_area_offset, true, SuperwordUseVSX); Parameter `abi` isn't used anymore. `preserve_callee_saved_registers` has just one call site. I think you should call `save_nonvolatile_registers` directly and delete this method. `block_comment` can be moved to `save_nonvolatile_registers`. 
src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 46: > 44: static void restore_callee_saved_registers(MacroAssembler* _masm, const ABIDescriptor& abi, int reg_save_area_offset) { > 45: __ block_comment("{ restore_callee_saved_regs "); > 46: __ restore_nonvolatile_registers(R1_SP, reg_save_area_offset, true, SuperwordUseVSX); See comment on `preserve_callee_saved_registers` ------------- PR Review: https://git.openjdk.org/jdk/pull/23987#pullrequestreview-2749247399 PR Review Comment: https://git.openjdk.org/jdk/pull/23987#discussion_r2035094650 PR Review Comment: https://git.openjdk.org/jdk/pull/23987#discussion_r2035073880 PR Review Comment: https://git.openjdk.org/jdk/pull/23987#discussion_r2035684416 PR Review Comment: https://git.openjdk.org/jdk/pull/23987#discussion_r2035685691 PR Review Comment: https://git.openjdk.org/jdk/pull/23987#discussion_r2035807151 PR Review Comment: https://git.openjdk.org/jdk/pull/23987#discussion_r2034626166 PR Review Comment: https://git.openjdk.org/jdk/pull/23987#discussion_r2034627304 From rrich at openjdk.org Wed Apr 9 20:52:35 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 9 Apr 2025 20:52:35 GMT Subject: RFR: 8351666: [PPC64] Make non-volatile VectorRegisters available for C2 register allocation [v4] In-Reply-To: <66CxpgDkdMjhIrTcz59yakF1YhB4CE-Uw711KWbUM40=.756e61e1-6e6a-4a1f-b660-dc5e0f18d052@github.com> References: <66CxpgDkdMjhIrTcz59yakF1YhB4CE-Uw711KWbUM40=.756e61e1-6e6a-4a1f-b660-dc5e0f18d052@github.com> Message-ID: On Wed, 9 Apr 2025 10:40:16 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Update Copyright header. > > src/hotspot/cpu/ppc/ppc.ad line 261: > >> 259: // ---------------------------- >> 260: // 1st 32 VSRs are aliases for the FPRs which are already defined above. >> 261: reg_def VSR0 (SOC, SOC, Op_VecX, 0, VMRegImpl::Bad()); > > I wonder how the old reg_defs worked, e.g. 
when allocating spill slots. Do you know? > BTW: You might use vector pair load/stores in `MachSpillCopyNode::implementation()` too. Ah, I see: it depends on Op_VecX. Op_VecX has 4 slots: https://github.com/openjdk/jdk/blob/7aeaa3c21c1420191fe8ff59e4cf99eae830754d/src/hotspot/share/opto/regmask.hpp#L90-L110 Hm, is it really necessary to model the vector registers as 4 32-bit parts? As you said offline this makes the `RegisterMasks` larger. If so then shouldn't Op_VecS be used? > src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 220: > >> 218: BLOCK_COMMENT("Call frame manager or native entry."); >> 219: // Call frame manager or native entry. >> 220: assert_different_registers(r_arg_entry, r_top_of_arguments_addr, r_arg_method, r_arg_thread); > > Since you're at it: please adjust the 2 comment lines above: there's no frame manager. We're about to call the interpreter or native entry. > Also L222 (just remove "on entry ..."), L245. L265 Alternatively create a cleanup RFE for the stale "frame manager" comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23987#discussion_r2035153598 PR Review Comment: https://git.openjdk.org/jdk/pull/23987#discussion_r2035814736 From rrich at openjdk.org Wed Apr 9 20:52:35 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 9 Apr 2025 20:52:35 GMT Subject: RFR: 8351666: [PPC64] Make non-volatile VectorRegisters available for C2 register allocation [v4] In-Reply-To: References: <66CxpgDkdMjhIrTcz59yakF1YhB4CE-Uw711KWbUM40=.756e61e1-6e6a-4a1f-b660-dc5e0f18d052@github.com> Message-ID: On Wed, 9 Apr 2025 11:19:17 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/ppc/ppc.ad line 261: >> >>> 259: // ---------------------------- >>> 260: // 1st 32 VSRs are aliases for the FPRs which are already defined above. >>> 261: reg_def VSR0 (SOC, SOC, Op_VecX, 0, VMRegImpl::Bad()); >> >> I wonder how the old reg_defs worked, e.g. when allocating spill slots. Do you know? 
>> BTW: You might use vector pair load/stores in `MachSpillCopyNode::implementation()` too. > > Ah, I see: it depends on Op_VecX. Op_VecX has 4 slots: https://github.com/openjdk/jdk/blob/7aeaa3c21c1420191fe8ff59e4cf99eae830754d/src/hotspot/share/opto/regmask.hpp#L90-L110 > > Hm, is it really necessary to model the vector registers as 4 32-bit parts? As you said offline this makes the `RegisterMasks` larger. If so then shouldn't Op_VecS be used? Also: why is it even necessary to define VSR0 - VSR31 if we don't use them (because they are aliases of the FP regs)? I assume they unnecessarily enlarge RegMasks and the size of RegMasks is critical for memory consumption. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23987#discussion_r2035548844 From cjplummer at openjdk.org Wed Apr 9 20:52:48 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 9 Apr 2025 20:52:48 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:14:04 GMT, Serguei Spitsyn wrote: > As noted in [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088), JVMTI `GetThreadGroupChildren` does an upcall to java. This results in a`ClassPrepare` event the first time it does this, and these events can cause problems (deadlocks) for the debugger or debug agent. The [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088) was fixed to get rid of class loading during Java upcall from `GetThreadGroupChildren`. However, some other events can be generated as well. It is more safe to disable all JVMTI events during debugger-related upcalls originated by JVMTI. > The `ClassPrepare` events are important for the debug agent. So, an assert was added into `ClassPrepare` event generation to make sure there are no attempts to post this event during upcalls. > Some specific implementation details can be added to the first PR comment.
> > Testing: > - Verified with the test `jdk/com/sun/jdi/EarlyThreadGroupChildrenTest.java` that was added with the fix of [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088): > - the assert described above is fired if the fix of JDK-8352088 is removed > - the test is passed without if the fix of JDK-8352088 is removed and the assert is removed > - Ran mach5 tiers 1-6 src/hotspot/share/prims/jvmtiExport.cpp line 1419: > 1417: // ClassPrepare events are important for JDWP agent but not expected during such upcalls. > 1418: // Catch if this invariant is not broken. > 1419: assert(!thread->is_in_java_upcall(), "unexpected ClassPrepare event during JVMTI upcall"); I think we should do this for ClassLoad also. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24539#discussion_r2036122548 From iklam at openjdk.org Wed Apr 9 21:00:44 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 9 Apr 2025 21:00:44 GMT Subject: RFR: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 [v7] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 16:49:03 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed merge > > Updates look good. 
Thanks @calvinccheung @matias9927 @rose00 for the review ------------- PR Comment: https://git.openjdk.org/jdk/pull/23926#issuecomment-2790979126 From iklam at openjdk.org Wed Apr 9 21:00:45 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 9 Apr 2025 21:00:45 GMT Subject: Integrated: 8351319: AOT cache support for custom class loaders broken since JDK-8348426 In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 04:09:22 GMT, Ioi Lam wrote: > Since [JDK-8348426](https://bugs.openjdk.org/browse/JDK-8348426) (Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file), the AOT cache no longer contains classes intended for custom class loaders (these are called "unregistered classes" in CDS terminology). > > The fix is simple -- we already remember the set of unregistered classes in the AOT configuration file. We just need to add them into the final AOT cache (see changes in finalImageRecipes.cpp). This pull request has now been integrated. Changeset: e3f26b05 Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/e3f26b056e6b8403e6744b8a4cf59ccf4d217d89 Stats: 185 lines in 13 files changed: 151 ins; 0 del; 34 mod 8351319: AOT cache support for custom class loaders broken since JDK-8348426 Reviewed-by: ccheung, matsaave, jrose ------------- PR: https://git.openjdk.org/jdk/pull/23926 From duke at openjdk.org Wed Apr 9 21:18:35 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 9 Apr 2025 21:18:35 GMT Subject: Integrated: 8351034: Add AVX-512 intrinsics for ML-DSA In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 11:12:58 GMT, Ferenc Rakoczi wrote: > By using the AVX-512 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. This pull request has now been integrated. 
Changeset: e87ff328 Author: Ferenc Rakoczi Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/e87ff328d5cc66454213dee44cf2faeb0e76262f Stats: 1307 lines in 10 files changed: 1265 ins; 27 del; 15 mod 8351034: Add AVX-512 intrinsics for ML-DSA Reviewed-by: sviswanathan, lmesnik, vpaprotski, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/23860 From mdoerr at openjdk.org Wed Apr 9 22:04:30 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 9 Apr 2025 22:04:30 GMT Subject: RFR: 8351666: [PPC64] Make non-volatile VectorRegisters available for C2 register allocation [v4] In-Reply-To: References: <66CxpgDkdMjhIrTcz59yakF1YhB4CE-Uw711KWbUM40=.756e61e1-6e6a-4a1f-b660-dc5e0f18d052@github.com> Message-ID: On Wed, 9 Apr 2025 14:49:55 GMT, Richard Reingruber wrote: >> Ah, I see: it depends on Op_VecX. Op_VecX has 4 slots: https://github.com/openjdk/jdk/blob/7aeaa3c21c1420191fe8ff59e4cf99eae830754d/src/hotspot/share/opto/regmask.hpp#L90-L110 >> >> Hm, is it really necessary to model the vector registers as 4 32-bit parts? As you said offline this makes the `RegisterMasks` larger. If so then shouldn't Op_VecS be used? > > Also: why is it even necessary to define VSR0 - VSR31 if we don't use them (because they are aliases of the FP regs)? I assume they unnecessarily enlarge RegMasks and the size of RegMasks is critical for memory consumption. I've removed them. This makes sense. Thanks for your feedback! I'll take a closer look at the other comments when I find more time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23987#discussion_r2036198671 From mdoerr at openjdk.org Wed Apr 9 22:08:17 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 9 Apr 2025 22:08:17 GMT Subject: RFR: 8351666: [PPC64] Make non-volatile VectorRegisters available for C2 register allocation [v5] In-Reply-To: References: Message-ID: > This PR makes the non-volatile VectorRegisters available for C2's register allocation.
> > I had to implement the VectorRegisters properly (4 VM Regs) like on other platforms. The old version has run into assertions and looked strange. > > The non-volatile VectorRegisters are now saved when entering Java: call_stub and upcall_stubs. > I have rewritten the save and restore functions and used them for both. Then, I have removed code which has become dead. I only save and restore them if C2 uses the vector instructions (controlled by `SuperwordUseVSX`). > I have moved the non-volatile spill area out of the entry_frame_locals because it has a variable size, now. > > The stack area for all non-volatile registers has become larger than the 288 Bytes which are allowed to be used below the SP (specified by the ABI). Therefore, I had to rewrite the call_stub sequence significantly. We need to push the new frame before saving the registers, now. > > Saving and restoring the FP registers is not needed in the slow signature handler which also uses the save and restore code for non-volatile registers. > > On Power10, we use vector pair instructions since Commit 8. E.g. in the call stub: > > 0x000072c9483c07b4: stxvp vs52,-224(r2) > 0x000072c9483c07b8: stxvp vs54,-192(r2) > 0x000072c9483c07bc: stxvp vs56,-160(r2) > 0x000072c9483c07c0: stxvp vs58,-128(r2) > 0x000072c9483c07c4: stxvp vs60,-96(r2) > 0x000072c9483c07c8: stxvp vs62,-64(r2) > > > > 0x000072c9483c0914: lxvp vs52,-224(r2) > 0x000072c9483c0918: lxvp vs54,-192(r2) > 0x000072c9483c091c: lxvp vs56,-160(r2) > 0x000072c9483c0920: lxvp vs58,-128(r2) > 0x000072c9483c0924: lxvp vs60,-96(r2) > 0x000072c9483c0928: lxvp vs62,-64(r2) Martin Doerr has updated the pull request incrementally with two additional commits since the last revision: - Remove dead code. - Remove VSR0-31 from allocation classes. 
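The offsets in the stxvp/lxvp listing above follow a simple pattern. VR20-VR31, which are VSR52-VSR63 under VSX numbering, are moved two at a time. Each paired instruction covers 2 x 16 = 32 bytes. The hypothetical helper below regenerates that listing from the first register and the lowest offset (-224, taken from the listing). The base register `r2` is likewise copied from the listing. This only illustrates the arithmetic and is not the code the PR emits.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One paired save/restore: the even register of the pair and its offset.
struct VsrPairSlot {
  int first_vsr;
  int offset;
};

// Hypothetical helper: lay out the VSR pairs in [first_vsr, last_vsr] at
// increasing offsets, 32 bytes (two 16-byte registers) per pair.
std::vector<VsrPairSlot> vsr_pair_slots(int first_vsr, int last_vsr,
                                        int lowest_offset) {
  assert((last_vsr - first_vsr + 1) % 2 == 0 && "need whole pairs");
  std::vector<VsrPairSlot> slots;
  int offset = lowest_offset;
  for (int r = first_vsr; r < last_vsr; r += 2) {
    slots.push_back({r, offset});
    offset += 32;
  }
  return slots;
}

// Render one slot in the same textual form as the disassembly above.
std::string listing_line(const std::string& mnemonic, const VsrPairSlot& s) {
  return mnemonic + " vs" + std::to_string(s.first_vsr) + "," +
         std::to_string(s.offset) + "(r2)";
}
```

`vsr_pair_slots(52, 63, -224)` yields six slots running from -224 to -64, one per stxvp/lxvp in the listing. Without paired instructions the same 192-byte area would need twelve 16-byte stores and twelve loads.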
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23987/files - new: https://git.openjdk.org/jdk/pull/23987/files/e69d7183..8142a36e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23987&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23987&range=03-04 Stats: 235 lines in 1 file changed: 1 ins; 232 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23987.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23987/head:pull/23987 PR: https://git.openjdk.org/jdk/pull/23987 From mdoerr at openjdk.org Wed Apr 9 22:26:31 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 9 Apr 2025 22:26:31 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 08:10:34 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
>> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 39 commits: > > - * missing file from merge > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq > - * make young gen length revising independent of refinement thread > * use a service task > * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update > - * fix IR code generation tests that change due to barrier cost changes > - * factor out card table and refinement table merging into a single > method > - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 > - * obsolete G1UpdateBufferSize > > G1UpdateBufferSize has previously been used to size the refinement > buffers and impose a minimum limit on the number of cards per thread > that need to be pending before refinement starts. > > The former function is now obsolete with the removal of the dirty > card queues, the latter functionality has been taken over by the new > diagnostic option `G1PerThreadPendingCardThreshold`. > > I prefer to make this a diagnostic option is better than a product option > because it is something that is only necessary for some test cases to > produce some otherwise unwanted behavior (continuous refinement). > > CSR is pending. > - ... 
and 29 more: https://git.openjdk.org/jdk/compare/41d4a0d7...1c5a669f This PR needs an update for x86 platforms when merging: g1BarrierSetAssembler_x86.cpp:117:6: error: 'class MacroAssembler' has no member named 'get_thread' ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2791114662 From sspitsyn at openjdk.org Wed Apr 9 22:37:27 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 9 Apr 2025 22:37:27 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 20:38:08 GMT, Chris Plummer wrote: >> As noted in [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088), JVMTI `GetThreadGroupChildren` does an upcall to java. This results in a`ClassPrepare` event the first time it does this, and these events can cause problems (deadlocks) for the debugger or debug agent. The [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088) was fixed to get rid of class loading during Java upcall from `GetThreadGroupChildren`. However, some other events can be generated as well. It is more safe to disable all JVMTI events during debugger-related upcalls originated by JVMTI. >> The `ClassPrepare` events are important for the debug agent. So, an assert was added into `ClassPrepare` event generation to make sure there are no attempts to post this event during upcalls. >> Some specific implementation details can be added to the first PR comment. 
>> >> Testing: >> - Verified with the test `jdk/com/sun/jdi/EarlyThreadGroupChildrenTest.java` that was added with the fix of [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088): >> - the assert described above is fired if the fix of JDK-8352088 is removed >> - the test is passed without if the fix of JDK-8352088 is removed and the assert is removed >> - Ran mach5 tiers 1-6 > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 867: > >> 865: // This call collects the strong and weak groups >> 866: JavaThread* THREAD = current_thread; >> 867: JvmtiJavaUpcallMark jjum(current_thread); > > Add comment like you did above for the JvmtiEnv::InterruptThread case. Okay, thanks! Added now. > src/hotspot/share/prims/jvmtiExport.cpp line 1419: > >> 1417: // ClassPrepare events are important for JDWP agent but not expected during such upcalls. >> 1418: // Catch if this invariant is not broken. >> 1419: assert(!thread->is_in_java_upcall(), "unexpected ClassPrepare event during JVMTI upcall"); > > I think we should do this for ClassLoad also. Okay, thanks! Updated now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24539#discussion_r2036227345 PR Review Comment: https://git.openjdk.org/jdk/pull/24539#discussion_r2036227287 From sspitsyn at openjdk.org Wed Apr 9 22:41:05 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 9 Apr 2025 22:41:05 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls [v2] In-Reply-To: References: Message-ID: > As noted in [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088), JVMTI `GetThreadGroupChildren` does an upcall to java. This results in a`ClassPrepare` event the first time it does this, and these events can cause problems (deadlocks) for the debugger or debug agent. The [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088) was fixed to get rid of class loading during Java upcall from `GetThreadGroupChildren`. However, some other events can be generated as well. 
It is more safe to disable all JVMTI events during debugger-related upcalls originated by JVMTI. > The `ClassPrepare` events are important for the debug agent. So, an assert was added into `ClassPrepare` event generation to make sure there are no attempts to post this event during upcalls. > Some specific implementation details can be added to the first PR comment. > > Testing: > - Verified with the test `jdk/com/sun/jdi/EarlyThreadGroupChildrenTest.java` that was added with the fix of [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088): > - the assert described above is fired if the fix of JDK-8352088 is removed > - the test is passed without if the fix of JDK-8352088 is removed and the assert is removed > - Ran mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: 1) added an assert for ClassLoad events same as for ClassPrepare; 2) added one minor comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24539/files - new: https://git.openjdk.org/jdk/pull/24539/files/82e99fae..342e8a78 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24539&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24539&range=00-01 Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24539/head:pull/24539 PR: https://git.openjdk.org/jdk/pull/24539 From sspitsyn at openjdk.org Wed Apr 9 22:41:05 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 9 Apr 2025 22:41:05 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls In-Reply-To: References: Message-ID: <-h8tEhjnCVtV4HP9CvhZZEnYUvyJyehZ8aZV44m1bD4=.9db1813e-225e-4352-b09f-2042fdf8fbdf@github.com> On Wed, 9 Apr 2025 08:14:04 GMT, Serguei Spitsyn wrote: > As noted in [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088), JVMTI `GetThreadGroupChildren` does an upcall to java. 
This results in a`ClassPrepare` event the first time it does this, and these events can cause problems (deadlocks) for the debugger or debug agent. The [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088) was fixed to get rid of class loading during Java upcall from `GetThreadGroupChildren`. However, some other events can be generated as well. It is more safe to disable all JVMTI events during debugger-related upcalls originated by JVMTI. > The `ClassPrepare` events are important for the debug agent. So, an assert was added into `ClassPrepare` event generation to make sure there are no attempts to post this event during upcalls. > Some specific implementation details can be added to the first PR comment. > > Testing: > - Verified with the test `jdk/com/sun/jdi/EarlyThreadGroupChildrenTest.java` that was added with the fix of [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088): > - the assert described above is fired if the fix of JDK-8352088 is removed > - the test is passed without if the fix of JDK-8352088 is removed and the assert is removed > - Ran mach5 tiers 1-6 Coleen and Leonid, thank you for review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24539#issuecomment-2791132661 From sspitsyn at openjdk.org Wed Apr 9 22:47:24 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 9 Apr 2025 22:47:24 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 19:21:21 GMT, Coleen Phillimore wrote: > So this is another case where you have to ignore JVMTI event like in VTMS transitions? It looks like a good way to fix this in general. Yes. This is a long standing issue which is good to fix now. 
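The approach discussed in this thread works in two parts. While JVMTI performs a Java upcall, the thread carries a scoped mark. Event posting consults that state: ordinary events are dropped, and ClassLoad/ClassPrepare trip an assert. Below is a simplified model with hypothetical types. `ThreadModel`, `UpcallScope` and both functions are illustrations only. The PR's real names (`JvmtiJavaUpcallMark`, `is_in_java_upcall()`) appear in the review comments, but their implementation is not shown here. The depth counter, which lets nested scopes behave correctly, is an assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical stand-in for a JavaThread; only the upcall state matters.
struct ThreadModel {
  int upcall_depth = 0;
  bool is_in_java_upcall() const { return upcall_depth > 0; }
};

// RAII scope: while alive, the thread counts as being in a JVMTI upcall.
class UpcallScope {
 public:
  explicit UpcallScope(ThreadModel* t) : thread_(t) { ++thread_->upcall_depth; }
  ~UpcallScope() { --thread_->upcall_depth; }
  UpcallScope(const UpcallScope&) = delete;
  UpcallScope& operator=(const UpcallScope&) = delete;

 private:
  ThreadModel* thread_;
};

// Ordinary events are silently suppressed during an upcall.
bool should_post_event(const ThreadModel& t) {
  return !t.is_in_java_upcall();
}

// ClassLoad/ClassPrepare are not expected at all during an upcall,
// so this path asserts instead of dropping the event.
void post_class_prepare(const ThreadModel& t) {
  assert(!t.is_in_java_upcall() && "unexpected ClassPrepare during upcall");
  (void)t;  // the event itself would be posted here
}
```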
------------- PR Comment: https://git.openjdk.org/jdk/pull/24539#issuecomment-2791140522 From cjplummer at openjdk.org Wed Apr 9 23:29:33 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 9 Apr 2025 23:29:33 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 22:34:47 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiExport.cpp line 1419: >> >>> 1417: // ClassPrepare events are important for JDWP agent but not expected during such upcalls. >>> 1418: // Catch if this invariant is not broken. >>> 1419: assert(!thread->is_in_java_upcall(), "unexpected ClassPrepare event during JVMTI upcall"); >> >> I think we should do this for ClassLoad also. > > Okay, thanks! Updated now. > Catch if this invariant is not broken I think you mean "Catch if this invariant is broken" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24539#discussion_r2036269670 From fyang at openjdk.org Thu Apr 10 02:16:36 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 10 Apr 2025 02:16:36 GMT Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user [v3] In-Reply-To: <5sujqD7L_cmLUyDwYb4PhgOlEeiFwlkAV7RJoVMFTrM=.223437cd-bbb2-4ef3-a6fe-b13ce402e14b@github.com> References: <5sujqD7L_cmLUyDwYb4PhgOlEeiFwlkAV7RJoVMFTrM=.223437cd-bbb2-4ef3-a6fe-b13ce402e14b@github.com> Message-ID: On Mon, 31 Mar 2025 10:45:54 GMT, Robbin Ehn wrote: >> Hi, for you to consider. >> >> These tests constantly fails in qemu-user. >> Either the require host to be same arch explicit or implicit (sysroot). >> E.g. "ptrace(PTRACE_ATTACH, ..) failed for 405157: Function not implemented'" for SA tests. >> >> From bug: >>> qemu-user/rv64 sets uarch to "qemu" in /proc/cpuinfo (qemu-system do not do that). >>> We add this uarch to CPU feature string. >>> This means we can use jtreg 'require' with cpu string to filter out tests in qemu-user. 
>> >> Relevant qemu code: >> https://github.com/qemu/qemu/blob/170825d14d88a1ce7fae98d5a928480f2f329b22/linux-user/riscv/target_proc.h#L29 >> >> Relevant hotspot code: >> https://github.com/openjdk/jdk/blob/fa0b18bfde38ee2ffbab33a9eaac547fe8aa3c7c/src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp#L250 >> >> Tested that the require only filters out tests in qemu+riscv64. >> >> Thanks! >> >> /Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into qemu-user-issues > - Revert > - Merge branch 'master' into qemu-user-issues > - Merge branch 'master' into qemu-user-issues > - more > - more > - native or very long > qemu-user, "uarch: qemu" in cpuinfo: `[0.084s][info ][os,cpu] CPU: total 28 (initial active 28) qemu rv64 rvi rvm rva rvf rvd rvc rvv zba zbb zbs zfh zfhmin zvbc zvfh zicond` Hence we know this is qemu-user (only qemu-user sets uarch to qemu on riscv). > > `/proc/cpuinfo` do not contain uarch: [0.053s][info ][os,cpu] CPU: total 8 (initial active 8) rv64 rvi rvm rva rvf rvd rvc zba zbb zbs zfh zfhmin zvfh zicond We have no clue if this is emulated or on real hardware, tests will be executed. > > Tests are only excluded if we know it's qemu-user. > qemu-user, "uarch: qemu" in cpuinfo: `[0.084s][info ][os,cpu] CPU: total 28 (initial active 28) qemu rv64 rvi rvm rva rvf rvd rvc rvv zba zbb zbs zfh zfhmin zvbc zvfh zicond` Hence we know this is qemu-user (only qemu-user sets uarch to qemu on riscv). > > `/proc/cpuinfo` do not contain uarch: [0.053s][info ][os,cpu] CPU: total 8 (initial active 8) rv64 rvi rvm rva rvf rvd rvc zba zbb zbs zfh zfhmin zvfh zicond We have no clue if this is emulated or on real hardware, tests will be executed. > > Tests are only excluded if we know it's qemu-user. 
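The filtering described above depends on qemu-user writing `uarch : qemu` into /proc/cpuinfo on riscv. A minimal standalone check of cpuinfo-style text might look like the sketch below; the function name is hypothetical. It is not HotSpot's code path: per the description, HotSpot appends the uarch value to its CPU feature string, and jtreg `@requires` conditions then match against that string.

```cpp
#include <sstream>
#include <string>

// Strip leading/trailing spaces and tabs.
static std::string trim(const std::string& s) {
  const char* ws = " \t";
  std::string::size_type b = s.find_first_not_of(ws);
  if (b == std::string::npos) return "";
  std::string::size_type e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Hypothetical check: true if cpuinfo-style text has a "uarch : qemu" entry.
bool cpuinfo_reports_qemu_uarch(const std::string& cpuinfo) {
  std::istringstream in(cpuinfo);
  std::string line;
  while (std::getline(in, line)) {
    std::string::size_type colon = line.find(':');
    if (colon == std::string::npos) continue;
    if (trim(line.substr(0, colon)) == "uarch" &&
        trim(line.substr(colon + 1)) == "qemu") {
      return true;
    }
  }
  return false;
}
```

As the quoted explanation notes, a missing uarch entry says nothing about emulation. A `false` result therefore means "not known to be qemu-user", not "real hardware".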
Sorry for not being clear enough. Yes, that's how it works with qemu-user for riscv. Just wondering if it makes sense to extend this to other CPU platforms. There are two cases. - Case 1: The tests are excluded as expected if we parse "qemu" in cpuinfo with qemu-user for another CPU, which is similar to qemu-user for riscv. But I am not sure if there is one for now. - Case 2: The tests are NOT excluded as there's no "qemu" in cpuinfo with qemu-user for another CPU. Then we still got test failures as before. But we are not causing any more regressions. I may consider that as a qemu-user issue for this CPU. And it could be fixed on the qemu-user side if it really helps people. Maybe I am demanding too much about qemu-user. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24229#issuecomment-2791376594 From sspitsyn at openjdk.org Thu Apr 10 05:50:23 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 10 Apr 2025 05:50:23 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls [v3] In-Reply-To: References: Message-ID: > As noted in [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088), JVMTI `GetThreadGroupChildren` does an upcall to java. This results in a`ClassPrepare` event the first time it does this, and these events can cause problems (deadlocks) for the debugger or debug agent. The [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088) was fixed to get rid of class loading during Java upcall from `GetThreadGroupChildren`. However, some other events can be generated as well. It is more safe to disable all JVMTI events during debugger-related upcalls originated by JVMTI. > The `ClassPrepare` events are important for the debug agent. So, an assert was added into `ClassPrepare` event generation to make sure there are no attempts to post this event during upcalls. > Some specific implementation details can be added to the first PR comment.
> > Testing: > - Verified with the test `jdk/com/sun/jdi/EarlyThreadGroupChildrenTest.java` that was added with the fix of [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088): > - the assert described above is fired if the fix of JDK-8352088 is removed > - the test is passed without if the fix of JDK-8352088 is removed and the assert is removed > - Ran mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: minor tweak in two similar comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24539/files - new: https://git.openjdk.org/jdk/pull/24539/files/342e8a78..f0b70372 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24539&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24539&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24539/head:pull/24539 PR: https://git.openjdk.org/jdk/pull/24539 From sspitsyn at openjdk.org Thu Apr 10 05:50:23 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 10 Apr 2025 05:50:23 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls [v3] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 23:27:00 GMT, Chris Plummer wrote: >> Okay, thanks! Updated now. > >> Catch if this invariant is not broken > I think you mean "Catch if this invariant is broken" Okay, thanks! Fixed this comment now in both places. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24539#discussion_r2036550047 From aboldtch at openjdk.org Thu Apr 10 05:53:35 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 10 Apr 2025 05:53:35 GMT Subject: RFR: 8354180: Clean up uses of ObjectMonitor caches In-Reply-To: References: Message-ID: <9qHC-GOthDzNjT9EiBb4XKsYU_aXygnZtKl7PC2z28Y=.bf81bf9d-70d5-42d2-9f27-76649fa52050@github.com> On Wed, 9 Apr 2025 12:47:02 GMT, Coleen Phillimore wrote: > These are mostly changes from @xmas92, as explained to me, plus a small cleanup to a read_caches function. > Tested with tier1-4. lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24545#pullrequestreview-2755410213 From aboldtch at openjdk.org Thu Apr 10 06:18:38 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 10 Apr 2025 06:18:38 GMT Subject: RFR: 8350441: ZGC: Overhaul Page Allocation In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 13:37:16 GMT, Joel Sikström wrote: >> Note that any reference to pages from here on out refers to the concept of a heap region in ZGC, not pages in the operating system (OS), unless stated otherwise. > > # Background > > This PR addresses fragmentation by introducing a Mapped Cache that replaces the Page Cache in ZGC. The largest limitation of the Page Cache is that it is constrained by the abstraction of what a page is. The proposed Mapped Cache removes this limitation by decoupling memory from pages, allowing it to merge and split memory in ways that the Page Cache is not suited for. To facilitate the transition, much of the Page Allocator has been redesigned to work with the Mapped Cache. > > In addition to fighting fragmentation, the new approach improves NUMA support and simplifies memory unmapping. Combined, these changes lay the foundation for even more improvements in ZGC, like replacing multi-mapped memory with anonymous memory.
> > # Mapped Cache > > The main benefit of the Mapped Cache is that adjacent virtual memory ranges in the cache can be merged to create larger ranges, enabling larger allocation requests to succeed more easily. Most notably, it allows allocations to succeed more often without "harvesting" smaller, discontiguous ranges. Harvesting negatively impacts both fragmentation and latency, as it requires remapping memory into a new contiguous virtual address range. Fragmentation becomes especially problematic in long-running programs and in environments with limited address space, where finding large contiguous regions can be difficult and may lead to premature Out Of Memory Errors (OOME). > > The Mapped Cache uses a self-balancing binary search tree to store memory ranges. Since the ranges are unused when inside the cache, the tree can use this memory to store metadata about itself, referred to as intrusive storage. This approach eliminates the need for dynamic memory allocation (e.g., malloc), which could otherwise introduce a latency overhead. > > # Fragmentation > > Currently, ZGC has multiple strategies for dealing with fragmentation. In some edge cases, these strategies are not as efficient as we would like. By addressing fragmentation differently with the Mapped Cache, ZGC is in a better position to avoid edge cases, which are bad even if they occur only once. This is especially impactful for programs running with a large heap. > > ## Virtual Memory Shuffling > > In addition to the Mapped Cache, we have made some adjustments in how ZGC deals with virtual memory. When harvesting memory, whic... 
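The coalescing behaviour described for the Mapped Cache can be shown with a small, non-intrusive C++ model. This is only a sketch. It uses `std::map`, which heap-allocates its own nodes, rather than the intrusive self-balancing tree described above. The class name `RangeCache` and its first-fit policy are assumptions for illustration and are not ZGC code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Toy cache of free, already-mapped virtual ranges: start address -> size.
class RangeCache {
 public:
  // Insert [start, start + size) and merge it with any adjacent ranges.
  void insert(uintptr_t start, size_t size) {
    assert(size > 0);
    auto next = _ranges.lower_bound(start);
    // Merge with the predecessor if it ends exactly where this range begins.
    if (next != _ranges.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second == start) {
        start = prev->first;
        size += prev->second;
        _ranges.erase(prev);  // 'next' stays valid
      }
    }
    // Merge with the successor if it begins exactly where this range ends.
    if (next != _ranges.end() && start + size == next->first) {
      size += next->second;
      _ranges.erase(next);
    }
    _ranges[start] = size;
  }

  // First-fit removal of one contiguous range, splitting a larger one if needed.
  // Returns nullopt when no single range is big enough (the "harvest" case).
  std::optional<uintptr_t> remove_contiguous(size_t size) {
    for (auto it = _ranges.begin(); it != _ranges.end(); ++it) {
      if (it->second >= size) {
        uintptr_t start = it->first;
        size_t remaining = it->second - size;
        _ranges.erase(it);
        if (remaining > 0) {
          _ranges[start + size] = remaining;
        }
        return start;
      }
    }
    return std::nullopt;
  }

  size_t num_ranges() const { return _ranges.size(); }

 private:
  std::map<uintptr_t, size_t> _ranges;
};
```

Suppose a large request fails while two ranges sit in the cache with a gap between them. Once the gap is returned to the cache, the three ranges merge into one, and the same request succeeds without remapping anything.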
src/hotspot/share/gc/z/zVirtualMemoryManager.hpp line 89: > 87: ZVirtualMemoryManager(size_t max_capacity); > 88: > 89: void initialize_partitions(ZVirtualMemoryReserver* reserver, size_t reserved); Suggestion: void initialize_partitions(ZVirtualMemoryReserver* reserver, size_t size_for_partitions); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24547#discussion_r2036582513 From sspitsyn at openjdk.org Thu Apr 10 06:41:42 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 10 Apr 2025 06:41:42 GMT Subject: RFR: 8316682: serviceability/jvmti/vthread/SelfSuspendDisablerTest timed out [v4] In-Reply-To: References: Message-ID: > This fixes the issue with lack of synchronization between JVMTI thread suspend and resume functions in a self-suspend case. More detailed fix description is in the first PR comment. > > Testing: Ran mach5 tiers 1-6. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: added general comment about sync between suspend_thread and resume_thread ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24269/files - new: https://git.openjdk.org/jdk/pull/24269/files/4a92986a..df99ba15 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24269&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24269&range=02-03 Stats: 19 lines in 3 files changed: 16 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24269.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24269/head:pull/24269 PR: https://git.openjdk.org/jdk/pull/24269 From sspitsyn at openjdk.org Thu Apr 10 06:41:42 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 10 Apr 2025 06:41:42 GMT Subject: RFR: 8316682: serviceability/jvmti/vthread/SelfSuspendDisablerTest timed out [v2] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 06:21:12 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1759: >> >>> 1757: Handle 
thread_h(current, thread_oop); >>> 1758: bool is_virtual = java_lang_VirtualThread::is_instance(thread_h()); >>> 1759: bool is_thread_carrying = is_thread_carrying_vthread(java_thread, thread_h()); >> >> I think there should be an explanation of suspend<->resume synchronization somewhere around here. As I understand it, the handshake can't be executed and clear the suspend state while suspend_thread is in progress for the same thread. How is it guaranteed that the suspend_thread flag can't be updated? >> It is not obvious, and it also puts some restrictions on the suspend_thread implementation to keep this behaviour. > > Thank you for reviewing and for this suggestion. > Yes, you are right. I'll try to find a good place to add such a comment. I've added the comment you requested. Please let me know if it is enough or if you have other comments/suggestions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24269#discussion_r2036612988 From dholmes at openjdk.org Thu Apr 10 06:46:25 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 10 Apr 2025 06:46:25 GMT Subject: RFR: 8324686: Remove redefinition of NULL for MSVC In-Reply-To: References: Message-ID: <4V81wf1js9XTTsvnbh-o1QwWPOogweRRGsQqe644dMM=.b74dd3d5-787e-4896-94fd-2d25edfc8762@github.com> On Wed, 9 Apr 2025 06:16:18 GMT, Kim Barrett wrote: > Please review this change that removes the redefinition of NULL in > globalDefinitions_visCPP.hpp. That redefinition was to support the use of NULL > in a varargs context, because of the size difference for int vs a pointer. > However, we no longer have any direct uses of NULL in HotSpot, and have a test > that ensures there is no backsliding. > > There may be indirect uses of NULL via third-party libraries. Such uses could > have been in the scope of the removed redefinition. But those uses must have > been correct even without the redefinition, else they would be incorrect for > non-HotSpot users. > > Testing: mach5 tier1-3, GHA sanity tests Looks fine.
Thanks for cleaning this up. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24537#pullrequestreview-2755514203 From rehn at openjdk.org Thu Apr 10 07:08:39 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 10 Apr 2025 07:08:39 GMT Subject: RFR: 8352730: RISC-V: Disable tests in qemu-user [v3] In-Reply-To: References: <5sujqD7L_cmLUyDwYb4PhgOlEeiFwlkAV7RJoVMFTrM=.223437cd-bbb2-4ef3-a6fe-b13ce402e14b@github.com> Message-ID: On Thu, 10 Apr 2025 02:13:46 GMT, Fei Yang wrote: > > qemu-user, "uarch: qemu" in cpuinfo: `[0.084s][info ][os,cpu] CPU: total 28 (initial active 28) qemu rv64 rvi rvm rva rvf rvd rvc rvv zba zbb zbs zfh zfhmin zvbc zvfh zicond` Hence we know this is qemu-user (only qemu-user sets uarch to qemu on riscv). > > `/proc/cpuinfo` does not contain uarch: [0.053s][info ][os,cpu] CPU: total 8 (initial active 8) rv64 rvi rvm rva rvf rvd rvc zba zbb zbs zfh zfhmin zvfh zicond We have no clue whether this is emulated or real hardware, so tests will be executed. > > Tests are only excluded if we know it's qemu-user. > > Sorry for not being clear enough. Yes, that's how it works with qemu-user for riscv. Just wondering if it makes sense to extend this to other CPU platforms. There are two cases. > > * Case 1: The tests are excluded as expected if we parse "qemu" in cpuinfo with qemu-user for another CPU, which is similar to qemu-user for riscv.
But I am not sure if there is one for now. > * Case 2: The tests are NOT excluded as there's no "qemu" in cpuinfo with qemu-user for another CPU. Then we still get test failures as before. But we are not causing any more regressions. I would consider that a qemu-user issue for this CPU. And it could be fixed on the qemu-user side if it really helps people. > > Maybe I am asking too much of qemu-user. What do you think? There is an additional step: the Linux CPU vm_version code also needs to parse /proc/cpuinfo and add that to the JVM CPU string. Right now only rv64 and aarch64 open that file AFAICT. And in qemu-user the only JVM-supported platforms that add qemu to cpuinfo are s390 and rv64. I'll ask the qemu folks and get a feel for whether I can upstream some changes addressing this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24229#issuecomment-2791764329 From rvansa at openjdk.org Thu Apr 10 07:11:46 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 10 Apr 2025 07:11:46 GMT Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v3] In-Reply-To: References: Message-ID: <5HGPEFniG9VcKrKsrCCr2iU1_2d8VRgYSphgc6pYyiQ=.179eab80-08eb-486b-b265-db72837a9a2d@github.com> > On the reproducer https://bugs.openjdk.org/secure/attachment/113985/CCC.java my local testing shows these numbers: > > ### JDK-17 > > $ hyperfine -w 5 -r 10 '/path/to/jdk-17/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp/ CCC > Time (mean ± σ): 32.5 ms ± 0.9 ms [User: 27.5 ms, System: 10.6 ms] > Range (min … max): 31.1 ms … 33.7 ms 10 runs > > ### JDK-25 before the change applied > > $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC > Time (mean ± σ): 101.6 ms ± 1.5 ms [User: 96.8 ms, System: 14.6 ms] > Range (min … max): 99.0 ms …
104.5 ms 10 runs > > ### JDK-25 with this patch > > $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC > Time (mean ± σ): 75.8 ms ± 1.2 ms [User: 69.8 ms, System: 16.0 ms] > Range (min … max): 73.8 ms … 78.2 ms 10 runs Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Cleanup after initial review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24290/files - new: https://git.openjdk.org/jdk/pull/24290/files/11c2cb69..3d7f27d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24290&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24290&range=01-02 Stats: 19 lines in 5 files changed: 1 ins; 5 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/24290.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24290/head:pull/24290 PR: https://git.openjdk.org/jdk/pull/24290 From shade at openjdk.org Thu Apr 10 07:12:31 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 10 Apr 2025 07:12:31 GMT Subject: RFR: 8353572: x86: AMD platforms miss the check for CLWB feature flag [v3] In-Reply-To: References: <57x_bKsziIA24C8HGBSYLTa_biu2VUiG_Z2OZb_AIiU=.a43a9b2a-8657-4c24-94bc-89ca5f02c75f@github.com> Message-ID: On Wed, 9 Apr 2025 18:34:30 GMT, Aleksey Shipilev wrote: >> Noticed this when doing [JDK-8353558](https://bugs.openjdk.org/browse/JDK-8353558). We only check for the CLWB feature flag on Intel platforms. But AMD APM (Rev. 3.36, March 2024) tells me there is a CLWB flag in CPUID Fn0000_0007_EBX_x0 leaf as well. It is in the same place as the flag for Intel. >> >> Additional testing: >> - [x] Ad-hoc tests on Ryzen 5950X > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
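For reference, the CLWB flag discussed in 8353572 is bit 24 of EBX in CPUID leaf 7, subleaf 0, on both Intel and AMD. The following is a standalone C++ sketch of that check. It assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid_count`. HotSpot's `vm_version_x86` uses its own CPUID stub instead, so this is not the code from the PR, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang only (assumption: not MSVC)
#endif

// CPUID.(EAX=07H, ECX=0):EBX bit 24 reports CLWB on Intel and AMD alike.
constexpr uint32_t kClwbBit = 1u << 24;

// Pure helper so the bit test can be checked without real hardware.
inline bool ebx_has_clwb(uint32_t leaf7_ebx) {
  return (leaf7_ebx & kClwbBit) != 0;
}

// Queries the running CPU; false on non-x86 builds or if leaf 7 is missing.
inline bool cpu_supports_clwb() {
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
    return false;  // leaf 7 not supported
  }
  return ebx_has_clwb(ebx);
#else
  return false;
#endif
}
```

The bit test is deliberately vendor-neutral. That matches the PR's observation that the flag sits in the same place for AMD as for Intel.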
The pull request contains five additional commits since the last revision: > > - Add assert > - Merge branch 'master' into JDK-8353572-amd-clwb > - More feature flag commonning > - Merge branch 'master' into JDK-8353572-amd-clwb > - Fix Thanks! Looking for another Review before I can integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24385#issuecomment-2791772639 From shade at openjdk.org Thu Apr 10 07:13:31 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 10 Apr 2025 07:13:31 GMT Subject: RFR: 8351157: Clean up x86 GC barriers after 32-bit x86 removal [v3] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:31:44 GMT, Aleksey Shipilev wrote: >> Assembler GC barriers have quite a bit of coding to support 32-bit x86. As 32-bit x86 is removed, we can clean up those parts. >> >> We can eliminate `!LP64` blocks quite easily. We can also prune passing around `thread` argument, and just trust that `r15_thread` is always available. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8351157-x86-gc-barriers > - Merge branch 'master' into JDK-8351157-x86-gc-barriers > - Also do tlab_allocate > - Rely on R15 to be a thread register > - Work @tschatzl -- are you fine with G1 cleanups? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24253#issuecomment-2791775549 From dholmes at openjdk.org Thu Apr 10 07:16:31 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 10 Apr 2025 07:16:31 GMT Subject: RFR: 8351491: Add info from release file to hserr file [v8] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 07:02:12 GMT, Matthias Baesken wrote: >> The release file of the JDK image contains useful info, for example the SOURCE used to build this image, e.g.
>> SOURCE=".:git:21af8c7e7405" >> Also the MODULES list is probably useful to have. >> Add this info (or the complete content of the release file) to the hs_err files. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > print some output in case release file has not been read Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24244#pullrequestreview-2755589418 From tschatzl at openjdk.org Thu Apr 10 07:21:29 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 07:21:29 GMT Subject: RFR: 8351157: Clean up x86 GC barriers after 32-bit x86 removal [v3] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:31:44 GMT, Aleksey Shipilev wrote: >> Assembler GC barriers have quite a bit of coding to support 32-bit x86. As 32-bit x86 is removed, we can clean up those parts. >> >> We can eliminate `!LP64` blocks quite easily. We can also prune passing around `thread` argument, and just trust that `r15_thread` is always available. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8351157-x86-gc-barriers > - Merge branch 'master' into JDK-8351157-x86-gc-barriers > - Also do tlab_allocate > - Rely on R15 to be a thread register > - Work Marked as reviewed by tschatzl (Reviewer). 
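The 8351491 proposal above needs selected keys such as `SOURCE` and `MODULES` pulled out of the `release` file's `KEY="value"` lines. One way to do that is sketched below in C++. The function name `parse_release` is hypothetical. The PR may instead print the file verbatim, and real HotSpot error-reporting code would avoid heap allocation, which this sketch does not.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Parse KEY="value" lines as found in a JDK image's `release` file.
// Lines without '=' are ignored; one pair of surrounding quotes is stripped.
std::map<std::string, std::string> parse_release(const std::string& text) {
  std::map<std::string, std::string> result;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    size_t eq = line.find('=');
    if (eq == std::string::npos) {
      continue;  // not a KEY=value line
    }
    std::string key = line.substr(0, eq);
    std::string value = line.substr(eq + 1);
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
      value = value.substr(1, value.size() - 2);
    }
    result[key] = value;
  }
  return result;
}
```

Splitting at the first `=` keeps values such as `.:git:21af8c7e7405` intact, because only the key is cut off.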
------------- PR Review: https://git.openjdk.org/jdk/pull/24253#pullrequestreview-2755601117 From tschatzl at openjdk.org Thu Apr 10 07:26:28 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 07:26:28 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v31] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight, but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post-write barrier to much more closely resemble Parallel GC's, as described in the JEP. The reason is that G1 lags in throughput compared to Parallel/Serial GC due to its larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done; // null value check
> if (card(@x.a) == young_card) goto done; // write to young gen check
> StoreLoad; // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for Parallel and Serial GC. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse-grained synchronization based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 45 commits: - * fixes after merge related to 32 bit x86 removal - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review: revising young gen length * robcasloz review: various minor refactorings - Do not unnecessarily pass around tmp2 in x86 - Refine needs_liveness_data - Reorder includes - * missing file from merge - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - ... and 35 more: https://git.openjdk.org/jdk/compare/45b7c748...39aa903f ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=30 Stats: 7118 lines in 110 files changed: 2586 ins; 3598 del; 934 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Thu Apr 10 07:28:31 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 07:28:31 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 22:24:10 GMT, Martin Doerr wrote: > This PR needs an update for x86 platforms when merging: g1BarrierSetAssembler_x86.cpp:117:6: error: 'class MacroAssembler' has no member named 'get_thread' I fixed this for now, but it will be broken again shortly by Aleksey's ongoing efforts to remove the 32-bit x86 port.
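To contrast with the current-barrier pseudo code in 8342382, here is a toy C++ model of a card-marking post-write barrier that drops the StoreLoad and the queue. It is paired with a primary/secondary card-table swap for refinement. All constants (512-byte cards, 1 MiB regions, clean/dirty byte values) and all names are assumptions for illustration. The sketch keeps only the same-region and null filters from that pseudo code. It does not model which filters the JEP actually retains, or how mutator threads are made to observe a swapped table, which the real design must synchronize.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed toy parameters; addresses are byte offsets into the heap, 0 == null.
constexpr size_t kCardShift = 9;     // 512-byte cards
constexpr size_t kRegionShift = 20;  // 1 MiB regions
constexpr uint8_t kClean = 0xff;
constexpr uint8_t kDirty = 0x00;

class CardTables {
 public:
  explicit CardTables(size_t heap_bytes)
      : _a(heap_bytes >> kCardShift, kClean),
        _b(heap_bytes >> kCardShift, kClean),
        _primary(_a.data()) {}

  // Post-write barrier for `*field = value`: filter, then mark the card.
  // No StoreLoad fence and no enqueue of the card address.
  void post_write(uintptr_t field, uintptr_t value) {
    if ((field >> kRegionShift) == (value >> kRegionShift)) return;  // same region
    if (value == 0) return;                                          // null store
    assert((field >> kCardShift) < _a.size());
    uint8_t* table = _primary.load(std::memory_order_relaxed);
    table[field >> kCardShift] = kDirty;
  }

  // Refinement side: make the other table primary and return the old one,
  // which can then be scanned (and cleaned) without racing with mutators.
  uint8_t* swap_tables() {
    uint8_t* old_primary = _primary.load();
    uint8_t* next = (old_primary == _a.data()) ? _b.data() : _a.data();
    _primary.store(next);
    return old_primary;
  }

  bool is_dirty(const uint8_t* table, uintptr_t addr) const {
    return table[addr >> kCardShift] == kDirty;
  }

 private:
  std::vector<uint8_t> _a, _b;
  std::atomic<uint8_t*> _primary;
};
```

The fast path is reduced to a couple of compares and one byte store. That is the kind of size reduction the thread argues for, though the real barrier's instruction count is not claimed here.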
------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2791807489 From rrich at openjdk.org Thu Apr 10 07:29:26 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 10 Apr 2025 07:29:26 GMT Subject: RFR: 8351666: [PPC64] Make non-volatile VectorRegisters available for C2 register allocation [v4] In-Reply-To: References: <66CxpgDkdMjhIrTcz59yakF1YhB4CE-Uw711KWbUM40=.756e61e1-6e6a-4a1f-b660-dc5e0f18d052@github.com> Message-ID: On Wed, 9 Apr 2025 22:02:11 GMT, Martin Doerr wrote: >> Also: why is it even necessary to define VSR0 - VSR31 if we don't use them (because they are aliases of the FP regs)? I assume they unnecessarily enlarge RegMasks and the size of RegMasks is critical for memory consumption. > > I've removed them. This makes sense. Thanks for your feedback! I'll take a closer look at the other comments when I find more time. > BTW: You might use vector pair load/stores in `MachSpillCopyNode::implementation()` too. Probably not load/stores for _pairs_ but better instructions like lxv and stxv. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23987#discussion_r2036687779 From dholmes at openjdk.org Thu Apr 10 07:37:26 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 10 Apr 2025 07:37:26 GMT Subject: RFR: 8352773: JVMTI should disable events during java upcalls [v3] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 05:50:23 GMT, Serguei Spitsyn wrote: >> As noted in [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088), JVMTI `GetThreadGroupChildren` does an upcall to java. This results in a `ClassPrepare` event the first time it does this, and these events can cause problems (deadlocks) for the debugger or debug agent. The [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088) was fixed to get rid of class loading during Java upcall from `GetThreadGroupChildren`. However, some other events can be generated as well.
It is safer to disable all JVMTI events during debugger-related upcalls originating from JVMTI. >> The `ClassPrepare` events are important for the debug agent. So, an assert was added into `ClassPrepare` event generation to make sure there are no attempts to post this event during upcalls. >> Some specific implementation details can be added to the first PR comment. >> >> Testing: >> - Verified with the test `jdk/com/sun/jdi/EarlyThreadGroupChildrenTest.java` that was added with the fix of [JDK-8352088](https://bugs.openjdk.org/browse/JDK-8352088): >> - the assert described above fires if the fix of JDK-8352088 is removed >> - the test passes if both the fix of JDK-8352088 and the assert are removed >> - Ran mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: minor tweak in two similar comments This looks reasonable to me. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24539#pullrequestreview-2755643289 From shade at openjdk.org Thu Apr 10 07:47:41 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 10 Apr 2025 07:47:41 GMT Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v3] In-Reply-To: <5HGPEFniG9VcKrKsrCCr2iU1_2d8VRgYSphgc6pYyiQ=.179eab80-08eb-486b-b265-db72837a9a2d@github.com> References: <5HGPEFniG9VcKrKsrCCr2iU1_2d8VRgYSphgc6pYyiQ=.179eab80-08eb-486b-b265-db72837a9a2d@github.com> Message-ID: On Thu, 10 Apr 2025 07:11:46 GMT, Radim Vansa wrote: >> On the reproducer https://bugs.openjdk.org/secure/attachment/113985/CCC.java my local testing shows these numbers: >> >> ### JDK-17 >> >> $ hyperfine -w 5 -r 10 '/path/to/jdk-17/bin/java -cp /tmp/ CCC' >> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp/ CCC >> Time (mean ± σ): 32.5 ms ± 0.9 ms [User: 27.5 ms, System: 10.6 ms] >> Range (min … max): 31.1 ms …
33.7 ms 10 runs >> >> ### JDK-25 before the change applied >> >> $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' >> Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC >> Time (mean ± σ): 101.6 ms ± 1.5 ms [User: 96.8 ms, System: 14.6 ms] >> Range (min … max): 99.0 ms … 104.5 ms 10 runs >> >> ### JDK-25 with this patch >> >> $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' >> Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC >> Time (mean ± σ): 75.8 ms ± 1.2 ms [User: 69.8 ms, System: 16.0 ms] >> Range (min … max): 73.8 ms … 78.2 ms 10 runs > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup after initial review Looks reasonable to me, just the stylistic nits. @fparain should take a look as well. src/hotspot/share/oops/fieldStreams.hpp line 134: > 132: fieldDescriptor& field_descriptor() const { > 133: fieldDescriptor& field = const_cast<fieldDescriptor&>(_fd_buf); > 134: field.reinitialize(field_holder(), _fi_buf); `_fi_buf` -> `to_FieldInfo()`? Reads better, and I expect it to fully inline without any performance loss. src/hotspot/share/runtime/fieldDescriptor.hpp line 105: > 103: > 104: // Initialization > 105: void reinitialize(InstanceKlass* ik, const FieldInfo &fieldinfo); I think the style is `const FieldInfo& fieldinfo`. Also in definition. src/hotspot/share/utilities/tuple.hpp line 43: > 41: > 42: public: > 43: constexpr Tuple() noexcept {} Do you still need this? ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24290#pullrequestreview-2755649254 PR Review Comment: https://git.openjdk.org/jdk/pull/24290#discussion_r2036713643 PR Review Comment: https://git.openjdk.org/jdk/pull/24290#discussion_r2036718573 PR Review Comment: https://git.openjdk.org/jdk/pull/24290#discussion_r2036703719 From shade at openjdk.org Thu Apr 10 07:47:43 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 10 Apr 2025 07:47:43 GMT Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 07:50:51 GMT, Johan Sjölen wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix compilation error in assertion > > src/hotspot/share/oops/instanceKlass.cpp line 1940: > >> 1938: // In DebugInfo nonstatic fields are sorted by offset. >> 1939: GrowableArray > fields_sorted; >> 1940: int i = 0; > > Would you mind also cleaning up this usage of `i`? Seems like it can be removed and `fields_sorted.length()` can be used instead. +1. Just purge this `i` and use `fields_sorted.length()` straight up. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24290#discussion_r2036709752 From rvansa at openjdk.org Thu Apr 10 07:55:56 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 10 Apr 2025 07:55:56 GMT Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v4] In-Reply-To: References: Message-ID: > On the reproducer https://bugs.openjdk.org/secure/attachment/113985/CCC.java my local testing shows these numbers: > > ### JDK-17 > > $ hyperfine -w 5 -r 10 '/path/to/jdk-17/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp/ CCC > Time (mean ± σ): 32.5 ms ± 0.9 ms [User: 27.5 ms, System: 10.6 ms] > Range (min … max): 31.1 ms …
33.7 ms 10 runs > > ### JDK-25 before the change applied > > $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC > Time (mean ± σ): 101.6 ms ± 1.5 ms [User: 96.8 ms, System: 14.6 ms] > Range (min … max): 99.0 ms … 104.5 ms 10 runs > > ### JDK-25 with this patch > > $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC > Time (mean ± σ): 75.8 ms ± 1.2 ms [User: 69.8 ms, System: 16.0 ms] > Range (min … max): 73.8 ms … 78.2 ms 10 runs Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Cleanup of iteration ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24290/files - new: https://git.openjdk.org/jdk/pull/24290/files/3d7f27d5..a7754467 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24290&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24290&range=02-03 Stats: 10 lines in 2 files changed: 2 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24290.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24290/head:pull/24290 PR: https://git.openjdk.org/jdk/pull/24290 From shade at openjdk.org Thu Apr 10 07:56:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 10 Apr 2025 07:56:45 GMT Subject: RFR: 8351157: Clean up x86 GC barriers after 32-bit x86 removal [v3] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:31:44 GMT, Aleksey Shipilev wrote: >> Assembler GC barriers have quite a bit of coding to support 32-bit x86. As 32-bit x86 is removed, we can clean up those parts. >> >> We can eliminate `!LP64` blocks quite easily. We can also prune passing around `thread` argument, and just trust that `r15_thread` is always available.
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8351157-x86-gc-barriers > - Merge branch 'master' into JDK-8351157-x86-gc-barriers > - Also do tlab_allocate > - Rely on R15 to be a thread register > - Work Thanks! I merged with current mainline locally, and there are no surprises. So I am integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24253#issuecomment-2791873674 From shade at openjdk.org Thu Apr 10 07:56:45 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 10 Apr 2025 07:56:45 GMT Subject: Integrated: 8351157: Clean up x86 GC barriers after 32-bit x86 removal In-Reply-To: References: Message-ID: <7cljRYtWnAH4Yn3zAUM2uHyPV3xteo3vEvZJhNHzKsk=.2533276f-0e18-4176-bc90-3ceaefa0dcdb@github.com> On Wed, 26 Mar 2025 12:48:13 GMT, Aleksey Shipilev wrote: > Assembler GC barriers have quite a bit of coding to support 32-bit x86. As 32-bit x86 is removed, we can clean up those parts. > > We can eliminate `!LP64` blocks quite easily. We can also prune passing around `thread` argument, and just trust that `r15_thread` is always available. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. 
Changeset: 73c8c755 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/73c8c755ea638c09147d28080646ee8887ee8283 Stats: 543 lines in 20 files changed: 1 ins; 426 del; 116 mod 8351157: Clean up x86 GC barriers after 32-bit x86 removal Reviewed-by: kbarrett, wkemper, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/24253 From rvansa at openjdk.org Thu Apr 10 08:07:58 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 10 Apr 2025 08:07:58 GMT Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v5] In-Reply-To: References: Message-ID: > On the reproducer https://bugs.openjdk.org/secure/attachment/113985/CCC.java my local testing shows these numbers: > > ### JDK-17 > > $ hyperfine -w 5 -r 10 '/path/to/jdk-17/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp/ CCC > Time (mean ± σ): 32.5 ms ± 0.9 ms [User: 27.5 ms, System: 10.6 ms] > Range (min … max): 31.1 ms … 33.7 ms 10 runs > > ### JDK-25 before the change applied > > $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC > Time (mean ± σ): 101.6 ms ± 1.5 ms [User: 96.8 ms, System: 14.6 ms] > Range (min … max): 99.0 ms … 104.5 ms 10 runs > > ### JDK-25 with this patch > > $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC > Time (mean ± σ): 75.8 ms ± 1.2 ms [User: 69.8 ms, System: 16.0 ms] > Range (min … max): 73.8 ms …
78.2 ms 10 runs Radim Vansa has updated the pull request incrementally with three additional commits since the last revision: - Minor improvements - Style update - Revert changes in Tuple ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24290/files - new: https://git.openjdk.org/jdk/pull/24290/files/a7754467..d278e77e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24290&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24290&range=03-04 Stats: 8 lines in 6 files changed: 0 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24290.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24290/head:pull/24290 PR: https://git.openjdk.org/jdk/pull/24290 From shade at openjdk.org Thu Apr 10 08:12:36 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 10 Apr 2025 08:12:36 GMT Subject: RFR: 8352565: Add native method implementation of Reference.get() [v4] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 18:33:16 GMT, Kim Barrett wrote: >> Please review this change which adds a native method providing the >> implementation of Reference::get. Reference::get is an intrinsic candidate, so >> this native method implementation is only used when the intrinsic is not. >> >> Currently there is intrinsic support by the interpreter, C1, C2, and graal, >> which are always used. With this change we can later remove all the >> per-platform interpreter intrinsic implementations, and might also remove the >> C1 intrinsic implementation. >> >> Testing: >> (1) mach5 tier1-6 normal (so using all the existing intrinsics). >> (2) mach5 tier1-6 with interpreter and C1 Reference::get intrinsics disabled.
> > Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: > > - remove timeout by using waitForReferenceProcessing > - make ill-timed gc in non-concurrent case less likely > - fix test package use src/java.base/share/classes/java/lang/ref/Reference.java line 357: > 355: @IntrinsicCandidate > 356: public T get() { > 357: return get0(); I am looking at this now and wondering how current intrinsics matchers work in case of virtual calls. For example, when type information/profile tells us the receiver is generic `Reference`, but in reality it is a `PhantomReference` subclass, would the call to `PhantomReference.get()` accidentally match the `Reference.get` intrinsic, and thus enter the Access API with `ON_WEAK_REF`? Looks pre-existing, and I would have expected native code to assert, but I also think at least C2 intrinsics do not check the reference type. It seems both `clear` and `refersTo` side-step all this by: a) not intrinsifying the virtual methods; b) doing `AS_NO_KEEPALIVE` -- so they are not as exposed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24315#discussion_r2036768100 From rvansa at openjdk.org Thu Apr 10 08:16:25 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 10 Apr 2025 08:16:25 GMT Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v6] In-Reply-To: References: Message-ID: > On the reproducer https://bugs.openjdk.org/secure/attachment/113985/CCC.java my local testing shows these numbers: > > ### JDK-17 > > $ hyperfine -w 5 -r 10 '/path/to/jdk-17/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp/ CCC > Time (mean ± σ): 32.5 ms ± 0.9 ms [User: 27.5 ms, System: 10.6 ms] > Range (min … max): 31.1 ms …
33.7 ms 10 runs > > ### JDK-25 before the change applied > > $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC > Time (mean ± σ): 101.6 ms ± 1.5 ms [User: 96.8 ms, System: 14.6 ms] > Range (min … max): 99.0 ms … 104.5 ms 10 runs > > ### JDK-25 with this patch > > $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' > Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC > Time (mean ± σ): 75.8 ms ± 1.2 ms [User: 69.8 ms, System: 16.0 ms] > Range (min … max): 73.8 ms … 78.2 ms 10 runs Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Remove outdated comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24290/files - new: https://git.openjdk.org/jdk/pull/24290/files/d278e77e..5fb22d42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24290&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24290&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24290.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24290/head:pull/24290 PR: https://git.openjdk.org/jdk/pull/24290 From rvansa at openjdk.org Thu Apr 10 08:16:26 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Thu, 10 Apr 2025 08:16:26 GMT Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v2] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 08:08:43 GMT, Johan Sjölen wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix compilation error in assertion > > Hi, > > The idea behind the change is good, but I think that we can clean up the code. See the comments for how that can be done.
Thank you @jdksjolen and @shipilev, I think I've addressed all the comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24290#issuecomment-2791930800 From shade at openjdk.org Thu Apr 10 08:26:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 10 Apr 2025 08:26:51 GMT Subject: RFR: 8351152: x86: Remove code blocks that handle UseSSE < 2 [v3] In-Reply-To: References: Message-ID: > 32-bit x86 was the platform that supported `UseSSE < 2`. 64-bit x86 baselines on `UseSSE >= 2`: https://github.com/openjdk/jdk/blob/567c6885a377e5641deef9cd3498f79c5346cd6a/src/hotspot/cpu/x86/vm_version_x86.cpp#L895-L902 > > After 32-bit x86 code is gone, we can remove all code blocks that are there to support `UseSSE < 2`. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into JDK-8351152-x86-sse2-everywhere - Also purge vestigial calls to VMVersion::supports_sse{2} - Also 24-bit removals - Touchups - Fix ------------- Changes: https://git.openjdk.org/jdk/pull/24484/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24484&range=02 Stats: 822 lines in 17 files changed: 34 ins; 542 del; 246 mod Patch: https://git.openjdk.org/jdk/pull/24484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24484/head:pull/24484 PR: https://git.openjdk.org/jdk/pull/24484 From shade at openjdk.org Thu Apr 10 08:36:33 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 10 Apr 2025 08:36:33 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: <03K6ui5yP3iy8HS_C4nurnsrbOymrm_962YA0-U92IM=.0f83b0ac-5895-4e1a-bb22-0006bd5dd888@github.com> On Thu, 10 Apr 2025 07:25:47 GMT, Thomas Schatzl wrote: > I fixed this for now, but it will be broken again in just a bit with Aleksey's
ongoing removal of x86 32 bit platform efforts. I think all x86 cleanups related to GC and adjacent code have landed in mainline now. So I expect no more major conflicts with this PR :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2791985351 From shade at openjdk.org Thu Apr 10 08:42:36 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 10 Apr 2025 08:42:36 GMT Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v6] In-Reply-To: References: Message-ID: <8NMGLWo0iN67IorkWm4_LIeuwhHtBNoW_VlKyt-HfL0=.fe484f48-7be0-4302-9637-14dbc1c15a4c@github.com> On Thu, 10 Apr 2025 08:16:25 GMT, Radim Vansa wrote: >> On the reproducer https://bugs.openjdk.org/secure/attachment/113985/CCC.java my local testing shows these numbers: >> >> ### JDK-17 >> >> $ hyperfine -w 5 -r 10 '/path/to/jdk-17/bin/java -cp /tmp/ CCC' >> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp/ CCC >> Time (mean ± σ): 32.5 ms ± 0.9 ms [User: 27.5 ms, System: 10.6 ms] >> Range (min … max): 31.1 ms … 33.7 ms 10 runs >> >> ### JDK-25 before the change applied >> >> $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' >> Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC >> Time (mean ± σ): 101.6 ms ± 1.5 ms [User: 96.8 ms, System: 14.6 ms] >> Range (min … max): 99.0 ms … 104.5 ms 10 runs >> >> ### JDK-25 with this patch >> >> $ hyperfine -w 5 -r 10 '/path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC' >> Benchmark 1: /path/to/jdk-25/build/linux-x86_64-server-release/images/jdk/bin/java -cp /tmp/ CCC >> Time (mean ± σ): 75.8 ms ± 1.2 ms [User: 69.8 ms, System: 16.0 ms] >> Range (min … max): 73.8 ms …
78.2 ms 10 runs > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Remove outdated comment src/hotspot/share/oops/instanceKlass.cpp line 1945: > 1943: fields_sorted.sort(compare_fields_by_offset); > 1944: fieldDescriptor fd; > 1945: for (auto it = fields_sorted.begin(); it != fields_sorted.end(); ++it) { Ah, that's not what I meant :) There is no need to use iterators here, just pull `length` out of `fields_sorted.length()`, and use the same old indexed loop: int length = fields_sorted.length(); if (length > 0) { fields_sorted.sort(compare_fields_by_offset); for (int i = 0; i < length; i++) { ... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24290#discussion_r2036826130 From syan at openjdk.org Thu Apr 10 08:45:35 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 10 Apr 2025 08:45:35 GMT Subject: Integrated: 8353189: [ASAN] memory leak after 8352184 In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 16:40:44 GMT, SendaoYan wrote: > Hi all, > > This PR will try to fix the memory leak after JDK-8352184 by re-doing [JDK-8352184](https://bugs.openjdk.org/browse/JDK-8352184) using the original, purely static uses of the various description strings, as @jianglizhou had proposed. > > Additional testing: > > - [x] jtreg tests (including tier1/2/3 etc.) on linux-x64 > - [x] jtreg tests (including tier1/2/3 etc.) on linux-aarch64 > - [x] full `java -version` tests; the test shell script is shown below. > > [JDK-8353189.sh.txt](https://github.com/user-attachments/files/19643535/JDK-8353189.sh.txt) This pull request has now been integrated.
Changeset: 6545e0d9 Author: SendaoYan URL: https://git.openjdk.org/jdk/commit/6545e0d9a39c772ead0cbdd525b624f21e252a6a Stats: 46 lines in 1 file changed: 20 ins; 16 del; 10 mod 8353189: [ASAN] memory leak after 8352184 Co-authored-by: Jiangli Zhou Co-authored-by: David Holmes Reviewed-by: dholmes, jiangli ------------- PR: https://git.openjdk.org/jdk/pull/24299 From syan at openjdk.org Thu Apr 10 08:45:35 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 10 Apr 2025 08:45:35 GMT Subject: RFR: 8353189: [ASAN] memory leak after 8352184 In-Reply-To: References: Message-ID: On Sun, 6 Apr 2025 22:01:20 GMT, David Holmes wrote: >> Hi all, >> >> This PR will try to fix the memory leak after JDK-8352184 by re-doing [JDK-8352184](https://bugs.openjdk.org/browse/JDK-8352184) using the original, purely static uses of the various description strings, as @jianglizhou had proposed. >> >> Additional testing: >> >> - [x] jtreg tests (including tier1/2/3 etc.) on linux-x64 >> - [x] jtreg tests (including tier1/2/3 etc.) on linux-aarch64 >> - [x] full `java -version` tests; the test shell script is shown below. >> >> [JDK-8353189.sh.txt](https://github.com/user-attachments/files/19643535/JDK-8353189.sh.txt) > > After looking into the details ([JDK-8353595](https://bugs.openjdk.org/browse/JDK-8353595)) I don't think there is any choice but to re-do [JDK-8352184](https://bugs.openjdk.org/browse/JDK-8352184) using the original, purely static uses of the various description strings, as [~jiangli] had proposed.
Thanks for the reviews and suggestions @dholmes-ora @zhengyu123 @jianglizhou ------------- PR Comment: https://git.openjdk.org/jdk/pull/24299#issuecomment-2792009548 From stefank at openjdk.org Thu Apr 10 09:03:34 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 10 Apr 2025 09:03:34 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v9] In-Reply-To: <8KEGhJ5sXoeeT2ezqvyG-uYWlXUzBGSHD_RLwjAH8LI=.89670a1f-2e4a-4c88-8329-3261d462cae0@github.com> References: <8KEGhJ5sXoeeT2ezqvyG-uYWlXUzBGSHD_RLwjAH8LI=.89670a1f-2e4a-4c88-8329-3261d462cae0@github.com> Message-ID: On Wed, 9 Apr 2025 20:32:15 GMT, Gerard Ziemski wrote: > Thank you Stefan for providing the values of mem_tags and your feedback. Do you want to be a co-author on this PR? Sure, why not. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24282#issuecomment-2792059171 From tschatzl at openjdk.org Thu Apr 10 09:07:39 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 09:07:39 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v32] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more closely resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to its larger barrier.
> > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates require fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se...
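The filtering sequence in the pseudocode above can be modeled in a few lines of C++ to make the cost difference concrete. This is a toy, single-threaded sketch and not HotSpot code: `ToyHeap`, `filtering_post_barrier`, `card_only_post_barrier`, the region and card sizes, and the card byte encodings are all hypothetical, and `std::atomic_thread_fence(std::memory_order_seq_cst)` merely stands in for the StoreLoad step. It contrasts the filter, fence, dirty and enqueue path with a barrier that only dirties the card.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only; all names, sizes and card encodings below are hypothetical.
constexpr std::uintptr_t kRegionShift = 20;  // 1 MiB regions
constexpr std::uintptr_t kCardShift = 9;     // 512-byte cards
constexpr std::uint8_t kCleanCard = 0xff;
constexpr std::uint8_t kDirtyCard = 0x00;
constexpr std::uint8_t kYoungCard = 0x02;

struct ToyHeap {
  std::uintptr_t base;                    // assumed region-aligned
  std::vector<std::uint8_t> cards;        // one byte per card
  std::vector<std::size_t> dirty_queue;   // stands in for the thread-local dcq

  ToyHeap(std::uintptr_t heap_base, std::size_t heap_bytes)
      : base(heap_base), cards(heap_bytes >> kCardShift, kCleanCard) {}

  std::size_t card_index(std::uintptr_t addr) const {
    assert(addr >= base && ((addr - base) >> kCardShift) < cards.size());
    return (addr - base) >> kCardShift;
  }
};

// Filtering post-write barrier for "x.a = y", following the pseudocode's order.
void filtering_post_barrier(ToyHeap& heap, std::uintptr_t field_addr,
                            std::uintptr_t new_val) {
  if ((field_addr >> kRegionShift) == (new_val >> kRegionShift)) return;  // same region
  if (new_val == 0) return;                                               // null value
  const std::size_t idx = heap.card_index(field_addr);
  if (heap.cards[idx] == kYoungCard) return;              // write into young gen
  std::atomic_thread_fence(std::memory_order_seq_cst);    // models StoreLoad
  if (heap.cards[idx] == kDirtyCard) return;              // already dirty
  heap.cards[idx] = kDirtyCard;
  heap.dirty_queue.push_back(idx);  // the real barrier enqueues the card address
}

// Parallel/Serial-style barrier: unconditionally dirty the card, nothing else.
void card_only_post_barrier(ToyHeap& heap, std::uintptr_t field_addr) {
  heap.cards[heap.card_index(field_addr)] = kDirtyCard;
}
```

Even in this reduced form the first function has several loads, compares and a fence on the common path, while the second is a shift and a byte store, which is the size gap the description refers to.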
Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 46 commits: - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * fixes after merge related to 32 bit x86 removal - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * ayang review: revising young gen length * robcasloz review: various minor refactorings - Do not unnecessarily pass around tmp2 in x86 - Refine needs_liveness_data - Reorder includes - * missing file from merge - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into 8342382-card-table-instead-of-dcq - ... and 36 more: https://git.openjdk.org/jdk/compare/f94a4f7a...fcf96a2a ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=31 Stats: 7112 lines in 110 files changed: 2592 ins; 3594 del; 926 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From ayang at openjdk.org Thu Apr 10 09:12:32 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 10 Apr 2025 09:12:32 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 14:32:43 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 83: >> >>> 81: break; >>> 82: } >>> 83: case G1RemSet::HasRefToOld : break; // Nothing special to do. >> >> Why doesn't it call `inc_cards_clean_again` in this case? The card is also cleared. (In fact, I don't get why this needs to be a separate case from `NoInteresting`.) > > "NoInteresting" means that the card contains no interesting reference at all. "HasRefToOld" means that there has been an interesting reference in the card.
> > The distinction between these groups of cards seems interesting to me. E.g. out of X non-clean cards, there were A with a reference to the collection set, B that were already marked as containing a card to the collection, C not having any interesting card any more (transitioned from clean -> dirty -> clean, and cleared by the mutator), D being non-parsable, and E having references to old (and no other references). > > I could add a separate counter for these types of cards too - they can be inferred from the total number of scanned cards minus the others though. I see; "clean again" means the existing interesting pointer was overwritten by the mutator. I misinterpreted the comment as meaning cards that transitioned from dirty to clean. ` size_t _cards_clean_again; // Dirtied cards that were cleaned.` To prevent misunderstanding, what do you think of renaming "NoInteresting" to "NoCrossRegion" and "_cards_clean_again" to "_cards_no_cross_region", or something similar, so that the 1:1 mapping is clearer? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2036885633 From jsikstro at openjdk.org Thu Apr 10 09:20:58 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Thu, 10 Apr 2025 09:20:58 GMT Subject: RFR: 8350441: ZGC: Overhaul Page Allocation [v2] In-Reply-To: References: Message-ID: >> Note that any reference to pages from here on out refers to the concept of a heap region in ZGC, not pages in the operating system (OS), unless stated otherwise. > > # Background > > This PR addresses fragmentation by introducing a Mapped Cache that replaces the Page Cache in ZGC. The largest limitation of the Page Cache is that it is constrained by the abstraction of what a page is. The proposed Mapped Cache removes this limitation by decoupling memory from pages, allowing it to merge and split memory in ways that the Page Cache is not suited for. To facilitate the transition, much of the Page Allocator has been redesigned to work with the Mapped Cache.
> > In addition to fighting fragmentation, the new approach improves NUMA-support and simplifies memory unmapping. Combined, these changes lay the foundation for even more improvements in ZGC, like replacing multi-mapped memory with anonymous memory. > > # Mapped Cache > > The main benefit of the Mapped Cache is that adjacent virtual memory ranges in the cache can be merged to create larger ranges, enabling larger allocation requests to succeed more easily. Most notably, it allows allocations to succeed more often without "harvesting" smaller, discontiguous ranges. Harvesting negatively impacts both fragmentation and latency, as it requires remapping memory into a new contiguous virtual address range. Fragmentation becomes especially problematic in long-running programs and in environments with limited address space, where finding large contiguous regions can be difficult and may lead to premature Out Of Memory Errors (OOME). > > The Mapped Cache uses a self-balancing binary search tree to store memory ranges. Since the ranges are unused when inside the cache, the tree can use this memory to store metadata about itself, referred to as intrusive storage. This approach eliminates the need for dynamic memory allocation (e.g., malloc), which could otherwise introduce a latency overhead. > > # Fragmentation > > Currently, ZGC has multiple strategies for dealing with fragmentation. In some edge cases, these strategies are not as efficient as we would like. By addressing fragmentation differently with the Mapped Cache, ZGC is in a better position to avoid edge cases, which are bad even if they occur only once. This is especially impactful for programs running with a large heap. > > ## Virtual Memory Shuffling > > In addition to the Mapped Cache, we have made some adjustments in how ZGC deals with virtual memory. When harvesting memory, whic...
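The merging behaviour described for the Mapped Cache can be illustrated with a small C++ sketch. It is an assumption-laden toy, not ZGC's implementation: `ToyMappedCache` is hypothetical, a `std::map` keyed by start address stands in for the intrusive self-balancing tree (a `std::map` allocates its own nodes, which is exactly what the intrusive design avoids), and removal uses plain first-fit. Inserting a range merges it with any neighbour that ends at its start or begins at its end, so a request that fails against two separate ranges can succeed once the gap between them is returned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Toy model of a cache of unused, still-mapped address ranges (hypothetical).
// ZGC keeps its tree nodes inside the cached memory; std::map is used here
// only for brevity and does allocate.
class ToyMappedCache {
 public:
  // Return [start, start + size) to the cache, merging with adjacent ranges.
  void insert(std::uintptr_t start, std::size_t size) {
    assert(size > 0);
    auto next = ranges_.lower_bound(start);
    if (next != ranges_.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second == start) {  // prev ends where we begin
        start = prev->first;
        size += prev->second;
        ranges_.erase(prev);  // erasing prev leaves `next` valid
      }
    }
    if (next != ranges_.end() && start + size == next->first) {  // next begins where we end
      size += next->second;
      ranges_.erase(next);
    }
    ranges_[start] = size;
  }

  // First-fit removal of `size` contiguous bytes, splitting a larger range.
  std::optional<std::uintptr_t> remove(std::size_t size) {
    for (auto it = ranges_.begin(); it != ranges_.end(); ++it) {
      if (it->second >= size) {
        const std::uintptr_t start = it->first;
        const std::size_t remaining = it->second - size;
        ranges_.erase(it);
        if (remaining > 0) {
          ranges_[start + size] = remaining;
        }
        return start;
      }
    }
    return std::nullopt;  // no single range is large enough
  }

  std::size_t range_count() const { return ranges_.size(); }

 private:
  std::map<std::uintptr_t, std::size_t> ranges_;  // start address -> size in bytes
};
```

Usage: after `insert(0x1000, 0x1000)` and `insert(0x3000, 0x1000)` a request for `0x2000` bytes fails, but once `insert(0x2000, 0x1000)` fills the gap the three pieces coalesce into one range and `remove(0x3000)` returns `0x1000`. In the toy, the failing case is where a real allocator would have to look elsewhere, for example by harvesting.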
Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/gc/z/zVirtualMemoryManager.hpp Co-authored-by: Axel Boldt-Christmas ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24547/files - new: https://git.openjdk.org/jdk/pull/24547/files/5f9caa7a..ea2b5f97 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24547&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24547&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24547.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24547/head:pull/24547 PR: https://git.openjdk.org/jdk/pull/24547 From stefank at openjdk.org Thu Apr 10 09:20:58 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 10 Apr 2025 09:20:58 GMT Subject: RFR: 8350441: ZGC: Overhaul Page Allocation [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 09:16:56 GMT, Joel Sikström wrote: >>> Note that any reference to pages from here on out refers to the concept of a heap region in ZGC, not pages in the operating system (OS), unless stated otherwise. >> >> # Background >> >> This PR addresses fragmentation by introducing a Mapped Cache that replaces the Page Cache in ZGC. The largest limitation of the Page Cache is that it is constrained by the abstraction of what a page is. The proposed Mapped Cache removes this limitation by decoupling memory from pages, allowing it to merge and split memory in ways that the Page Cache is not suited for. To facilitate the transition, much of the Page Allocator has been redesigned to work with the Mapped Cache. >> >> In addition to fighting fragmentation, the new approach improves NUMA-support and simplifies memory unmapping. Combined, these changes lay the foundation for even more improvements in ZGC, like replacing multi-mapped memory with anonymous memory.
>> >> # Mapped Cache >> >> The main benefit of the Mapped Cache is that adjacent virtual memory ranges in the cache can be merged to create larger ranges, enabling larger allocation requests to succeed more easily. Most notably, it allows allocations to succeed more often without "harvesting" smaller, discontiguous ranges. Harvesting negatively impacts both fragmentation and latency, as it requires remapping memory into a new contiguous virtual address range. Fragmentation becomes especially problematic in long-running programs and in environments with limited address space, where finding large contiguous regions can be difficult and may lead to premature Out Of Memory Errors (OOME). >> >> The Mapped Cache uses a self-balancing binary search tree to store memory ranges. Since the ranges are unused when inside the cache, the tree can use this memory to store metadata about itself, referred to as intrusive storage. This approach eliminates the need for dynamic memory allocation (e.g., malloc), which could otherwise introduce a latency overhead. >> >> # Fragmentation >> >> Currently, ZGC has multiple strategies for dealing with fragmentation. In some edge cases, these strategies are not as efficient as we would like. By addressing fragmentation differently with the Mapped Cache, ZGC is in a better position to avoid edge cases, which are bad even if they occur only once. This is especially impactful for programs running with a large heap. >> >> ## Virtual Memory Shuffling >> >> In addition to the Mapped Cache, we have made some adjustments in how ZGC deals with vir... > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/gc/z/zVirtualMemoryManager.hpp > > Co-authored-by: Axel Boldt-Christmas Looks good! Marked as reviewed by stefank (Reviewer). ------------- Marked as reviewed by stefank (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24547#pullrequestreview-2755953546 PR Review: https://git.openjdk.org/jdk/pull/24547#pullrequestreview-2755957917 From aboldtch at openjdk.org Thu Apr 10 09:20:58 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 10 Apr 2025 09:20:58 GMT Subject: RFR: 8350441: ZGC: Overhaul Page Allocation [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 09:16:56 GMT, Joel Sikström wrote: >>> Note that any reference to pages from here on out refers to the concept of a heap region in ZGC, not pages in the operating system (OS), unless stated otherwise. >> >> # Background >> >> This PR addresses fragmentation by introducing a Mapped Cache that replaces the Page Cache in ZGC. The largest limitation of the Page Cache is that it is constrained by the abstraction of what a page is. The proposed Mapped Cache removes this limitation by decoupling memory from pages, allowing it to merge and split memory in ways that the Page Cache is not suited for. To facilitate the transition, much of the Page Allocator has been redesigned to work with the Mapped Cache. >> >> In addition to fighting fragmentation, the new approach improves NUMA-support and simplifies memory unmapping. Combined, these changes lay the foundation for even more improvements in ZGC, like replacing multi-mapped memory with anonymous memory. >> >> # Mapped Cache >> >> The main benefit of the Mapped Cache is that adjacent virtual memory ranges in the cache can be merged to create larger ranges, enabling larger allocation requests to succeed more easily. Most notably, it allows allocations to succeed more often without "harvesting" smaller, discontiguous ranges. Harvesting negatively impacts both fragmentation and latency, as it requires remapping memory into a new contiguous virtual address range.
Fragmentation becomes especially problematic in long-running programs and in environments with limited address space, where finding large contiguous regions can be difficult and may lead to premature Out Of Memory Errors (OOME). >> >> The Mapped Cache uses a self-balancing binary search tree to store memory ranges. Since the ranges are unused when inside the cache, the tree can use this memory to store metadata about itself, referred to as intrusive storage. This approach eliminates the need for dynamic memory allocation (e.g., malloc), which could otherwise introduce a latency overhead. >> >> # Fragmentation >> >> Currently, ZGC has multiple strategies for dealing with fragmentation. In some edge cases, these strategies are not as efficient as we would like. By addressing fragmentation differently with the Mapped Cache, ZGC is in a better position to avoid edge cases, which are bad even if they occur only once. This is especially impactful for programs running with a large heap. >> >> ## Virtual Memory Shuffling >> >> In addition to the Mapped Cache, we have made some adjustments in how ZGC deals with vir... > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/gc/z/zVirtualMemoryManager.hpp > > Co-authored-by: Axel Boldt-Christmas lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24547#pullrequestreview-2755960462 From eosterlund at openjdk.org Thu Apr 10 09:31:37 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 10 Apr 2025 09:31:37 GMT Subject: RFR: 8350441: ZGC: Overhaul Page Allocation [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 09:20:58 GMT, Joel Sikström wrote: >>> Note that any reference to pages from here on out refers to the concept of a heap region in ZGC, not pages in the operating system (OS), unless stated otherwise.
>> >> # Background >> >> This PR addresses fragmentation by introducing a Mapped Cache that replaces the Page Cache in ZGC. The largest limitation of the Page Cache is that it is constrained by the abstraction of what a page is. The proposed Mapped Cache removes this limitation by decoupling memory from pages, allowing it to merge and split memory in ways that the Page Cache is not suited for. To facilitate the transition, much of the Page Allocator has been redesigned to work with the Mapped Cache. >> >> In addition to fighting fragmentation, the new approach improves NUMA-support and simplifies memory unmapping. Combined, these changes lay the foundation for even more improvements in ZGC, like replacing multi-mapped memory with anonymous memory. >> >> # Mapped Cache >> >> The main benefit of the Mapped Cache is that adjacent virtual memory ranges in the cache can be merged to create larger ranges, enabling larger allocation requests to succeed more easily. Most notably, it allows allocations to succeed more often without "harvesting" smaller, discontiguous ranges. Harvesting negatively impacts both fragmentation and latency, as it requires remapping memory into a new contiguous virtual address range. Fragmentation becomes especially problematic in long-running programs and in environments with limited address space, where finding large contiguous regions can be difficult and may lead to premature Out Of Memory Errors (OOME). >> >> The Mapped Cache uses a self-balancing binary search tree to store memory ranges. Since the ranges are unused when inside the cache, the tree can use this memory to store metadata about itself, referred to as intrusive storage. This approach eliminates the need for dynamic memory allocation (e.g., malloc), which could otherwise introduce a latency overhead. >> >> # Fragmentation >> >> Currently, ZGC has multiple strategies for dealing with fragmentation. In some edge cases, these strategies are not as efficient as we would like.
By addressing fragmentation differently with the Mapped Cache, ZGC is in a better position to avoid edge cases, which are bad even if they occur only once. This is especially impactful for programs running with a large heap. >> >> ## Virtual Memory Shuffling >> >> In addition to the Mapped Cache, we have made some adjustments in how ZGC deals with vir... > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/gc/z/zVirtualMemoryManager.hpp > > Co-authored-by: Axel Boldt-Christmas Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24547#pullrequestreview-2756002581 From tschatzl at openjdk.org Thu Apr 10 10:02:41 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 10:02:41 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: <03K6ui5yP3iy8HS_C4nurnsrbOymrm_962YA0-U92IM=.0f83b0ac-5895-4e1a-bb22-0006bd5dd888@github.com> References: <03K6ui5yP3iy8HS_C4nurnsrbOymrm_962YA0-U92IM=.0f83b0ac-5895-4e1a-bb22-0006bd5dd888@github.com> Message-ID: On Thu, 10 Apr 2025 08:34:00 GMT, Aleksey Shipilev wrote: > > I fixed this for now, but it will be broken again in just a bit with Aleksey's ongoing removal of x86 32 bit platform efforts. > > I think all x86 cleanups related to GC and adjacent code have landed in mainline now. So I expect no more major conflicts with this PR :) Thanks. :) @TheRealMDoerr: should be fixed now.
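The ZGC page-allocation thread above describes a Mapped Cache built on two ideas: merging adjacent free ranges, and storing the bookkeeping inside the free memory itself (intrusive storage). A short C++ sketch can illustrate both. This is not ZGC code. It assumes a fixed 64-byte granule and keeps ranges in an address-sorted singly linked list rather than the self-balancing binary search tree the PR describes. `RangeCache`, `FreeNode`, `insert`, `remove_first_fit` and `largest` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical sketch of an intrusive cache of free address ranges. Each free
// range stores its own bookkeeping (FreeNode) in its first bytes, so the cache
// never calls malloc. Ranges are kept sorted by address so that neighbours can
// be merged. ZGC's Mapped Cache uses a self-balancing tree; this linear list
// keeps the sketch short but makes every operation O(n).
constexpr size_t kGranule = 64;  // assumption: all sizes are multiples of this

struct FreeNode {
  size_t size;     // bytes in this free range
  FreeNode* next;  // next free range at a higher address
};
static_assert(sizeof(FreeNode) <= kGranule, "node must fit in one granule");

class RangeCache {
  FreeNode* head_ = nullptr;

  static char* addr(FreeNode* n) { return reinterpret_cast<char*>(n); }

 public:
  // Give [start, start + size) back to the cache, merging with neighbours.
  void insert(void* start, size_t size) {
    assert(size > 0 && size % kGranule == 0);
    char* s = static_cast<char*>(start);
    FreeNode* prev = nullptr;
    FreeNode* cur = head_;
    while (cur != nullptr && addr(cur) < s) {  // find the insertion point
      prev = cur;
      cur = cur->next;
    }
    size_t total = size;
    FreeNode* next = cur;
    if (cur != nullptr && s + size == addr(cur)) {  // merge with successor
      total += cur->size;
      next = cur->next;
    }
    if (prev != nullptr && addr(prev) + prev->size == s) {  // merge with predecessor
      prev->size += total;
      prev->next = next;
      return;
    }
    FreeNode* node = new (s) FreeNode{total, next};  // metadata lives in the range
    if (prev != nullptr) {
      prev->next = node;
    } else {
      head_ = node;
    }
  }

  // First fit: take `size` bytes from the front of the first range that is
  // large enough; any remainder stays cached. Returns nullptr if nothing fits.
  void* remove_first_fit(size_t size) {
    assert(size > 0 && size % kGranule == 0);
    FreeNode* prev = nullptr;
    for (FreeNode* cur = head_; cur != nullptr; prev = cur, cur = cur->next) {
      if (cur->size < size) continue;
      FreeNode* rest = cur->next;
      if (cur->size > size) {  // split: the remainder becomes a new node
        rest = new (addr(cur) + size) FreeNode{cur->size - size, cur->next};
      }
      if (prev != nullptr) {
        prev->next = rest;
      } else {
        head_ = rest;
      }
      return cur;
    }
    return nullptr;
  }

  // Size of the largest contiguous free range currently cached.
  size_t largest() const {
    size_t best = 0;
    for (FreeNode* n = head_; n != nullptr; n = n->next) {
      if (n->size > best) best = n->size;
    }
    return best;
  }
};
```

In this sketch, freeing granules 0 and 2 of a 256-byte buffer leaves two separate 64-byte ranges. Freeing granule 1 then lets all three coalesce into one 192-byte range, so a 128-byte request is served from a single contiguous range. This is loosely analogous to how merging lets the Mapped Cache avoid harvesting discontiguous pieces.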
------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2792213039 From tschatzl at openjdk.org Thu Apr 10 10:02:40 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 10:02:40 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v33] In-Reply-To: References: Message-ID: <5FzYDFpFOksmAGM5RV0gGk2eDAdinlDCGo8_37eUeEA=.5f96c37e-7b10-41b4-a607-fc7a665abd67@github.com> > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight, but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more closely resemble Parallel GC's as described in the JEP. The reason is that G1 lags behind Parallel/Serial GC in throughput due to its larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads. > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
>
> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:
>
>
> // Filtering
> if (region(@x.a) == region(y)) goto done; // same region check
> if (y == null) goto done; // null value check
> if (card(@x.a) == young_card) goto done; // write to young gen check
> StoreLoad; // synchronize
> if (card(@x.a) == dirty_card) goto done;
>
> *card(@x.a) = dirty
>
> // Card tracking
> enqueue(card-address(@x.a)) into thread-local-dcq;
> if (thread-local-dcq is not full) goto done;
>
> call runtime to move thread-local-dcq into dcqs
>
> done:
>
>
> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc.
>
> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
>
> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
>
> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se...
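The control flow of the current (pre-change) barrier in the pseudo code above can be made concrete with a single-threaded C++ model. This is an illustrative sketch under stated assumptions, not HotSpot code. The 1 MiB region size, 512-byte cards, the card values, the four-entry queue, and the names `ToyHeap`, `post_write_barrier` and `flush_to_global` are all made up for the example. The StoreLoad step is represented by a `std::atomic_thread_fence`, which has no observable effect in a single-threaded model.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative constants (assumptions, not G1's configured values).
constexpr uintptr_t kRegionShift = 20;  // 1 MiB regions
constexpr uintptr_t kCardShift = 9;     // 512-byte cards
constexpr uint8_t kCleanCard = 0xff, kDirtyCard = 0, kYoungCard = 2;
constexpr size_t kQueueCapacity = 4;    // tiny, so the flush path is exercised

struct ToyHeap {
  uintptr_t base;
  std::vector<uint8_t> cards;                      // one byte per card
  std::vector<uintptr_t> local_queue;              // models thread-local-dcq
  std::vector<std::vector<uintptr_t>> global_set;  // models dcqs

  ToyHeap(uintptr_t b, size_t bytes) : base(b), cards(bytes >> kCardShift, kCleanCard) {}

  uint8_t* card_for(uintptr_t field) { return &cards[(field - base) >> kCardShift]; }

  // "call runtime to move thread-local-dcq into dcqs"
  void flush_to_global() {
    global_set.push_back(local_queue);
    local_queue.clear();
  }

  // Post-write barrier for `*field = value`, following the pseudo code.
  void post_write_barrier(uintptr_t field, uintptr_t value) {
    if ((field >> kRegionShift) == (value >> kRegionShift)) return;  // same region
    if (value == 0) return;                                          // null value
    uint8_t* card = card_for(field);
    if (*card == kYoungCard) return;                                 // young gen
    std::atomic_thread_fence(std::memory_order_seq_cst);             // StoreLoad
    if (*card == kDirtyCard) return;                                 // already dirty
    *card = kDirtyCard;
    local_queue.push_back(reinterpret_cast<uintptr_t>(card));        // card tracking
    if (local_queue.size() < kQueueCapacity) return;
    flush_to_global();
  }
};
```

In this model, only a cross-region, non-null store to a card that is neither young nor already dirty reaches the fence, the card store and the enqueue. Every other store exits early through one of the filters.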
Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: - * indentation fix - * remove support for 32 bit x86 in the barrier generation code, following latest changes from @shade ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/fcf96a2a..068d2a37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=31-32 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From ihse at openjdk.org Thu Apr 10 10:18:13 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 10:18:13 GMT Subject: RFR: 8354266: Fix non-UTF-8 text encoding Message-ID: I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found. BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed. Methodology used: I have run four different tools for using different heuristics for determining the encoding of a file: * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5) * uchardet (a modern version by freedesktop, used by e.g. Firefox) * enca (targeted towards obscure code pages) * libmagic / `file --mime-encoding` They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. 
The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further: * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db` From the remaining list of non-ascii, non-known-binary files I selected two overlapping and exhaustive subsets: * All files where at least one tool claimed it to be UTF-8 * All files where at least one tool claimed it to be *not* UTF-8 For the first subset, I checked every non-ASCII character (using `LC_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where Unicode was unnecessarily used instead of pure ASCII, and I treated those files separately. Other than that, my inspection revealed no obvious encoding errors. This list comprised about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay. For the second subset, I checked every non-ASCII character (using the same method). This list contained 300+ files. Most of them were okay as far as I can tell; I can confirm encodings for European languages 100%, but CJK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure binary files, but without any telling extensions (most of these are in tests). The BOM files were only pointed out by chardetect; I did run an additional search for UTF-8 BOM markers over the code base to make sure I did not miss any others (since chardetect apart from this did a not-so-perfect job). The files included in this PR are what I actually found that had encoding errors or issues.
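The two checks described above, finding UTF-8 BOMs and locating non-ASCII bytes, can be reproduced without external tools. The sketch below is an illustrative assumption, not the script used for this PR. `has_utf8_bom` tests for the byte sequence EF BB BF, `strip_utf8_bom` removes it, `first_non_ascii` reports the offset of the first byte outside 0x00-0x7F (the same class the `[^\x00-\x7F]` grep pattern targets), and `read_file` loads a file as raw bytes. All four function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// True if the buffer starts with the UTF-8 byte-order mark EF BB BF.
bool has_utf8_bom(const std::string& bytes) {
  return bytes.size() >= 3 &&
         static_cast<unsigned char>(bytes[0]) == 0xEF &&
         static_cast<unsigned char>(bytes[1]) == 0xBB &&
         static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Removes a leading UTF-8 BOM, if present; all other bytes are untouched.
std::string strip_utf8_bom(const std::string& bytes) {
  return has_utf8_bom(bytes) ? bytes.substr(3) : bytes;
}

// Offset of the first byte outside 0x00-0x7F, or std::string::npos if the
// buffer is pure ASCII.
size_t first_non_ascii(const std::string& bytes) {
  for (size_t i = 0; i < bytes.size(); ++i) {
    if (static_cast<unsigned char>(bytes[i]) > 0x7F) return i;
  }
  return std::string::npos;
}

// Reads a whole file as raw bytes (binary mode: no newline translation).
std::string read_file(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

Working on raw bytes instead of decoded text keeps the check independent of the platform's default charset. A file that `has_utf8_bom` flags can be rewritten with `strip_utf8_bom` without touching any other byte.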
------------- Commit messages: - Remove UTF-8 BOM (byte-order mark) which is discouraged by the Unicode Consortium - Fix incorrect encoding Changes: https://git.openjdk.org/jdk/pull/24566/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24566&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354266 Stats: 32 lines in 13 files changed: 0 ins; 2 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/24566.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24566/head:pull/24566 PR: https://git.openjdk.org/jdk/pull/24566 From ihse at openjdk.org Thu Apr 10 10:18:13 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 10 Apr 2025 10:18:13 GMT Subject: RFR: 8354266: Fix non-UTF-8 text encoding In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 10:10:49 GMT, Magnus Ihse Bursie wrote: > I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found. > > BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed. > > Methodology used: > > I have run four different tools for using different heuristics for determining the encoding of a file: > * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5) > * uchardet (a modern version by freedesktop, used by e.g. Firefox) > * enca (targeted towards obscure code pages) > * libmagic / `file --mime-encoding` > > They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files. 
To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further: > * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db` > > From the remaining list of non-ascii, non-known-binary files I selected two overlapping and exhaustive subsets: > * All files where at least one tool claimed it to be UTF-8 > * All files where at least one tool claimed it to be *not* UTF-8 > > For the first subset, I checked every non-ASCII character (using `LC_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where Unicode was unnecessarily used instead of pure ASCII, and I treated those files separately. Other than that, my inspection revealed no obvious encoding errors. This list comprised about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay. > > For the second subset, I checked every non-ASCII character (using the same method). This list contained 300+ files. Most of them were okay as far as I can tell; I can confirm encodings for European languages 100%, but CJK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure binary files, but without any telling exten... src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 497: > 495: /* > 496: The algorithm below is based on Intel publication: > 497: "Fast SHA-256 Implementations on Intel(R) Architecture Processors" by Jim Guilford, Kirk Yap and Vinodh Gopal. Note: There is of course a Unicode `®` symbol, which is what it was originally before it was botched here, but I found no reason to keep this, and in the spirit of JDK-8354213, I thought it better to use pure ASCII here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2037012318 From kbarrett at openjdk.org Thu Apr 10 10:24:36 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 10 Apr 2025 10:24:36 GMT Subject: RFR: 8324686: Remove redefinition of NULL for MSVC In-Reply-To: <5dJylnk5OQrxHxiPNmj722EwNs4OlzuW3GX2YLD_ThA=.94dd40d3-0181-45a0-a675-0ef6f3c61f33@github.com> References: <5dJylnk5OQrxHxiPNmj722EwNs4OlzuW3GX2YLD_ThA=.94dd40d3-0181-45a0-a675-0ef6f3c61f33@github.com> Message-ID: On Wed, 9 Apr 2025 07:52:44 GMT, Aleksey Shipilev wrote: >> Please review this change that removes the redefinition of NULL in >> globalDefinitions_visCPP.hpp. That redefinition was to support the use of NULL >> in a varargs context, because of the size difference for int vs a pointer. >> However, we no longer have any direct uses of NULL in HotSpot, and have a test >> that ensures there is no backsliding. >> >> There may be indirect uses of NULL via third-party libraries. Such uses could >> have been in the scope of the removed redefinition. But those uses must have >> been correct even without the redefinition, else they would be incorrect for >> non-HotSpot users. >> >> Testing: mach5 tier1-3, GHA sanity tests > > Looks fine. So, just to be extra clear, this would only affect Hotspot, not JDK. There are no interesting hits for `NULL`-s right now in Hotspot code. There are still lots of `NULL`-s in JDK native code. Thanks for reviews @shipilev and @dholmes-ora . ------------- PR Comment: https://git.openjdk.org/jdk/pull/24537#issuecomment-2792271505 From kbarrett at openjdk.org Thu Apr 10 10:24:36 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 10 Apr 2025 10:24:36 GMT Subject: Integrated: 8324686: Remove redefinition of NULL for MSVC In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 06:16:18 GMT, Kim Barrett wrote: > Please review this change that removes the redefinition of NULL in > globalDefinitions_visCPP.hpp. 
That redefinition was to support the use of NULL > in a varargs context, because of the size difference for int vs a pointer. > However, we no longer have any direct uses of NULL in HotSpot, and have a test > that ensures there is no backsliding. > > There may be indirect uses of NULL via third-party libraries. Such uses could > have been in the scope of the removed redefinition. But those uses must have > been correct even without the redefinition, else they would be incorrect for > non-HotSpot users. > > Testing: mach5 tier1-3, GHA sanity tests This pull request has now been integrated. Changeset: 6c266701 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/6c2667018a49ac78c3a01dc4d52ff6cdf39b7647 Stats: 21 lines in 2 files changed: 0 ins; 20 del; 1 mod 8324686: Remove redefinition of NULL for MSVC Reviewed-by: shade, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/24537 From mli at openjdk.org Thu Apr 10 10:42:12 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 10 Apr 2025 10:42:12 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > On riscv, CMoveI/L were already implemented, but there are some gaps: > 1. CMoveI/L does not support comparison with float/double, and the corresponding tests are not turned on either. > 2. Some C2 optimizations are not turned on, e.g. `Phi -> CMove -> min_max`. > 3. Some corresponding performance tests are missing. > > There are also some issues with the current Zicond support: > 1. UseZicond is turned on automatically by hwprobe, but jmh tests show that it does not always bring a benefit; in some situations it even causes a regression, because the code generated with Zicond is much larger than the branch version, in particular when it's in a loop and unrolled. > > This patch on riscv is to: > 1. add CMoveI/L comparing float/double, and corresponding tests, > 2. enable more C2 optimization, > 3. add more benchmark tests, > 4.
turn off UseZicond by default. > > Thanks! > > ## Performance > > ### MinMax > test data > > Benchmark | Mode | Cnt | Score - master | Score - master+UseZbb | Score - -master+UseZicond | Score - master+UseZicond+UseZbb | Score - cmovei | Score - cmovei+UseZbb | Score - cmovei+UseZicond | Score - cmovei+UseZicond+UseZbb | Error | Units | Opt (master/cmovei) > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > o.o.b.vm.compiler.IfMinMax.testReductionInt | avgt | 40 | 17152.075 | 17216.592 | 17272.493 | 17296.89 | 17127.844 | 17036.605 | 17299.333 | 17250.566 | 73.179 | ns/op | 1.001 > o.o.b.vm.compiler.IfMinMax.testReductionLong | avgt | 40 | 19770.828 | 19967.578 | 20268.905 | 20166.165 | 20065.552 | 20059.095 | 20161.914 | 20151.295 | 131.428 | ns/op | 0.985 > o.o.b.vm.compiler.IfMinMax.testSingleInt | avgt | 40 | 114.734 | 114.402 | 114.887 | 114.384 | 114.4 | 110.631 | 112.162 | 110.915 | 0.333 | ns/op | 1.003 > o.o.b.vm.compiler.IfMinMax.testSingleLong | avgt | 40 | 121.53 | 121.711 | 120.91 | 121.665 | 121.309 | 120.57 | 118.639 | 119.373 | 0.451 | ns/op | 1.002 > o.o.b.vm.compiler.IfMinMax.testVectorInt | avgt | 40 | 60130.165 | 60062.303 | 61839.776 | 61895.194 | 15887.398 | 15924.502 | 15874.835 | 15667.936 | 101.94 | ns/op | 3.785 > o.o.b.vm.compiler.IfMinMax.testVectorLong | avgt | 40 | 63855.379 | 6309... 
Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: - test - adjust the way enabling Zicond ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24490/files - new: https://git.openjdk.org/jdk/pull/24490/files/10f9adb0..bff391d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24490&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24490&range=01-02 Stats: 11 lines in 3 files changed: 4 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24490/head:pull/24490 PR: https://git.openjdk.org/jdk/pull/24490 From mli at openjdk.org Thu Apr 10 10:42:12 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 10 Apr 2025 10:42:12 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L [v3] In-Reply-To: References: Message-ID: On Wed, 9 Apr 2025 06:52:26 GMT, Fei Yang wrote: >> This is to not enable Zicond automatically, but users can still turn it on manually if they want to try it or to make sure it brings a benefit on specific hardware. >> Currently it's based on the Banana Pi results, so maybe in the future we should adjust the default value of UseZicond. >> I'm fine with either default value. > > I just witnessed a couple of warnings (`UseZicond is turned off automatically. Turn it on with -XX:+UseZicond explicitly.`) when doing a native build on my P550 SBC which is not equipped with the `Zicond` extension. I don't think that is expected? And I agree that it might be better to keep this option disabled by default and let users decide whether to enable it based on their cases. But what I see is that `UseZicond` will be auto-enabled through hwprobe [1] on my BPI-F3. So I am suggesting not to do that in my previous comment. Makes sense? > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp#L228 Fixed. Also adjusted the way Zicond is enabled, as we just discussed. Thanks!
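Some background for the Zicond discussion. Per the RISC-V Zicond specification, `czero.eqz rd, rs1, rs2` writes zero to `rd` when `rs2` is zero and copies `rs1` otherwise, while `czero.nez` writes zero when `rs2` is non-zero. A general conditional move `rd = cond ? a : b` therefore takes two `czero` instructions and an `or`, plus whatever is needed to materialize `cond`. The C++ below only emulates those semantics to show the idiom. It is not the code C2 emits in this PR, and `czero_eqz`, `czero_nez`, `select_zicond` and `max_zicond` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Emulation of the two Zicond instructions (semantics per the Zicond spec).
// czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1
inline uint64_t czero_eqz(uint64_t rs1, uint64_t rs2) { return rs2 == 0 ? 0 : rs1; }
// czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1
inline uint64_t czero_nez(uint64_t rs1, uint64_t rs2) { return rs2 != 0 ? 0 : rs1; }

// Branch-free select `cond ? a : b` as three instructions:
//   czero.eqz t0, a, cond
//   czero.nez t1, b, cond
//   or        rd, t0, t1
inline uint64_t select_zicond(uint64_t cond, uint64_t a, uint64_t b) {
  return czero_eqz(a, cond) | czero_nez(b, cond);
}

// A CMoveL-style signed max(x, y). The comparison would be an `slt`
// producing 0 or 1 in a register before the select.
inline int64_t max_zicond(int64_t x, int64_t y) {
  uint64_t x_lt_y = x < y ? 1 : 0;  // slt t, x, y
  return static_cast<int64_t>(
      select_zicond(x_lt_y, static_cast<uint64_t>(y), static_cast<uint64_t>(x)));
}
```

Counting instructions this way suggests why a Zicond sequence can be larger than a short branch-and-move, which fits the code-size concern in the PR description. Whether it is faster in practice still depends on the hardware and on how predictable the branch is.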
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24490#discussion_r2037051448 From mli at openjdk.org Thu Apr 10 10:50:07 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 10 Apr 2025 10:50:07 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L [v4] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > On riscv, CMoveI/L already were implemented, but there are some gap: > 1. CMoveI/L does not support comparison with float/double, corresponding tests are not turn on either. > 2. Some optimization of C2 is not turned on, e.g. `Phi -> CMove -> min_max`. > 3. lack of some corresponding performance tests. > > Also there are some issue with current Zicond: > 1. UseZicond is turned on automatically by hwprobe, but jmh tests show that it's not always bring benefit, in some situation it even brings regression, the reason is the generated code by Zicond is much larger than branch version, in particular when it's in a loop and unrolled. > > This patch on riscv is to: > 1. add CMoveI/L comparing float/double, and corresponding tests, > 2. enable more C2 optimization, > 3. add more benchmark tests, > 4. turn off UseZicond by default. > > Thanks! 
> > ## Performance > > ### MinMax > test data > > Benchmark | Mode | Cnt | Score - master | Score - master+UseZbb | Score - -master+UseZicond | Score - master+UseZicond+UseZbb | Score - cmovei | Score - cmovei+UseZbb | Score - cmovei+UseZicond | Score - cmovei+UseZicond+UseZbb | Error | Units | Opt (master/cmovei) > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > o.o.b.vm.compiler.IfMinMax.testReductionInt | avgt | 40 | 17152.075 | 17216.592 | 17272.493 | 17296.89 | 17127.844 | 17036.605 | 17299.333 | 17250.566 | 73.179 | ns/op | 1.001 > o.o.b.vm.compiler.IfMinMax.testReductionLong | avgt | 40 | 19770.828 | 19967.578 | 20268.905 | 20166.165 | 20065.552 | 20059.095 | 20161.914 | 20151.295 | 131.428 | ns/op | 0.985 > o.o.b.vm.compiler.IfMinMax.testSingleInt | avgt | 40 | 114.734 | 114.402 | 114.887 | 114.384 | 114.4 | 110.631 | 112.162 | 110.915 | 0.333 | ns/op | 1.003 > o.o.b.vm.compiler.IfMinMax.testSingleLong | avgt | 40 | 121.53 | 121.711 | 120.91 | 121.665 | 121.309 | 120.57 | 118.639 | 119.373 | 0.451 | ns/op | 1.002 > o.o.b.vm.compiler.IfMinMax.testVectorInt | avgt | 40 | 60130.165 | 60062.303 | 61839.776 | 61895.194 | 15887.398 | 15924.502 | 15874.835 | 15667.936 | 101.94 | ns/op | 3.785 > o.o.b.vm.compiler.IfMinMax.testVectorLong | avgt | 40 | 63855.379 | 6309... Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - merge master - test - adjust the way enabling Zicond - typo - Merge branch 'master' into cmoveil-v1 - turn off flag Zicond by default - remove - initial commit ------------- Changes: https://git.openjdk.org/jdk/pull/24490/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24490&range=03 Stats: 952 lines in 17 files changed: 911 ins; 10 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/24490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24490/head:pull/24490 PR: https://git.openjdk.org/jdk/pull/24490 From mli at openjdk.org Thu Apr 10 10:59:54 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 10 Apr 2025 10:59:54 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L [v5] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > On riscv, CMoveI/L already were implemented, but there are some gap: > 1. CMoveI/L does not support comparison with float/double, corresponding tests are not turn on either. > 2. Some optimization of C2 is not turned on, e.g. `Phi -> CMove -> min_max`. > 3. lack of some corresponding performance tests. > > Also there are some issue with current Zicond: > 1. UseZicond is turned on automatically by hwprobe, but jmh tests show that it's not always bring benefit, in some situation it even brings regression, the reason is the generated code by Zicond is much larger than branch version, in particular when it's in a loop and unrolled. > > This patch on riscv is to: > 1. add CMoveI/L comparing float/double, and corresponding tests, > 2. enable more C2 optimization, > 3. add more benchmark tests, > 4. turn off UseZicond by default. > > Thanks! 
> > ## Performance > > ### MinMax > test data > > Benchmark | Mode | Cnt | Score - master | Score - master+UseZbb | Score - -master+UseZicond | Score - master+UseZicond+UseZbb | Score - cmovei | Score - cmovei+UseZbb | Score - cmovei+UseZicond | Score - cmovei+UseZicond+UseZbb | Error | Units | Opt (master/cmovei) > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > o.o.b.vm.compiler.IfMinMax.testReductionInt | avgt | 40 | 17152.075 | 17216.592 | 17272.493 | 17296.89 | 17127.844 | 17036.605 | 17299.333 | 17250.566 | 73.179 | ns/op | 1.001 > o.o.b.vm.compiler.IfMinMax.testReductionLong | avgt | 40 | 19770.828 | 19967.578 | 20268.905 | 20166.165 | 20065.552 | 20059.095 | 20161.914 | 20151.295 | 131.428 | ns/op | 0.985 > o.o.b.vm.compiler.IfMinMax.testSingleInt | avgt | 40 | 114.734 | 114.402 | 114.887 | 114.384 | 114.4 | 110.631 | 112.162 | 110.915 | 0.333 | ns/op | 1.003 > o.o.b.vm.compiler.IfMinMax.testSingleLong | avgt | 40 | 121.53 | 121.711 | 120.91 | 121.665 | 121.309 | 120.57 | 118.639 | 119.373 | 0.451 | ns/op | 1.002 > o.o.b.vm.compiler.IfMinMax.testVectorInt | avgt | 40 | 60130.165 | 60062.303 | 61839.776 | 61895.194 | 15887.398 | 15924.502 | 15874.835 | 15667.936 | 101.94 | ns/op | 3.785 > o.o.b.vm.compiler.IfMinMax.testVectorLong | avgt | 40 | 63855.379 | 6309... 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: revert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24490/files - new: https://git.openjdk.org/jdk/pull/24490/files/74a93f02..da64a160 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24490&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24490&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24490/head:pull/24490 PR: https://git.openjdk.org/jdk/pull/24490 From tschatzl at openjdk.org Thu Apr 10 11:01:42 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 10 Apr 2025 11:01:42 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30] In-Reply-To: References: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com> Message-ID: On Wed, 9 Apr 2025 12:48:10 GMT, Thomas Schatzl wrote: >> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 101: >> >>> 99: } >>> 100: >>> 101: void G1BarrierSetAssembler::gen_write_ref_array_post_barrier(MacroAssembler* masm, DecoratorSet decorators, >> >> Have you measured the performance impact of inlining this assembly code instead of resorting to a runtime call as done before? Is it worth the maintenance cost (for every platform), risk of introducing bugs, etc.? > > I remember significant impact in some microbenchmark. It's also inlined in Parallel GC. I do not consider it a big issue wrt maintenance - these things never really change, and the method is small and contained. > I will try to redo numbers.

From our microbenchmarks (higher ops/ms and lower ns/op are better):

Current code:

Benchmark                                   (size)   Mode  Cnt       Score       Error  Units
ArrayCopyObject.conjoint_micro                  31  thrpt   15  166136.959 ±  5517.157  ops/ms
ArrayCopyObject.conjoint_micro                  63  thrpt   15  108880.108 ±  4331.112  ops/ms
ArrayCopyObject.conjoint_micro                 127  thrpt   15   93159.977 ±  5025.458  ops/ms
ArrayCopyObject.conjoint_micro                2047  thrpt   15   17234.842 ±   831.344  ops/ms
ArrayCopyObject.conjoint_micro                4095  thrpt   15    9202.216 ±   292.612  ops/ms
ArrayCopyObject.conjoint_micro                8191  thrpt   15    3565.705 ±   121.116  ops/ms
ArrayCopyObject.disjoint_micro                  31  thrpt   15  159106.245 ±  5965.576  ops/ms
ArrayCopyObject.disjoint_micro                  63  thrpt   15   95475.658 ±  5415.267  ops/ms
ArrayCopyObject.disjoint_micro                 127  thrpt   15   84249.979 ±  6313.007  ops/ms
ArrayCopyObject.disjoint_micro                2047  thrpt   15   10682.650 ±   381.832  ops/ms
ArrayCopyObject.disjoint_micro                4095  thrpt   15    4471.940 ±   216.439  ops/ms
ArrayCopyObject.disjoint_micro                8191  thrpt   15    1378.296 ±    33.421  ops/ms
ArrayCopy.arrayCopyObject                      N/A   avgt   15      13.880 ±     0.517   ns/op
ArrayCopy.arrayCopyObjectNonConst              N/A   avgt   15      14.844 ±     0.751   ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward    N/A   avgt   15      11.080 ±     0.703   ns/op
ArrayCopy.arrayCopyObjectSameArraysForward     N/A   avgt   15      11.003 ±     0.135   ns/op

Runtime call:

Benchmark                                   (size)   Mode  Cnt       Score       Error  Units
ArrayCopyObject.conjoint_micro                  31  thrpt   15   73100.230 ± 11079.381  ops/ms
ArrayCopyObject.conjoint_micro                  63  thrpt   15   65039.431 ±  1996.832  ops/ms
ArrayCopyObject.conjoint_micro                 127  thrpt   15   58336.711 ±  2260.660  ops/ms
ArrayCopyObject.conjoint_micro                2047  thrpt   15   17035.419 ±   524.445  ops/ms
ArrayCopyObject.conjoint_micro                4095  thrpt   15    9207.661 ±   286.526  ops/ms
ArrayCopyObject.conjoint_micro                8191  thrpt   15    3264.491 ±    73.848  ops/ms
ArrayCopyObject.disjoint_micro                  31  thrpt   15   84587.219 ±  3007.310  ops/ms
ArrayCopyObject.disjoint_micro                  63  thrpt   15   62815.254 ±  1214.310  ops/ms
ArrayCopyObject.disjoint_micro                 127  thrpt   15   58423.470 ±   285.670  ops/ms
ArrayCopyObject.disjoint_micro                2047  thrpt   15   10720.462 ±   617.173  ops/ms
ArrayCopyObject.disjoint_micro                4095  thrpt   15    4178.195 ±   178.942  ops/ms
ArrayCopyObject.disjoint_micro                8191  thrpt   15    1374.268 ±    44.290  ops/ms
ArrayCopy.arrayCopyObject                      N/A   avgt   15      19.667 ±     0.740   ns/op
ArrayCopy.arrayCopyObjectNonConst              N/A   avgt   15      21.243 ±     1.891   ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward    N/A   avgt   15      16.645 ±     0.504   ns/op
ArrayCopy.arrayCopyObjectSameArraysForward     N/A   avgt   15      17.409 ±     0.705   ns/op

Obviously with larger arrays, the impact diminishes, but it's always there. I think the inlined code is worth the effort in this case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2037086410 From jsjolen at openjdk.org Thu Apr 10 11:01:53 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 10 Apr 2025 11:01:53 GMT Subject: RFR: 8344883: Force clients to explicitly pass mem_tag value, even if it is mtNone [v9] In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 13:30:44 GMT, Gerard Ziemski wrote: >> This is a follow-up to #21843. Here we are focusing on removing the mem tag parameter with default value of mtNone, to force everyone to provide mem tag, if known. >> >> I tried to fill in tag, when I was pretty certain that I had the right type. >> >> At least one more follow-up will be needed after this, to change the remaining mtNone to valid values. > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > small last feedback from Stefan Hi, This looks good to me, thank you for doing this! ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24282#pullrequestreview-2756264572 From mli at openjdk.org Thu Apr 10 11:12:51 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 10 Apr 2025 11:12:51 GMT Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L [v6] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > On riscv, CMoveI/L already were implemented, but there are some gap: > 1. CMoveI/L does not support comparison with float/double, corresponding tests are not turn on either. > 2. Some optimization of C2 is not turned on, e.g.
`Phi -> CMove -> min_max`. > 3. lack of some corresponding performance tests. > > Also there are some issue with current Zicond: > 1. UseZicond is turned on automatically by hwprobe, but jmh tests show that it's not always bring benefit, in some situation it even brings regression, the reason is the generated code by Zicond is much larger than branch version, in particular when it's in a loop and unrolled. > > This patch on riscv is to: > 1. add CMoveI/L comparing float/double, and corresponding tests, > 2. enable more C2 optimization, > 3. add more benchmark tests, > 4. turn off UseZicond by default. > > Thanks! > > ## Performance > > ### MinMax > test data > > Benchmark | Mode | Cnt | Score - master | Score - master+UseZbb | Score - -master+UseZicond | Score - master+UseZicond+UseZbb | Score - cmovei | Score - cmovei+UseZbb | Score - cmovei+UseZicond | Score - cmovei+UseZicond+UseZbb | Error | Units | Opt (master/cmovei) > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > o.o.b.vm.compiler.IfMinMax.testReductionInt | avgt | 40 | 17152.075 | 17216.592 | 17272.493 | 17296.89 | 17127.844 | 17036.605 | 17299.333 | 17250.566 | 73.179 | ns/op | 1.001 > o.o.b.vm.compiler.IfMinMax.testReductionLong | avgt | 40 | 19770.828 | 19967.578 | 20268.905 | 20166.165 | 20065.552 | 20059.095 | 20161.914 | 20151.295 | 131.428 | ns/op | 0.985 > o.o.b.vm.compiler.IfMinMax.testSingleInt | avgt | 40 | 114.734 | 114.402 | 114.887 | 114.384 | 114.4 | 110.631 | 112.162 | 110.915 | 0.333 | ns/op | 1.003 > o.o.b.vm.compiler.IfMinMax.testSingleLong | avgt | 40 | 121.53 | 121.711 | 120.91 | 121.665 | 121.309 | 120.57 | 118.639 | 119.373 | 0.451 | ns/op | 1.002 > o.o.b.vm.compiler.IfMinMax.testVectorInt | avgt | 40 | 60130.165 | 60062.303 | 61839.776 | 61895.194 | 15887.398 | 15924.502 | 15874.835 | 15667.936 | 101.94 | ns/op | 3.785 > o.o.b.vm.compiler.IfMinMax.testVectorLong | avgt | 40 | 63855.379 | 6309... 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  enable more IR tests depending enabling CMove

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24490/files
  - new: https://git.openjdk.org/jdk/pull/24490/files/da64a160..53603ec7

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=24490&range=05
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24490&range=04-05

Stats: 8 lines in 2 files changed: 0 ins; 4 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/24490.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24490/head:pull/24490

PR: https://git.openjdk.org/jdk/pull/24490

From rcastanedalo at openjdk.org Thu Apr 10 11:22:36 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 10 Apr 2025 11:22:36 GMT
Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30]
In-Reply-To:
References: <8noWoU1cd2y4EjjK3QZGMLacPC9gkrwn5Ns3XbQbppI=.74de0b05-b8da-417f-8096-de98d7a3d815@github.com>
Message-ID:

On Thu, 10 Apr 2025 10:58:24 GMT, Thomas Schatzl wrote:

>> I remember significant impact in some microbenchmark. It's also inlined in Parallel GC. I do not consider it a big issue wrt maintenance - these things never really change, and the method is small and contained.
>> I will try to redo numbers.
>
> From our microbenchmarks (higher numbers are better):
>
> Current code:
>
> Benchmark                                     (size)  Mode   Cnt       Score       Error  Units
> ArrayCopyObject.conjoint_micro                    31  thrpt   15  166136.959 ±  5517.157  ops/ms
> ArrayCopyObject.conjoint_micro                    63  thrpt   15  108880.108 ±  4331.112  ops/ms
> ArrayCopyObject.conjoint_micro                   127  thrpt   15   93159.977 ±  5025.458  ops/ms
> ArrayCopyObject.conjoint_micro                  2047  thrpt   15   17234.842 ±   831.344  ops/ms
> ArrayCopyObject.conjoint_micro                  4095  thrpt   15    9202.216 ±   292.612  ops/ms
> ArrayCopyObject.conjoint_micro                  8191  thrpt   15    3565.705 ±   121.116  ops/ms
> ArrayCopyObject.disjoint_micro                    31  thrpt   15  159106.245 ±  5965.576  ops/ms
> ArrayCopyObject.disjoint_micro                    63  thrpt   15   95475.658 ±  5415.267  ops/ms
> ArrayCopyObject.disjoint_micro                   127  thrpt   15   84249.979 ±  6313.007  ops/ms
> ArrayCopyObject.disjoint_micro                  2047  thrpt   15   10682.650 ±   381.832  ops/ms
> ArrayCopyObject.disjoint_micro                  4095  thrpt   15    4471.940 ±   216.439  ops/ms
> ArrayCopyObject.disjoint_micro                  8191  thrpt   15    1378.296 ±    33.421  ops/ms
> ArrayCopy.arrayCopyObject                        N/A  avgt    15      13.880 ±     0.517  ns/op
> ArrayCopy.arrayCopyObjectNonConst                N/A  avgt    15      14.844 ±     0.751  ns/op
> ArrayCopy.arrayCopyObjectSameArraysBackward      N/A  avgt    15      11.080 ±     0.703  ns/op
> ArrayCopy.arrayCopyObjectSameArraysForward       N/A  avgt    15      11.003 ±     0.135  ns/op
>
> Runtime call:
>
> Benchmark                                     (size)  Mode   Cnt       Score       Error  Units
> ArrayCopyObject.conjoint_micro                    31  thrpt   15   73100.230 ± 11079.381  ops/ms
> ArrayCopyObject.conjoint_micro                    63  thrpt   15   65039.431 ±  1996.832  ops/ms
> ArrayCopyObject.conjoint_micro                   127  thrpt   15   58336.711 ±  2260.660  ops/ms
> ArrayCopyObject.conjoint_micro                  2047  thrpt   15   17035.419 ±   524.445  ops/ms
> ArrayCopyObject.conjoint_micro                  4095  thrpt   15    9207.661 ±   286.526  ops/ms
> ArrayCopyObject.conjoint_micro                  8191  thrpt   15    3264.491 ±    73.848  ops/ms
> ArrayCopyObject.disjoint_micro                    31  thrpt   15   84587.219 ±  3007.310  ops/ms
> ArrayCopyObject.disjoint_micro ...

Fair enough, thanks for the measurements!
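The two tables above can be compared row by row. A small sketch using only the `conjoint_micro` throughput numbers quoted above (ops/ms, higher is better), treating the "Current code" table as the inlined version, as the thread describes it; the `Row` struct and the function names are illustrative only:

```cpp
#include <cassert>
#include <cstdio>

// conjoint_micro throughput in ops/ms (higher is better), copied from the
// "Current code" and "Runtime call" tables above.
struct Row {
  int size;
  double current;       // "Current code" (the inlined path, per the thread)
  double runtime_call;  // "Runtime call"
};

static const Row kConjoint[] = {
    {31, 166136.959, 73100.230},
    {63, 108880.108, 65039.431},
    {127, 93159.977, 58336.711},
    {2047, 17234.842, 17035.419},
    {4095, 9202.216, 9207.661},
    {8191, 3565.705, 3264.491},
};

// For throughput, current / runtime_call > 1 means the current code is faster.
static double speedup(const Row& r) {
  assert(r.runtime_call > 0);
  return r.current / r.runtime_call;
}

static void print_speedups() {
  for (const Row& r : kConjoint) {
    std::printf("size %5d: %.2fx\n", r.size, speedup(r));
  }
}
```

With these inputs the ratio is about 2.3x at size 31 and within about 10% of 1.0 from size 2047 upward, which matches the remark that the impact shrinks for larger arrays. The error bars (for example ±11079 on the size-31 runtime-call score) are not taken into account here.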
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2037121277

From jsikstro at openjdk.org Thu Apr 10 11:40:51 2025
From: jsikstro at openjdk.org (Joel Sikström)
Date: Thu, 10 Apr 2025 11:40:51 GMT
Subject: RFR: 8350441: ZGC: Overhaul Page Allocation [v2]
In-Reply-To:
References:
Message-ID: <2lZljOaRU4QYWP-iNYwwFNazf7Z6Jh9MrYxc4QSsNmE=.cae1bf9f-c0af-4c82-ac4b-2ca7a401f6c0@github.com>

On Thu, 10 Apr 2025 09:20:58 GMT, Joel Sikström wrote:

>>> Note that any reference to pages from here on out refers to the concept of a heap region in ZGC, not pages in the operating system (OS), unless stated otherwise.
>>
>> # Background
>>
>> This PR addresses fragmentation by introducing a Mapped Cache that replaces the Page Cache in ZGC. The largest limitation of the Page Cache is that it is constrained by the abstraction of what a page is. The proposed Mapped Cache removes this limitation by decoupling memory from pages, allowing it to merge and split memory in ways that the Page Cache is not suited for. To facilitate the transition, much of the Page Allocator has been redesigned to work with the Mapped Cache.
>>
>> In addition to fighting fragmentation, the new approach improves NUMA support and simplifies memory unmapping. Combined, these changes lay the foundation for even more improvements in ZGC, like replacing multi-mapped memory with anonymous memory.
>>
>> # Mapped Cache
>>
>> The main benefit of the Mapped Cache is that adjacent virtual memory ranges in the cache can be merged to create larger ranges, enabling larger allocation requests to succeed more easily. Most notably, it allows allocations to succeed more often without "harvesting" smaller, discontiguous ranges. Harvesting negatively impacts both fragmentation and latency, as it requires remapping memory into a new contiguous virtual address range.
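The merging described above can be illustrated with an ordered map of free ranges that coalesces neighbours on insert and splits on removal. This is a minimal sketch, not ZGC's implementation: `RangeCache` is a hypothetical name, and it uses `std::map` (typically a red-black tree) with heap-allocated nodes instead of the intrusive storage the PR describes. It shows how returning the gap between two cached ranges lets a later, larger request be served from one contiguous range without harvesting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical cache of free, already-mapped virtual address ranges.
// Assumes inserted ranges never overlap ranges already in the cache.
class RangeCache {
 public:
  // Adds [start, start + size) and merges it with any directly adjacent ranges.
  void insert(uintptr_t start, size_t size) {
    assert(size > 0);
    uintptr_t end = start + size;

    // Successor that begins exactly where the new range ends: absorb it.
    auto next = ranges_.find(end);
    if (next != ranges_.end()) {
      end += next->second;
      ranges_.erase(next);
    }

    // Predecessor that ends exactly where the new range begins: absorb it.
    auto it = ranges_.lower_bound(start);
    if (it != ranges_.begin()) {
      auto prev = std::prev(it);
      if (prev->first + prev->second == start) {
        start = prev->first;
        ranges_.erase(prev);
      }
    }

    ranges_[start] = end - start;
  }

  // First-fit removal of `size` bytes; the unused tail of a larger range stays cached.
  std::optional<uintptr_t> remove(size_t size) {
    for (auto it = ranges_.begin(); it != ranges_.end(); ++it) {
      if (it->second >= size) {
        uintptr_t start = it->first;
        size_t remaining = it->second - size;
        ranges_.erase(it);
        if (remaining > 0) {
          ranges_[start + size] = remaining;
        }
        return start;
      }
    }
    return std::nullopt;  // no single contiguous range is large enough
  }

  size_t range_count() const { return ranges_.size(); }

  size_t largest() const {
    size_t best = 0;
    for (const auto& entry : ranges_) {
      if (entry.second > best) best = entry.second;
    }
    return best;
  }

 private:
  std::map<uintptr_t, size_t> ranges_;  // start address -> length in bytes
};
```

First fit by linear scan keeps the sketch short; a real cache would likely also want fast lookup by size, and would avoid the per-node heap allocation that the PR's intrusive design eliminates.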
>> Fragmentation becomes especially problematic in long-running programs and in environments with limited address space, where finding large contiguous regions can be difficult and may lead to premature Out Of Memory Errors (OOME).
>>
>> The Mapped Cache uses a self-balancing binary search tree to store memory ranges. Since the ranges are unused when inside the cache, the tree can use this memory to store metadata about itself, referred to as intrusive storage. This approach eliminates the need for dynamic memory allocation (e.g., malloc), which could otherwise introduce a latency overhead.
>>
>> # Fragmentation
>>
>> Currently, ZGC has multiple strategies for dealing with fragmentation. In some edge cases, these strategies are not as efficient as we would like. By addressing fragmentation differently with the Mapped Cache, ZGC is in a better position to avoid edge cases, which are bad even if they occur only once. This is especially impactful for programs running with a large heap.
>>
>> ## Virtual Memory Shuffling
>>
>> In addition to the Mapped Cache, we have made some adjustments in how ZGC deals with vir...
>
> Joel Sikström has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update src/hotspot/share/gc/z/zVirtualMemoryManager.hpp
>
>   Co-authored-by: Axel Boldt-Christmas

Thank you to everyone who contributed to, helped out, and reviewed this patch.
I am really happy with how this turned out and all the things I've learned along the way :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24547#issuecomment-2792450995

From jsikstro at openjdk.org Thu Apr 10 11:40:51 2025
From: jsikstro at openjdk.org (Joel Sikström)
Date: Thu, 10 Apr 2025 11:40:51 GMT
Subject: Integrated: 8350441: ZGC: Overhaul Page Allocation
In-Reply-To:
References:
Message-ID: <5hnAuZLpGYPMwR594K0ftHzhCa4PWvwrVMEneZnJ9v4=.f894e7f2-8733-49ae-a875-0e1a6b5ff075@github.com>

On Wed, 9 Apr 2025 13:37:16 GMT, Joel Sikström wrote:

>> Note that any reference to pages from here on out refers to the concept of a heap region in ZGC, not pages in the operating system (OS), unless stated otherwise.
>
> # Background
>
> This PR addresses fragmentation by introducing a Mapped Cache that replaces the Page Cache in ZGC. The largest limitation of the Page Cache is that it is constrained by the abstraction of what a page is. The proposed Mapped Cache removes this limitation by decoupling memory from pages, allowing it to merge and split memory in ways that the Page Cache is not suited for. To facilitate the transition, much of the Page Allocator has been redesigned to work with the Mapped Cache.
>
> In addition to fighting fragmentation, the new approach improves NUMA support and simplifies memory unmapping. Combined, these changes lay the foundation for even more improvements in ZGC, like replacing multi-mapped memory with anonymous memory.
>
> # Mapped Cache
>
> The main benefit of the Mapped Cache is that adjacent virtual memory ranges in the cache can be merged to create larger ranges, enabling larger allocation requests to succeed more easily. Most notably, it allows allocations to succeed more often without "harvesting" smaller, discontiguous ranges. Harvesting negatively impacts both fragmentation and latency, as it requires remapping memory into a new contiguous virtual address range.
> Fragmentation becomes especially problematic in long-running programs and in environments with limited address space, where finding large contiguous regions can be difficult and may lead to premature Out Of Memory Errors (OOME).
>
> The Mapped Cache uses a self-balancing binary search tree to store memory ranges. Since the ranges are unused when inside the cache, the tree can use this memory to store metadata about itself, referred to as intrusive storage. This approach eliminates the need for dynamic memory allocation (e.g., malloc), which could otherwise introduce a latency overhead.
>
> # Fragmentation
>
> Currently, ZGC has multiple strategies for dealing with fragmentation. In some edge cases, these strategies are not as efficient as we would like. By addressing fragmentation differently with the Mapped Cache, ZGC is in a better position to avoid edge cases, which are bad even if they occur only once. This is especially impactful for programs running with a large heap.
>
> ## Virtual Memory Shuffling
>
> In addition to the Mapped Cache, we have made some adjustments in how ZGC deals with virtual memory. When harvesting memory, whic...

This pull request has now been integrated.
Changeset: 7e69b98e
Author:    Joel Sikström
URL:       https://git.openjdk.org/jdk/commit/7e69b98e0548803b85b04b518929c073f8ffaf8c
Stats:     12052 lines in 118 files changed: 7936 ins; 3218 del; 898 mod

8350441: ZGC: Overhaul Page Allocation

Co-authored-by: Axel Boldt-Christmas
Co-authored-by: Erik Österlund
Co-authored-by: Stefan Karlsson
Co-authored-by: Stefan Johansson
Reviewed-by: stefank, aboldtch, eosterlund

-------------

PR: https://git.openjdk.org/jdk/pull/24547

From rvansa at openjdk.org Thu Apr 10 11:43:40 2025
From: rvansa at openjdk.org (Radim Vansa)
Date: Thu, 10 Apr 2025 11:43:40 GMT
Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v6]
In-Reply-To: <8NMGLWo0iN67IorkWm4_LIeuwhHtBNoW_VlKyt-HfL0=.fe484f48-7be0-4302-9637-14dbc1c15a4c@github.com>
References: <8NMGLWo0iN67IorkWm4_LIeuwhHtBNoW_VlKyt-HfL0=.fe484f48-7be0-4302-9637-14dbc1c15a4c@github.com>
Message-ID:

On Thu, 10 Apr 2025 08:38:35 GMT, Aleksey Shipilev wrote:

>> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Remove outdated comment
>
> src/hotspot/share/oops/instanceKlass.cpp line 1945:
>
>> 1943: fields_sorted.sort(compare_fields_by_offset);
>> 1944: fieldDescriptor fd;
>> 1945: for (auto it = fields_sorted.begin(); it != fields_sorted.end(); ++it) {
>
> Ah, that's not what I meant :) There is no need to use iterators here, just pull `length` out of `fields_sorted.length()`, and use the same old indexed loop:
>
>
> int length = fields_sorted.length();
> if (length > 0) {
>   fields_sorted.sort(compare_fields_by_offset);
>   for (int i = 0; i < length; i++) {
>   ...

I know you haven't asked specifically for iterators, but to me this seems 'the' cleaner way, rather than a plain old indexed loop. Or do you have doubts about this being inlined to effectively the same code?
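To make the two loop styles concrete: with contiguous storage, `begin()`/`end()` can simply return pointers, and both loops then visit the same elements in the same order. A minimal sketch using a hypothetical `TinyArray` (not HotSpot's `GrowableArray`) and a hypothetical `FieldInfo` record, with `std::sort` standing in for the patch's comparator-based sort; whether a compiler reduces both loops to identical code is not something this sketch demonstrates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical field record; only the offset is used for ordering.
struct FieldInfo {
  int offset;
  int id;
};

// Hypothetical array type (not HotSpot's GrowableArray) whose begin()/end()
// return raw pointers into contiguous storage.
class TinyArray {
 public:
  void append(const FieldInfo& f) { data_.push_back(f); }
  int length() const { return static_cast<int>(data_.size()); }
  const FieldInfo& at(int i) const { return data_[static_cast<size_t>(i)]; }
  FieldInfo* begin() { return data_.data(); }
  FieldInfo* end() { return data_.data() + data_.size(); }
  void sort_by_offset() {
    std::sort(begin(), end(),
              [](const FieldInfo& a, const FieldInfo& b) { return a.offset < b.offset; });
  }

 private:
  std::vector<FieldInfo> data_;
};

// Indexed style suggested in review: hoist length() once, sort only if non-empty.
static long visit_indexed(TinyArray& fields) {
  long order = 0;
  int length = fields.length();
  if (length > 0) {
    fields.sort_by_offset();
    for (int i = 0; i < length; i++) {
      order = order * 31 + fields.at(i).id;  // order-sensitive fingerprint of the visit sequence
    }
  }
  return order;
}

// Iterator style from the patch, which requires begin()/end() on the container.
static long visit_iterator(TinyArray& fields) {
  long order = 0;
  fields.sort_by_offset();
  for (auto it = fields.begin(); it != fields.end(); ++it) {
    order = order * 31 + it->id;
  }
  return order;
}
```

The behavioral difference is nil here; the practical difference raised in the thread is that the iterator form needs new container API, which is a separate change to review.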
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24290#discussion_r2037152642

From rgiulietti at openjdk.org Thu Apr 10 11:49:30 2025
From: rgiulietti at openjdk.org (Raffaello Giulietti)
Date: Thu, 10 Apr 2025 11:49:30 GMT
Subject: RFR: 8354266: Fix non-UTF-8 text encoding
In-Reply-To:
References:
Message-ID:

On Thu, 10 Apr 2025 10:14:40 GMT, Magnus Ihse Bursie wrote:

>> I have checked the entire code base for incorrect encodings, but luckily enough these were the only remaining problems I found.
>>
>> BOM (byte-order mark) is a method used for distinguishing big and little endian UTF-16 encodings. There is a special UTF-8 BOM, but it is discouraged. In the words of the Unicode Consortium: "Use of a BOM is neither required nor recommended for UTF-8". We have UTF-8 BOMs in a handful of files. These should be removed.
>>
>> Methodology used:
>>
>> I have run four different tools that use different heuristics for determining the encoding of a file:
>> * chardetect (the original, slow-as-molasses Perl program, which also had the worst performing heuristics of all; I'll rate it 1/5)
>> * uchardet (a modern version by freedesktop, used by e.g. Firefox)
>> * enca (targeted towards obscure code pages)
>> * libmagic / `file --mime-encoding`
>>
>> They all agreed on pure ASCII files (which is easy to check), and these I just ignored/accepted as good. The handling of pure binary files differed between the tools; most detected them as binary but some suggested arcane encodings for specific (often small) binary files.
>> To keep my sanity, I decided that files ending in any of these extensions were binary, and I did not check them further:
>> * `gif|png|ico|jpg|icns|tiff|wav|woff|woff2|jar|ttf|bmp|class|crt|jks|keystore|ks|db`
>>
>> From the remaining list of non-ASCII, non-known-binary files I selected two overlapping and exhaustive subsets:
>> * All files where at least one tool claimed it to be UTF-8
>> * All files where at least one tool claimed it to be *not* UTF-8
>>
>> For the first subset, I checked every non-ASCII character (using `LC_ALL=C ggrep -H --color='auto' -P -n "[^\x00-\x7F]" $(cat names-of-files-to-check.txt)`, and visually examining the results). At this stage, I found several files where Unicode was unnecessarily used instead of pure ASCII, and I treated those files separately. Other than that, my inspection revealed no obvious encoding errors. This list comprised about 2000 files, so I did not spend too much time on each file. The assumption, after all, was that these files are okay.
>>
>> For the second subset, I checked every non-ASCII character (using the same method). This list contained 300+ files. Most of them were okay as far as I can tell; I can confirm encodings for European languages 100%, but CJK encodings could theoretically be wrong; they looked sane but I cannot read and confirm fully. Several were in fact pure...
>
> src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp line 497:
>
>> 495: /*
>> 496:   The algorithm below is based on Intel publication:
>> 497:   "Fast SHA-256 Implementations on Intel(R) Architecture Processors" by Jim Guilford, Kirk Yap and Vinodh Gopal.
>
> Note: There is of course a Unicode `®` symbol, which is what it was originally before it was botched here, but I found no reason to keep this, and in the spirit of JDK-8354213, I thought it better to use pure ASCII here.

I guess the difference at L.1 in the various files is just the BOM?
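A UTF-8 BOM is the three-byte sequence `EF BB BF` at the very start of a file, so a diff that only touches line 1 is what removing it would look like. The checks discussed in this thread can be sketched as below: BOM detection and stripping, a byte-level equivalent of the `[^\x00-\x7F]` grep, and a UTF-8 well-formedness check using the byte ranges from RFC 3629. The function names are hypothetical, and this is not the tooling used for the PR (chardetect, uchardet, enca, libmagic).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The UTF-8 encoding of U+FEFF (BYTE ORDER MARK).
static const std::string kUtf8Bom = "\xEF\xBB\xBF";

static bool has_utf8_bom(const std::string& s) {
  return s.size() >= kUtf8Bom.size() && s.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0;
}

static std::string strip_utf8_bom(const std::string& s) {
  return has_utf8_bom(s) ? s.substr(kUtf8Bom.size()) : s;
}

// Byte-level equivalent of grep -P "[^\x00-\x7F]": any byte outside 7-bit ASCII.
static bool has_non_ascii(const std::string& s) {
  for (unsigned char c : s) {
    if (c > 0x7F) return true;
  }
  return false;
}

// Well-formed UTF-8 per the RFC 3629 byte ranges: rejects overlong forms,
// UTF-16 surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
static bool is_valid_utf8(const std::string& s) {
  size_t i = 0;
  const size_t n = s.size();
  while (i < n) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (c < 0x80) { i++; continue; }
    size_t len;
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
    if (c >= 0xC2 && c <= 0xDF)      { len = 2; }
    else if (c == 0xE0)              { len = 3; lo = 0xA0; }
    else if (c >= 0xE1 && c <= 0xEC) { len = 3; }
    else if (c == 0xED)              { len = 3; hi = 0x9F; }
    else if (c >= 0xEE && c <= 0xEF) { len = 3; }
    else if (c == 0xF0)              { len = 4; lo = 0x90; }
    else if (c >= 0xF1 && c <= 0xF3) { len = 4; }
    else if (c == 0xF4)              { len = 4; hi = 0x8F; }
    else return false;  // 0x80..0xC1 and 0xF5..0xFF cannot start a sequence
    if (n - i < len) return false;  // truncated sequence
    const unsigned char c1 = static_cast<unsigned char>(s[i + 1]);
    if (c1 < lo || c1 > hi) return false;
    for (size_t k = 2; k < len; k++) {
      const unsigned char ck = static_cast<unsigned char>(s[i + k]);
      if (ck < 0x80 || ck > 0xBF) return false;
    }
    i += len;
  }
  return true;
}
```

For example, `Sj\xC3\xB6len` (Sjölen in UTF-8) is non-ASCII but valid, while the Latin-1 byte `\xF6` for the same letter is rejected, which is the kind of mis-encoding the PR hunts for. Note that a heuristic detector is still needed to tell which legacy encoding an invalid file was written in; this check only says that it is not UTF-8.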
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24566#discussion_r2037161789

From jsjolen at openjdk.org Thu Apr 10 12:00:37 2025
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 10 Apr 2025 12:00:37 GMT
Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v6]
In-Reply-To:
References: <8NMGLWo0iN67IorkWm4_LIeuwhHtBNoW_VlKyt-HfL0=.fe484f48-7be0-4302-9637-14dbc1c15a4c@github.com>
Message-ID:

On Thu, 10 Apr 2025 11:41:17 GMT, Radim Vansa wrote:

>> src/hotspot/share/oops/instanceKlass.cpp line 1945:
>>
>>> 1943: fields_sorted.sort(compare_fields_by_offset);
>>> 1944: fieldDescriptor fd;
>>> 1945: for (auto it = fields_sorted.begin(); it != fields_sorted.end(); ++it) {
>>
>> Ah, that's not what I meant :) There is no need to use iterators here, just pull `length` out of `fields_sorted.length()`, and use the same old indexed loop:
>>
>>
>> int length = fields_sorted.length();
>> if (length > 0) {
>>   fields_sorted.sort(compare_fields_by_offset);
>>   for (int i = 0; i < length; i++) {
>>   ...
>
> I know you haven't asked specifically for iterators, but to me this seems 'the' cleaner way, rather than a plain old indexed loop. Or do you have doubts about this being inlined to effectively the same code?

I think using an iterator is fine, that's up to you. I am a bit bothered by having this change also slip in a change to the `GrowableArray` API, however. Anyone who wants to review that part of the code will miss that change.

I'd like to suggest that you change to an indexed for loop, and you make a follow-up PR with your GA changes and where you switch the loop.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24290#discussion_r2037179686

From shade at openjdk.org Thu Apr 10 12:07:34 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 10 Apr 2025 12:07:34 GMT
Subject: RFR: 8353175: Eliminate double iteration of stream in FieldDescriptor reinitialization [v6]
In-Reply-To:
References: <8NMGLWo0iN67IorkWm4_LIeuwhHtBNoW_VlKyt-HfL0=.fe484f48-7be0-4302-9637-14dbc1c15a4c@github.com>
Message-ID:

On Thu, 10 Apr 2025 11:58:09 GMT, Johan Sjölen wrote:

>> I know you haven't asked specifically for iterators, but to me this seems 'the' cleaner way, rather than a plain old indexed loop. Or do you have doubts about this being inlined to effectively the same code?
>
> I think using an iterator is fine, that's up to you. I am a bit bothered by having this change also slip in a change to the `GrowableArray` API, however. Anyone who wants to review that part of the code will miss that change.
>
> I'd like to suggest that you change to an indexed for loop, and you make a follow-up PR with your GA changes and where you switch the loop.

Yeah, it is not about cleanliness, but rather about doing one (good) thing at a time.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24290#discussion_r2037189271

From mli at openjdk.org Thu Apr 10 12:08:29 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 10 Apr 2025 12:08:29 GMT
Subject: RFR: 8352504: RISC-V: implement and enable CMoveI/L [v7]
In-Reply-To:
References:
Message-ID: <317DYBGa3NAUfowhgTFeq8Crezuz-mV9io6nW1IXccc=.d37f6b47-ac06-4234-9162-8aca891630d7@github.com>

> Hi,
> Can you help review this patch?
> On riscv, CMoveI/L were already implemented, but there are some gaps:
> 1. CMoveI/L does not support comparison with float/double, and the corresponding tests are not turned on either.
> 2. Some C2 optimizations are not turned on, e.g. `Phi -> CMove -> min_max`.
> 3. Some corresponding performance tests are missing.
>
> Also there are some issues with the current Zicond support:
> 1. UseZicond is turned on automatically by hwprobe, but JMH tests show that it does not always bring a benefit; in some situations it even causes a regression. The reason is that the code generated with Zicond is much larger than the branch version, in particular when it is in a loop that gets unrolled.
>
> This patch on riscv is to:
> 1. add CMoveI/L comparing float/double, and the corresponding tests,
> 2. enable more C2 optimizations,
> 3. add more benchmark tests,
> 4. turn off UseZicond by default.
>
> Thanks!
>
> ## Performance
>
> ### MinMax
> test data
>
> Benchmark | Mode | Cnt | Score - master | Score - master+UseZbb | Score - master+UseZicond | Score - master+UseZicond+UseZbb | Score - cmovei | Score - cmovei+UseZbb | Score - cmovei+UseZicond | Score - cmovei+UseZicond+UseZbb | Error | Units | Opt (master/cmovei)
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> o.o.b.vm.compiler.IfMinMax.testReductionInt | avgt | 40 | 17152.075 | 17216.592 | 17272.493 | 17296.89 | 17127.844 | 17036.605 | 17299.333 | 17250.566 | 73.179 | ns/op | 1.001
> o.o.b.vm.compiler.IfMinMax.testReductionLong | avgt | 40 | 19770.828 | 19967.578 | 20268.905 | 20166.165 | 20065.552 | 20059.095 | 20161.914 | 20151.295 | 131.428 | ns/op | 0.985
> o.o.b.vm.compiler.IfMinMax.testSingleInt | avgt | 40 | 114.734 | 114.402 | 114.887 | 114.384 | 114.4 | 110.631 | 112.162 | 110.915 | 0.333 | ns/op | 1.003
> o.o.b.vm.compiler.IfMinMax.testSingleLong | avgt | 40 | 121.53 | 121.711 | 120.91 | 121.665 | 121.309 | 120.57 | 118.639 | 119.373 | 0.451 | ns/op | 1.002
> o.o.b.vm.compiler.IfMinMax.testVectorInt | avgt | 40 | 60130.165 | 60062.303 | 61839.776 | 61895.194 | 15887.398 | 15924.502 | 15874.835 | 15667.936 | 101.94 | ns/op | 3.785
> o.o.b.vm.compiler.IfMinMax.testVectorLong | avgt | 40 | 63855.379 | 6309...
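The last column of the table above appears to be the plain `master` score divided by the plain `cmovei` score (both without extra flags); since these are `avgt` results in ns/op, a value above 1 would mean the patched build is faster. A small C++ check of that reading against the rows that survive in the archive (the `testVectorLong` row is truncated, so it is omitted); the struct and function names are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Two score columns from the IfMinMax table (avgt, ns/op: lower is better)
// plus the reported "Opt (master/cmovei)" value. Names are illustrative only.
struct MinMaxRow {
  const char* benchmark;
  double master;    // "Score - master"
  double cmovei;    // "Score - cmovei"
  double reported;  // "Opt (master/cmovei)"
};

static const MinMaxRow kRows[] = {
    {"testReductionInt", 17152.075, 17127.844, 1.001},
    {"testReductionLong", 19770.828, 20065.552, 0.985},
    {"testSingleInt", 114.734, 114.4, 1.003},
    {"testSingleLong", 121.53, 121.309, 1.002},
    {"testVectorInt", 60130.165, 15887.398, 3.785},
};

// Time per op before divided by time per op after: > 1 means the patch is faster.
static double opt(const MinMaxRow& r) { return r.master / r.cmovei; }

// The table rounds to three decimals, so allow half a unit in the last place.
static bool matches_reported(const MinMaxRow& r) {
  return std::fabs(opt(r) - r.reported) <= 0.0005;
}
```

Every surviving row matches to three decimals under this reading, which supports it, though it remains an inference from the column title rather than something the PR states.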
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  enable more test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24490/files
  - new: https://git.openjdk.org/jdk/pull/24490/files/53603ec7..7860aba1

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=24490&range=06
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24490&range=05-06

Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/24490.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24490/head:pull/24490

PR: https://git.openjdk.org/jdk/pull/24490