From duke at openjdk.org Sat Feb 1 01:45:57 2025 From: duke at openjdk.org (duke) Date: Sat, 1 Feb 2025 01:45:57 GMT Subject: Withdrawn: 8242888: Convert dynamic proxy to hidden classes In-Reply-To: References: Message-ID: On Thu, 23 May 2024 03:28:30 GMT, Chen Liang wrote: > Please review this change that adds a new dynamic proxy implementation based on hidden classes. > > Summary: > 1. Adds a new implementation, which can be enabled with `-Djdk.reflect.useHiddenProxy=true` for early adoption. > 2. ClassLoader.defineClass0 takes a ClassLoader instance but discards it in native code; I updated the native code to reuse that ClassLoader for Proxy support. > 3. ProxyGenerator changes mainly involve using class data to pass the Method list (accessed in a single condy) and removing obsolete setup code generation. > > Comment: Since #8278, Proxy has been converted to the ClassFile API, and the infrastructure has changed; now the migration to hidden classes is much cleaner and has less impact, such as preserving the ProtectionDomain and dynamic module without "anchor classes", and avoiding the java.lang.invoke package. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19356 From dnsimon at openjdk.org Sat Feb 1 07:21:52 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 1 Feb 2025 07:21:52 GMT Subject: RFR: 8323158: HotSpot Style Guide should specify more include ordering In-Reply-To: References: Message-ID: <9RwWVrZpTqsRO5srrrT0jOt4CGc7oF5FEm06Pzjf2yI=.a5fc3070-226c-4292-9802-426c9cab1672@github.com> On Fri, 31 Jan 2025 13:56:58 GMT, Stefan Karlsson wrote: > The HotSpot Style Guide has a section about source files and includes. The style used for includes has mostly been introduced by scripts when includeDB was replaced, but also when various other enhancements to our includes were made. Some of the introduced styles were never written down in the style guide.
> > I propose a couple of changes to the HotSpot Style Guide to reflect some of these implicit styles that we have. While updating the text I also took the liberty to order the items in an order that I felt was good. > > Note that JDK-8323158 contains a few more suggestions, but I've only addressed the items that I think can be accepted without much contention. Either I extract the items that have not been addressed into a new RFE, or I create a new RFE for this PR. > > There are some small whitespace tweaks that I made so that the .md and .html had a similar layout. A lot of these rules look like they could be checked with some simple scripting or additions to jcheck. Have you considered that? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23388#issuecomment-2628824819 From doug.simon at oracle.com Sat Feb 1 08:03:35 2025 From: doug.simon at oracle.com (Douglas Simon) Date: Sat, 1 Feb 2025 08:03:35 +0000 Subject: Proposal: Remove EnableJVMCI flag Message-ID: Hi, https://bugs.openjdk.org/browse/JDK-8345826 was filed to make libgraal and the new CDS optimizations more compatible: Since JEP 483, many more CDS optimizations are enabled when -XX:+AOTClassLinking is specified (see numbers in https://bugs.openjdk.org/browse/JDK-8342279). However, these optimizations require the archived module graph to be used. Today, if you enable UseGraalJIT, the archived module graph will be disabled. As a result, the *entire* CDS archive will be disabled. This will result in slower start-up time when UseGraalJIT is enabled. Further internal discussion resulted in the proposal to remove all use of EnableJVMCI in the VM code. This will mean -XX:+EnableJVMCI only applies to the Java code (i.e. adds jdk.internal.vm.ci to the root module set). However, further reflection suggests something more aggressive is worth considering: remove the EnableJVMCI flag altogether. This option was implemented to make the use of JVMCI opt-in.
However, JVMCI is effectively opt-in anyway without this option. There are two ways in which JVMCI can be used: as a JIT compiler by the CompileBroker and as a compiler for "guest" code (e.g., the Truffle use case). 1. JVMCI as JIT. To enable JVMCI as JIT, flags such as UseJVMCICompiler, UseGraalJIT or EnableJVMCIProduct must be specified to the java launcher. Each of these flags sets EnableJVMCI to true as a side-effect. That is, use of JVMCI as JIT is already opt-in due to needing these other flags - specifying EnableJVMCI is redundant. 2. JVMCI as guest code compiler. In this mode, the jdk.internal.vm.ci module must be loaded (i.e. EnableJVMCI currently has the side-effect of `--add-modules=jdk.internal.vm.ci`). This module has no unqualified exports (as seen in its module descriptor), so using it requires specifying at least one instance of --add-exports to the Java launcher. That is, once again EnableJVMCI alone is not sufficient for opting in to JVMCI. In light of the above, I propose removing EnableJVMCI altogether. This will require using --add-modules=jdk.internal.vm.ci when you actually want to use the JVMCI module. It will also require modifying JDK code guarded by this flag. The flag guards both VM code and use of the `jdk.internal.vm.ci` module; I consider them separately below. #### VM code All uses of EnableJVMCI to guard VM code would be adapted with one of the following strategies: 1. Remove the guard and make the code unconditional. 2. Replace EnableJVMCI with something else, such as UseJVMCICompiler or a test of a global variable set to true as soon as JVMCI compiled code is about to be installed in the code cache (example). 3. Replace EnableJVMCI with a test of whether the jdk.internal.vm.ci module has been resolved (example). Of course, this change almost certainly needs a CSR as well, but I'd like to get feedback on the primary change before worrying about that. -Doug
From alanb at openjdk.org Sat Feb 1 12:05:52 2025 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 1 Feb 2025 12:05:52 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native In-Reply-To: References: <8t8V6-A3nLdnBCS_yjfVqurgQDNCrYkACZPN3V3lYvs=.3c35a2e0-917f-4801-ad1d-d1eb428adaff@github.com> Message-ID: <2SFAoEqpXa6eH5YyN07oH_tcppE3yMUDpoN3xMxheWU=.1ba07c63-f68d-434d-822f-67a3b15d0293@github.com> On Fri, 31 Jan 2025 17:22:37 GMT, Aleksey Shipilev wrote: > I am thinking if anything new happens if we can reflect the field, `setAccessible(true)` it, and overwrite it. I guess the normal protection rules disallow the `setAccessible` part, but it does not hurt to think and confirm this is still enough and good. The field won't be accessible by default, so I think you are pondering the case where someone opens java.lang for deep reflection and hacks on this field. At some point the ongoing work on integrity will get to "final means final", so code can't modify a final instance field (this restriction already exists for records and hidden classes). In the meantime, no objection to extending the current reflection filter to hide this field, although that filtering mechanism is ad hoc and needs to go away in the long term. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23396#issuecomment-2628925203 From zgu at openjdk.org Sat Feb 1 16:57:26 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Sat, 1 Feb 2025 16:57:26 GMT Subject: RFR: 8349083: Factor out filename handling code from logging Message-ID: Factor out filename substitution code from unified logging, so that it can be used elsewhere: 1. Make filename substitution consistent. Support the following substitutions across the JVM: `%p` -> pid, `%t` -> timestamp, `%hn` -> hostname. 2. Reduce redundant code ------------- Commit messages: - Cleanup - More cleanup - rename - Clean up - Merge branch 'master' into JDK-8349083 - Fix bug - Fix include - v2 - v1 - add new files - ...
and 1 more: https://git.openjdk.org/jdk/compare/06ebb170...0dfc026c Changes: https://git.openjdk.org/jdk/pull/23410/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23410&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349083 Stats: 322 lines in 7 files changed: 201 ins; 114 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23410/head:pull/23410 PR: https://git.openjdk.org/jdk/pull/23410 From mdoerr at openjdk.org Sun Feb 2 18:04:53 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Sun, 2 Feb 2025 18:04:53 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v17] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: <1ZUwg_XG_-UflAveSdjzJ51Ld05SJabeltMa0RXEjmk=.1a5f5b85-dcd2-4f5b-b3f3-5057b53efdd0@github.com> On Tue, 28 Jan 2025 16:23:48 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently, acceleration code for GHASH is missing for PPC64. >> >> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - restore chnges > - restore chnges You will have to merge and make adaptations for https://github.com/openjdk/jdk/commit/a414a591dd8d66f1500cd69dd65baa6ba4224c2a.
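The Karatsuba step mentioned in the GHASH description above can be illustrated in portable C++. This is a minimal sketch, not the PPC64 intrinsic. It works on plain 64-bit words instead of vector registers and uses a hypothetical bit-by-bit `clmul64` helper in place of a hardware carry-less multiply instruction. It also omits GHASH's bit-reflected operand order and the reduction modulo x^128 + x^7 + x^2 + x + 1. All it shows is how one 128x128-bit carry-less product can be built from three 64x64-bit products instead of four. All type and function names are invented for illustration.

```cpp
#include <cassert>  // used by the self-checks that exercise this sketch
#include <cstdint>

// A 128-bit GF(2) polynomial split into two 64-bit words (hypothetical type).
struct U128 {
  uint64_t hi, lo;
};

// A 256-bit product; w[0] holds the least significant word.
struct U256 {
  uint64_t w[4];
};

// Schoolbook carry-less multiply of two 64-bit polynomials over GF(2).
// Bit i is the coefficient of x^i, and addition is XOR (no carries).
static U128 clmul64(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; i++) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i != 0) r.hi ^= a >> (64 - i);  // bits shifted past bit 63
    }
  }
  return r;
}

// Reference 128x128 product: four 64x64 multiplications.
static U256 clmul128_schoolbook(U128 a, U128 b) {
  U128 ll = clmul64(a.lo, b.lo);
  U128 lh = clmul64(a.lo, b.hi);
  U128 hl = clmul64(a.hi, b.lo);
  U128 hh = clmul64(a.hi, b.hi);
  return U256{{ll.lo, ll.hi ^ lh.lo ^ hl.lo, hh.lo ^ lh.hi ^ hl.hi, hh.hi}};
}

// Karatsuba: three 64x64 multiplications. Over GF(2) the usual
// (a0 + a1)(b0 + b1) - a0*b0 - a1*b1 uses XOR for both + and -.
static U256 clmul128_karatsuba(U128 a, U128 b) {
  U128 ll = clmul64(a.lo, b.lo);
  U128 hh = clmul64(a.hi, b.hi);
  U128 mid = clmul64(a.lo ^ a.hi, b.lo ^ b.hi);
  mid.lo ^= ll.lo ^ hh.lo;  // mid now equals a0*b1 + a1*b0
  mid.hi ^= ll.hi ^ hh.hi;
  return U256{{ll.lo, ll.hi ^ mid.lo, hh.lo ^ mid.hi, hh.hi}};
}
```

Because addition over GF(2) is XOR, the Karatsuba middle term needs no carries, so the saving of one multiplication costs only a few extra XORs.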
------------- PR Comment: https://git.openjdk.org/jdk/pull/20235#issuecomment-2629494285 From roberto.castaneda.lozano at oracle.com Mon Feb 3 10:06:30 2025 From: roberto.castaneda.lozano at oracle.com (Roberto Castaneda Lozano) Date: Mon, 3 Feb 2025 10:06:30 +0000 Subject: The Cost of Profiling in the HotSpot Virtual Machine In-Reply-To: <6a4681bb-4ea6-4ca6-9dcf-0edb11827664@littlepinkcloud.com> References: <6a4681bb-4ea6-4ca6-9dcf-0edb11827664@littlepinkcloud.com> Message-ID: Hi Andrew, you can find the entire proceedings of MPLR'24 (including that paper) here: https://ckirsch.github.io/publications/proceedings/MPLR24.pdf#page=117 Cheers, Roberto ________________________________________ From: hotspot-dev on behalf of Andrew Haley Sent: Sunday, December 29, 2024 1:15 PM To: hotspot-dev at openjdk.java.net Subject: The Cost of Profiling in the HotSpot Virtual Machine Has anyone here seen an open-access copy of this? It seems to be the only paper in the proceedings that is not open. Usually one of the authors of a paper will publish a copy on their own web page, but not this time. https://dl.acm.org/doi/10.1145/3679007.3685055 Thanks, -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph-open at littlepinkcloud.com Mon Feb 3 11:18:56 2025 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Mon, 3 Feb 2025 11:18:56 +0000 Subject: RFD: The Cost of Profiling in the HotSpot Virtual Machine Message-ID: <06c95e20-4d11-4598-910d-ef75b4d06d22@littlepinkcloud.com> This paper is available at https://ckirsch.github.io/publications/proceedings/MPLR24.pdf#page=117 One thing that really stands out is the slowdown caused by multiple threads racing to increment profile counters. While this may seem like a theoretical concern, we have seen it in customers' real-world situations. 
When an application spins up worker threads which all start at the same time, the resulting memory traffic can substantially delay application startup. It would not be very difficult to fix this problem by using a very simple implementation of distributed counters, but doing so would generate (even) more code and would be slower in the single-threaded case. I have created https://bugs.openjdk.org/browse/JDK-8348027 to track this possibility. But is it worth doing anything about this at all? You could argue that any application that starts too many threads too soon is simply misconfigured, but it's hard for Java users to diagnose what's happening. Maybe Project Leyden will solve the problem in a better way by removing the emphasis on warmup. What do you think? Thanks, -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From cnorrbin at openjdk.org Mon Feb 3 11:26:19 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Mon, 3 Feb 2025 11:26:19 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree Message-ID: Hi everyone, The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. Two key changes enable this feature: 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values.
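The two changes described above (caller-owned nodes and cursors) can be sketched independently of HotSpot. The following is a hypothetical, minimal illustration, not the API in this PR. It is a plain unbalanced binary search tree; a red-black tree would additionally recolor and rotate after linking. The method names only mirror `cursor_find` and `insert_at_cursor` from the PR description, while `IntrusiveNode`, `Cursor`, `IntrusiveTree` and `MyItem` are invented. The point is that a cursor remembers the exact child link where a key is, or would be, so inserting a node the caller already allocated is a pointer assignment with no allocation inside the tree.

```cpp
#include <cassert>

// Hypothetical intrusive node: the user embeds it in their own struct,
// so the tree itself never allocates memory.
struct IntrusiveNode {
  int key;
  IntrusiveNode* left = nullptr;
  IntrusiveNode* right = nullptr;
  IntrusiveNode* parent = nullptr;
  explicit IntrusiveNode(int k) : key(k) {}
};

// A cursor records where a key is, or where it would be linked in if absent.
struct Cursor {
  IntrusiveNode* parent;  // parent of the slot (nullptr for the root slot)
  IntrusiveNode** link;   // child pointer that holds, or would hold, the node
  bool found() const { return *link != nullptr; }
  IntrusiveNode* node() const { return *link; }
};

// Plain, unbalanced BST (no recoloring or rotations).
struct IntrusiveTree {
  IntrusiveNode* root = nullptr;

  Cursor cursor_find(int key) {
    IntrusiveNode* parent = nullptr;
    IntrusiveNode** link = &root;
    while (*link != nullptr && (*link)->key != key) {
      parent = *link;
      link = key < parent->key ? &parent->left : &parent->right;
    }
    return Cursor{parent, link};
  }

  // Links a caller-owned node into the empty slot the cursor points at.
  void insert_at_cursor(IntrusiveNode* n, const Cursor& c) {
    assert(!c.found());
    n->parent = c.parent;
    *c.link = n;
  }
};

// User structure that embeds the node as its first member.
struct MyItem {
  IntrusiveNode node;
  int data;
  MyItem(int key, int d) : node(key), data(d) {}
  // Valid because node is the first member of a standard-layout struct.
  static MyItem* from_node(IntrusiveNode* n) { return reinterpret_cast<MyItem*>(n); }
};
```

In a balancing tree, a cursor is only safe to use until the tree is next modified, since rotations can move the slot it points at.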
Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. A simple example of how you could use an intrusive tree is found below:
```c++
// Tree, Node and Cursor stand for the red-black tree types added in this PR.
struct MyIntrusiveStructure {
  Node node; // The tree node is part of an external structure
  int data;

  MyIntrusiveStructure(int data, Node node) : node(node), data(data) {}

  Node* get_node() { return &node; }

  // Relies on node being the first member of this struct.
  static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; }
};

Tree my_intrusive_tree;
Cursor insert_cursor = my_intrusive_tree.cursor_find(0);
Node insert_node = Node(0);

// Custom allocation here is just malloc
MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest);
new (place) MyIntrusiveStructure(0, insert_node);
my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor);

Cursor find_cursor = my_intrusive_tree.cursor_find(0);
int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data;
```
Please let me know any feedback or concerns! ------------- Commit messages: - intrusive red-black tree Changes: https://git.openjdk.org/jdk/pull/23416/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349211 Stats: 669 lines in 3 files changed: 463 ins; 135 del; 71 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From dholmes at openjdk.org Mon Feb 3 12:01:12 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 3 Feb 2025 12:01:12 GMT Subject: RFR: 8349083: Factor out filename handling code from logging In-Reply-To: References: Message-ID: On Sat, 1 Feb 2025 16:53:13 GMT, Zhengyu Gu wrote: > Factor out filename substitution code from unified logging, so that it can be used elsewhere: > > 1.
Make filename substitution consistent. Support the following substitutions across the JVM: > `%p` -> pid, `%t` -> timestamp, `%hn` -> hostname > > 2. Reduce redundant code src/hotspot/share/utilities/filenameUtil.hpp line 41: > 39: // Expand wildcards in filename: > 40: // %p -> PID > 41: // %t -> timestamp in YY-MM-DD_HH_MM_SS format Not just a "timestamp", though; it is specifically the VM start time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23410#discussion_r1939258968 From stefank at openjdk.org Mon Feb 3 12:14:35 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 3 Feb 2025 12:14:35 GMT Subject: RFR: 8323158: HotSpot Style Guide should specify more include ordering [v2] In-Reply-To: References: Message-ID: > The HotSpot Style Guide has a section about source files and includes. The style used for includes has mostly been introduced by scripts when includeDB was replaced, but also when various other enhancements to our includes were made. Some of the introduced styles were never written down in the style guide. > > I propose a couple of changes to the HotSpot Style Guide to reflect some of these implicit styles that we have. While updating the text I also took the liberty to order the items in an order that I felt was good. > > Note that JDK-8323158 contains a few more suggestions, but I've only addressed the items that I think can be accepted without much contention. Either I extract the items that have not been addressed into a new RFE, or I create a new RFE for this PR. > > There are some small whitespace tweaks that I made so that the .md and .html had a similar layout.
Stefan Karlsson has updated the pull request incrementally with two additional commits since the last revision: - Update hotspot-style.md - Update hotspot-style.html ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23388/files - new: https://git.openjdk.org/jdk/pull/23388/files/51913afa..f01c564c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23388&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23388&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23388.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23388/head:pull/23388 PR: https://git.openjdk.org/jdk/pull/23388 From stefank at openjdk.org Mon Feb 3 12:14:36 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 3 Feb 2025 12:14:36 GMT Subject: RFR: 8323158: HotSpot Style Guide should specify more include ordering In-Reply-To: <9RwWVrZpTqsRO5srrrT0jOt4CGc7oF5FEm06Pzjf2yI=.a5fc3070-226c-4292-9802-426c9cab1672@github.com> References: <9RwWVrZpTqsRO5srrrT0jOt4CGc7oF5FEm06Pzjf2yI=.a5fc3070-226c-4292-9802-426c9cab1672@github.com> Message-ID: On Sat, 1 Feb 2025 07:19:02 GMT, Doug Simon wrote: > A lot of these rules look like they could be checked with some simple scripting or additions to jcheck. Have you considered that? I haven't felt the urge to write such a script, but I know that others have scripts to sort the includes.
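Doug's suggestion that some of these rules could be checked mechanically can be illustrated with a small sketch. It is hypothetical and checks only one rule, assumed here for illustration: consecutive `#include` lines within a block must be sorted and free of duplicates. It is not jcheck or an existing HotSpot script, and it does not encode the exact rules proposed in this PR. Because it compares whole lines as strings, `#include "..."` lines sort before `#include <...>` lines; a real checker would parse out the header path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical checker for one assumed rule: within a block of consecutive
// #include lines (a blank or any other line ends the block), the lines must
// be strictly increasing, i.e. sorted and without duplicates.
// Returns the 1-based numbers of the offending lines.
std::vector<std::size_t> unsorted_includes(const std::vector<std::string>& lines) {
  std::vector<std::size_t> bad;
  std::string prev;       // previous #include line in the current block
  bool in_block = false;
  for (std::size_t i = 0; i < lines.size(); i++) {
    const std::string& line = lines[i];
    if (line.rfind("#include", 0) == 0) {  // line starts with "#include"
      if (in_block && line <= prev) bad.push_back(i + 1);
      prev = line;
      in_block = true;
    } else {
      in_block = false;
    }
  }
  return bad;
}
```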
------------- PR Comment: https://git.openjdk.org/jdk/pull/23388#issuecomment-2630774637 From dnsimon at openjdk.org Mon Feb 3 12:17:50 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 3 Feb 2025 12:17:50 GMT Subject: RFR: 8323158: HotSpot Style Guide should specify more include ordering In-Reply-To: References: <9RwWVrZpTqsRO5srrrT0jOt4CGc7oF5FEm06Pzjf2yI=.a5fc3070-226c-4292-9802-426c9cab1672@github.com> Message-ID: <1gOEDAkU1eGcUnYqPawR3g5OqAseBiVIVPUIi4O9GYc=.321e1b86-3c6f-4c92-8036-27240a778659@github.com> On Mon, 3 Feb 2025 12:11:35 GMT, Stefan Karlsson wrote: > I haven't felt the urge to write such a script, but I know that others have scripts to sort the includes OK, it was just a suggestion. My experience is that while clearly written conventions/rules are important, the more they can be automated, the less hassle for everyone. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23388#issuecomment-2630781761 From zgu at openjdk.org Mon Feb 3 14:02:47 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 3 Feb 2025 14:02:47 GMT Subject: RFR: 8349083: Factor out filename handling code from logging In-Reply-To: References: Message-ID: <0A0RqOvBYkQ-6PV1nKsRSrVfjjdewmy69KWoUlEC29U=.28cd12b0-cb71-48d5-91d8-0c94ce7e3f53@github.com> On Mon, 3 Feb 2025 11:58:30 GMT, David Holmes wrote: >> Factor out filename substitution code from unified logging, so that it can be used elsewhere: >> >> 1. Make filename substitution consistent. Support the following substitutions across the JVM: >> `%p` -> pid, `%t` -> timestamp, `%hn` -> hostname >> >> 2. Reduce redundant code > > src/hotspot/share/utilities/filenameUtil.hpp line 41: > >> 39: // Expand wildcards in filename: >> 40: // %p -> PID >> 41: // %t -> timestamp in YY-MM-DD_HH_MM_SS format > > Not just a "timestamp", though; it is specifically the VM start time. The "timestamp" comes from `os::javaTimeMillis()`; it is not the VM start time.
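The question of what `%t` means can be made concrete with a sketch in which the caller, not the expansion helper, supplies the timestamp. This is a hypothetical helper, not the `filenameUtil` code from the PR. `expand_filename` and its parameters are invented, the timestamp is passed pre-formatted (so the date format from the quoted comment is not reproduced), and escapes such as `%%` are not handled. With this shape, unified logging could pass its captured VM start time while another caller passes a time derived from `os::javaTimeMillis()`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not the filenameUtil API from the PR: expands %p, %t
// and %hn in a filename template. The caller supplies every value, so the
// choice of "VM start time" versus "current time" for %t stays with the caller.
std::string expand_filename(const std::string& pattern, long pid,
                            const std::string& timestamp,
                            const std::string& hostname) {
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); i++) {
    if (pattern[i] != '%' || i + 1 == pattern.size()) {
      out += pattern[i];  // ordinary character, or a trailing '%'
      continue;
    }
    if (pattern[i + 1] == 'p') {
      out += std::to_string(pid);
      i += 1;
    } else if (pattern[i + 1] == 't') {
      out += timestamp;
      i += 1;
    } else if (pattern.compare(i + 1, 2, "hn") == 0) {
      out += hostname;
      i += 2;
    } else {
      out += '%';  // unknown specifier: keep the '%' literally
    }
  }
  return out;
}
```

For example, `expand_filename("gc_%p_%t.log", 1234, "2025-02-03_14-02-47", "host1")` yields `gc_1234_2025-02-03_14-02-47.log`.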
Unified logging captures the VM start time and uses it for its filename: `output = new LogFileOutput(name, _vm_start_time);` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23410#discussion_r1939423915 From galder at openjdk.org Mon Feb 3 14:22:52 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 3 Feb 2025 14:22:52 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> Message-ID: On Fri, 17 Jan 2025 17:53:24 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently, vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the Java implementation of these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g.
>> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA...
> > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo @eastig fyi ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2631136070 From coleenp at openjdk.org Mon Feb 3 14:31:28 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Feb 2025 14:31:28 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v3] In-Reply-To: References: Message-ID: > This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. > Tested with tier1-4. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Hide Class.protectionDomain for reflection and add a test case. - Merge branch 'master' into protection-domain - Fix two tests. - Fix the test.
- 8349145: Make Class.getProtectionDomain() non-native ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23396/files - new: https://git.openjdk.org/jdk/pull/23396/files/d7aafbaf..954c9a76 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23396&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23396&range=01-02 Stats: 3875 lines in 26 files changed: 176 ins; 3644 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/23396.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23396/head:pull/23396 PR: https://git.openjdk.org/jdk/pull/23396 From liach at openjdk.org Mon Feb 3 14:55:49 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 3 Feb 2025 14:55:49 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v3] In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 14:31:28 GMT, Coleen Phillimore wrote: >> This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Hide Class.protectionDomain for reflection and add a test case. > - Merge branch 'master' into protection-domain > - Fix two tests. > - Fix the test. > - 8349145: Make Class.getProtectionDomain() non-native src/java.base/share/classes/jdk/internal/reflect/Reflection.java line 59: > 57: Reflection.class, ALL_MEMBERS, > 58: AccessibleObject.class, ALL_MEMBERS, > 59: Class.class, Set.of("classLoader", "classData", "protectionDomain"), Can you run a hello world with `-Xlog:class+init` to see if Reflection is initialized after `System$2` or something that implements JavaLangAccess? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1939515576 From duke at openjdk.org Mon Feb 3 15:49:01 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 3 Feb 2025 15:49:01 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v3] In-Reply-To: References: Message-ID: > By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - merging master - Use SHA3Parallel for matrix generation - fixing whitespace errors - 8348561: Add aarch64 intrinsics for ML-DSA ------------- Changes: https://git.openjdk.org/jdk/pull/23300/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23300&range=02 Stats: 2133 lines in 19 files changed: 2045 ins; 11 del; 77 mod Patch: https://git.openjdk.org/jdk/pull/23300.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23300/head:pull/23300 PR: https://git.openjdk.org/jdk/pull/23300 From coleenp at openjdk.org Mon Feb 3 16:02:00 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Feb 2025 16:02:00 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v3] In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 14:31:28 GMT, Coleen Phillimore wrote: >> This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Hide Class.protectionDomain for reflection and add a test case. 
> - Merge branch 'master' into protection-domain > - Fix two tests. > - Fix the test. > - 8349145: Make Class.getProtectionDomain() non-native [0.102s][info][class,init] 55 Initializing 'java/lang/System$1'(no method) (0x0000000042057eb8) by thread "main" ... [0.125s][info][class,init] 174 Initializing 'jdk/internal/reflect/Reflection' (0x0000000042066f50) by thread "main" I don't have System$2 just System$1. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23396#issuecomment-2631411677 From coleenp at openjdk.org Mon Feb 3 16:11:06 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Feb 2025 16:11:06 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v4] In-Reply-To: References: Message-ID: > This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. > Tested with tier1-4. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix test that knows which fields are hidden from reflection in jvmci. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23396/files - new: https://git.openjdk.org/jdk/pull/23396/files/954c9a76..d04b808f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23396&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23396&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23396.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23396/head:pull/23396 PR: https://git.openjdk.org/jdk/pull/23396 From liach at openjdk.org Mon Feb 3 16:11:07 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 3 Feb 2025 16:11:07 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v3] In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 14:31:28 GMT, Coleen Phillimore wrote: >> This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Hide Class.protectionDomain for reflection and add a test case. > - Merge branch 'master' into protection-domain > - Fix two tests. > - Fix the test. > - 8349145: Make Class.getProtectionDomain() non-native The subsequent changes look good. Something might have changed in the recent bootstrap sequence, but yes it's great to see that Reflection now loads after JLA and can actually use String hash codes. ------------- Marked as reviewed by liach (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23396#pullrequestreview-2590379807 PR Comment: https://git.openjdk.org/jdk/pull/23396#issuecomment-2631425455 From duke at openjdk.org Mon Feb 3 16:15:32 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 3 Feb 2025 16:15:32 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v4] In-Reply-To: References: Message-ID: > By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: removed debugging code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23300/files - new: https://git.openjdk.org/jdk/pull/23300/files/5630fd14..9f7c4a23 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23300&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23300&range=02-03 Stats: 25 lines in 3 files changed: 0 ins; 25 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23300.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23300/head:pull/23300 PR: https://git.openjdk.org/jdk/pull/23300 From vpaprotski at openjdk.org Mon Feb 3 16:44:10 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 3 Feb 2025 16:44:10 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v4] In-Reply-To: References: Message-ID: <9DMCff2xI947yLBGtcDldqt-r16p11nCIJoZtxJjHvo=.ffe41f5e-2613-4006-8e48-b25fef3a2c02@github.com> > (Also see `8319429: Resetting MXCSR flags degrades ecore`) > > For performance, signaling flags (bottom 6 bits) are set by default in MXCSR. 
This PR fixes the Xcheck:jni comparison that is producing these copious warnings: > > OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall > > > **This in fact happens on both Windows _AND_ Linux.** However, _only_ on Windows is there a crash. This PR fixes the crash, but I have not been able to track down its source (i.e. the crash in the warn handler). Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: fix crash in fxrstor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22673/files - new: https://git.openjdk.org/jdk/pull/22673/files/2b15f99a..b1a712bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22673&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22673&range=02-03 Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22673.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22673/head:pull/22673 PR: https://git.openjdk.org/jdk/pull/22673 From vladimir.kozlov at oracle.com Mon Feb 3 17:26:39 2025 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Feb 2025 09:26:39 -0800 Subject: RFD: The Cost of Profiling in the HotSpot Virtual Machine In-Reply-To: <06c95e20-4d11-4598-910d-ef75b4d06d22@littlepinkcloud.com> References: <06c95e20-4d11-4598-910d-ef75b4d06d22@littlepinkcloud.com> Message-ID: <837ed36c-463b-4599-9d8a-f40f5fb0d011@oracle.com> It is a known issue [1]. We could use randomized profiling counter updates, as suggested in the RFE. I think we had such an implementation in our JAOTC work back in the day. But it may not be so simple on non-x86 platforms. And, as Andrew said, it will complicate the code and slow down single-thread performance during startup. Yes, Leyden can help with this, especially with the next JEP we are working on [2], "Ahead-of-Time Method Profiling". It requires a "training" run to collect profiling data and may not work for all cases.
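A randomized (probabilistic) counter writes to the shared location only occasionally. Each time it does, it adds a larger step, so the expected value is unchanged while far fewer writes hit the contended cache line. The Java sketch below shows only that general technique. It is an assumption-laden illustration, not HotSpot's profiling code and not the design in JDK-8134940. `RandomizedCounter` and its `log2Step` parameter are hypothetical.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a randomized (probabilistic) counter; not HotSpot code.
final class RandomizedCounter {
    private final AtomicLong shared = new AtomicLong(); // the contended location
    private final int step;                             // 2^log2Step

    RandomizedCounter(int log2Step) {
        if (log2Step < 0 || log2Step > 30) {
            throw new IllegalArgumentException("log2Step out of range: " + log2Step);
        }
        this.step = 1 << log2Step;
    }

    /** Touches shared memory with probability 1/step, then adds step: expected increment is 1. */
    void increment(Random rng) {
        if (rng.nextInt(step) == 0) {
            shared.addAndGet(step);
        }
    }

    /** Unbiased estimate of how many times increment() was called. */
    long estimate() {
        return shared.get();
    }

    public static void main(String[] args) {
        RandomizedCounter counter = new RandomizedCounter(4); // step = 16
        Random rng = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            counter.increment(rng);
        }
        // Close to 1,000,000; the standard deviation here is roughly 4,000.
        System.out.println(counter.estimate());
    }
}
```

With several threads, each thread would pass `ThreadLocalRandom.current()`. The random draw on every increment is the extra single-thread cost that has to be weighed against the reduced contention.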
But I think we should go this way since we are already working on it. Thanks, Vladimir K [1] https://bugs.openjdk.org/browse/JDK-8134940 [2] https://bugs.openjdk.org/browse/JDK-8325147 On 2/3/25 3:18 AM, Andrew Haley wrote: > This paper is available at > https://ckirsch.github.io/publications/proceedings/MPLR24.pdf#page=117 > > One thing that really stands out is the slowdown caused by multiple > threads racing to increment profile counters. While this may seem like > a theoretical concern, we have seen it in customers' real-world > situations. When an application spins up worker threads which all > start at the same time, the resulting memory traffic can substantially > delay application startup. > > It would not be very difficult to fix this problem by using a very > simple implementation of distributed counters, but doing so would > generate (even) more code and would be slower in the single-threaded > case. I have created https://bugs.openjdk.org/browse/JDK-8348027 to > track this possibility. > > But is it worth doing anything about this at all? You could argue that > any application that starts too many threads too soon is simply > misconfigured, but it's hard for Java users to diagnose what's > happening. Maybe Project Leyden will solve the problem in a better > way by removing the emphasis on warmup. > > What do you think? > > Thanks, > From vladimir.kozlov at oracle.com Mon Feb 3 17:45:39 2025 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Feb 2025 09:45:39 -0800 Subject: Proposal: Remove EnableJVMCI flag In-Reply-To: References: Message-ID: <611affa4-09c6-41af-a853-1106e12dfbb9@oracle.com> Hi Doug, My concern is that some code (stubs, blobs, Interpreter) is generated before we load any modules. How do you handle JVMCI-specific code there, if you have any? If you don't have such code, then we can discuss. I am definitely against adding runtime checks for JVMCI presence to executed (assembler) code.
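Kozlov's ergonomic alternative is to enable JVMCI when the command line requests the `jdk.internal.vm.ci` module. That requires scanning the launcher arguments for `--add-modules`. The Java sketch below is hypothetical: `JvmciArgs.requestsModule` is not a JDK API. It handles only the `--add-modules=a,b` and `--add-modules a,b` forms. It ignores `@argfiles`, `JDK_JAVA_OPTIONS`, special values such as `ALL-SYSTEM`, and the point where application arguments begin. The second helper uses the standard `ModuleLayer` API to ask, after startup, whether a module was actually resolved into the boot layer. This is the Java-level counterpart of a "has the module been resolved" test.

```java
import java.util.List;

// Hypothetical helpers for illustration; not part of the JDK launcher or VM.
final class JvmciArgs {
    static final String JVMCI_MODULE = "jdk.internal.vm.ci";

    /** Does any --add-modules option (either form) name the given module? */
    static boolean requestsModule(List<String> args, String module) {
        for (int i = 0; i < args.size(); i++) {
            String arg = args.get(i);
            String value = null;
            if (arg.startsWith("--add-modules=")) {
                value = arg.substring("--add-modules=".length());
            } else if (arg.equals("--add-modules") && i + 1 < args.size()) {
                value = args.get(++i); // two-token form: the next argument is the list
            }
            if (value != null) {
                for (String m : value.split(",")) {
                    if (m.trim().equals(module)) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    /** True if the module was resolved into the boot layer of the running JVM. */
    static boolean isResolved(String module) {
        return ModuleLayer.boot().findModule(module).isPresent();
    }

    public static void main(String[] args) {
        List<String> cmd = List.of("--add-modules=java.sql," + JVMCI_MODULE, "-jar", "app.jar");
        System.out.println(requestsModule(cmd, JVMCI_MODULE)); // true
        System.out.println(isResolved("java.base"));           // true
    }
}
```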
It would be nice if, when the command line is parsed, we could detect the presence of the `--add-modules=jdk.internal.vm.ci` flag (or other related flags) and enable the JVMCI flag. I am fine with keeping `EnableJVMCI` but making it ergonomic. You may still want to disable JVMCI from the command line even if somewhere in a start script you have `--add-modules=jdk.internal.vm.ci`. Thanks, Vladimir K On 2/1/25 12:03 AM, Douglas Simon wrote: > Hi, > > https://bugs.openjdk.org/browse/JDK-8345826 was filed to make libgraal and > new CDS optimizations more compatible: > >> Since JDK 483, many more CDS optimizations are enabled when -XX:+AOTClassLinking is specified (see numbers in https://bugs.openjdk.org/browse/JDK-8342279). However, these optimizations require the archived module graph to be used. >> Today, if you enable UseGraalJIT, the archived module graph will be disabled. As a result, the *entire* CDS archive >> will be disabled. This will result in slower start-up time when UseGraalJIT is enabled. >> > > Further internal discussion resulted in > the proposal to remove all use of EnableJVMCI in the VM code. This will mean -XX:+EnableJVMCI only applies to the Java > code (i.e. adds jdk.internal.vm.ci to the root module set). > > However, further reflection suggests something more aggressive is worth considering: remove the EnableJVMCI flag altogether. > > This option was implemented to make use of JVMCI opt-in. However, JVMCI is effectively opt-in anyway without this > option. There are two ways in which JVMCI can be used: as a JIT compiler by the CompileBroker and as a compiler for > "guest" code (e.g., Truffle use case). > > 1. JVMCI as JIT. > > To enable JVMCI as JIT, flags such as UseJVMCICompiler, UseGraalJIT or EnableJVMCIProduct must be specified to the java > launcher. Each of these flags sets EnableJVMCI to true as a side-effect.
That is, use of JVMCI as JIT is already opt-in > due to needing these other flags - specifying EnableJVMCI is redundant. > > 2. JVMCI as guest code compiler > > In this mode, the jdk.internal.vm.ci module must be loaded (i.e. EnableJVMCI currently has the side-effect of `--add-modules=jdk.internal.vm.ci`). This module has no unqualified exports (as seen in its module descriptor, github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/module-info.java), so using it requires > specifying at least one instance of --add-exports to the Java launcher. That is, once again EnableJVMCI alone is not > sufficient for opting-in to JVMCI. > > In light of the above, I propose removing EnableJVMCI altogether. This will require using --add-modules=jdk.internal.vm.ci when you actually want to use the JVMCI module. It will also require modifying JDK code > guarded by this flag. It guards both VM code and use of the `jdk.internal.vm.ci` module and I consider them separately > below. > > #### VM code > > All uses of EnableJVMCI to guard VM code would be adapted with one of the following strategies: > 1. Remove the guard and make the code unconditional. > 2. Replace EnableJVMCI with something else such as UseJVMCICompiler or a test of a global variable set to true as soon as > JVMCI compiled code is about to be installed in the code cache (example: files#diff-ee8337800ed1d1b84e3e49a2481809a6affac5d70ca23934a44497c9c758092fR456). > 3. Replace EnableJVMCI with a test of whether the jdk.internal.vm.ci module has been resolved (example: github.com/openjdk/jdk/pull/23408/files#diff-4e6668d768f7d67417cbac39bcb723552cc0b80ad218709cfa0e6e31f32b69f0R518). > > Of course, this change almost certainly needs a CSR as well, but I'd like to get feedback on the primary change before
> > -Doug > From coleenp at openjdk.org Mon Feb 3 17:49:19 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Feb 2025 17:49:19 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native Message-ID: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. Tested with tier1-8. ------------- Commit messages: - Removed @Stable. - Fix JFR bug. - 8345678: Make Class.getModifiers() non-native. 
Changes: https://git.openjdk.org/jdk/pull/22652/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22652&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346567 Stats: 218 lines in 34 files changed: 57 ins; 139 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/22652.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22652/head:pull/22652 PR: https://git.openjdk.org/jdk/pull/22652 From liach at openjdk.org Mon Feb 3 17:49:20 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 3 Feb 2025 17:49:20 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: <1kHVpYCOExfkn8UHTZNZT6zwjRj3MCXJD2LVcY0NTrg=.0644323b-5f40-4441-8c19-763105aaf08d@github.com> On Mon, 9 Dec 2024 19:26:53 GMT, Coleen Phillimore wrote: > The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. > > There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. > > Tested with tier1-8. The change to java.lang.Class looks good. Looking at #23396, we might need to filter this field too. 
src/hotspot/share/classfile/javaClasses.cpp line 1504: > 1502: macro(_reflectionData_offset, k, "reflectionData", java_lang_ref_SoftReference_signature, false); \ > 1503: macro(_signers_offset, k, "signers", object_array_signature, false); \ > 1504: macro(_modifiers_offset, k, vmSymbols::modifiers_name(), int_signature, false) Do we need a trailing semicolon here? src/java.base/share/classes/java/lang/Class.java line 1315: > 1313: > 1314: // Set by the JVM when creating the instance of this java.lang.Class > 1315: private transient int modifiers; If this is set by the JVM, can this be marked `final` so JIT compiler can trust this field? Also preferable if we can move this together with components/signers/classData fields. ------------- PR Review: https://git.openjdk.org/jdk/pull/22652#pullrequestreview-2490110846 PR Comment: https://git.openjdk.org/jdk/pull/22652#issuecomment-2631658029 PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1876630297 PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1876627105 From coleenp at openjdk.org Mon Feb 3 17:49:20 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Feb 2025 17:49:20 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Mon, 9 Dec 2024 19:26:53 GMT, Coleen Phillimore wrote: > The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. 
I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. > > There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. > > Tested with tier1-8. > Looking at https://github.com/openjdk/jdk/pull/23396, we might need to filter this field too. Yes, I agree. This patch is a follow on to that one, so I'll add it to the same places when that one is merged in here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22652#issuecomment-2631661716 From coleenp at openjdk.org Mon Feb 3 17:49:20 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Feb 2025 17:49:20 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: <1kHVpYCOExfkn8UHTZNZT6zwjRj3MCXJD2LVcY0NTrg=.0644323b-5f40-4441-8c19-763105aaf08d@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <1kHVpYCOExfkn8UHTZNZT6zwjRj3MCXJD2LVcY0NTrg=.0644323b-5f40-4441-8c19-763105aaf08d@github.com> Message-ID: On Mon, 9 Dec 2024 19:46:43 GMT, Chen Liang wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. 
One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. > > src/hotspot/share/classfile/javaClasses.cpp line 1504: > >> 1502: macro(_reflectionData_offset, k, "reflectionData", java_lang_ref_SoftReference_signature, false); \ >> 1503: macro(_signers_offset, k, "signers", object_array_signature, false); \ >> 1504: macro(_modifiers_offset, k, vmSymbols::modifiers_name(), int_signature, false) > > Do we need a trailing semicolon here? yes. it is needed. > src/java.base/share/classes/java/lang/Class.java line 1315: > >> 1313: >> 1314: // Set by the JVM when creating the instance of this java.lang.Class >> 1315: private transient int modifiers; > > If this is set by the JVM, can this be marked `final` so JIT compiler can trust this field? Also preferable if we can move this together with components/signers/classData fields. The JVM rearranges these fields so that's why I put it near the caller. Let me check if final compiles. Edit: it looks better with the other fields though. 
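Below is a simplified, hypothetical model of the Java side of this change. The modifiers live in a `final` field written once when the mirror is created, and `getModifiers()` becomes a plain field read instead of a native call. In the real `java.lang.Class`, the VM fills in the field and the private constructor is never invoked. Assigning a constructor parameter rather than a literal `0` follows the existing pattern in that constructor, which keeps the JIT from assuming a fixed value for the field. `MirrorSketch` is an illustration, not the JDK source.

```java
import java.lang.reflect.Modifier;

// Hypothetical stand-in for java.lang.Class; not the JDK source.
final class MirrorSketch {
    // Written once when the mirror is created, never changed afterwards.
    private final transient int modifiers;

    // In java.lang.Class the VM allocates mirrors and this constructor is never
    // called; assigning a parameter (not 0) keeps the JIT from assuming a constant.
    private MirrorSketch(int dummyModifiers) {
        this.modifiers = dummyModifiers;
    }

    static MirrorSketch create(int modifiers) {
        return new MirrorSketch(modifiers);
    }

    // A plain field read replaces the former native call.
    int getModifiers() {
        return modifiers;
    }

    public static void main(String[] args) {
        MirrorSketch m = create(Modifier.PUBLIC | Modifier.FINAL);
        System.out.println(Modifier.toString(m.getModifiers())); // public final
    }
}
```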
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1876712191 PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1876713323 From duke at openjdk.org Mon Feb 3 17:49:20 2025 From: duke at openjdk.org (ExE Boss) Date: Mon, 3 Feb 2025 17:49:20 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <1kHVpYCOExfkn8UHTZNZT6zwjRj3MCXJD2LVcY0NTrg=.0644323b-5f40-4441-8c19-763105aaf08d@github.com> Message-ID: On Mon, 9 Dec 2024 20:27:52 GMT, Coleen Phillimore wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 1504: >> >>> 1502: macro(_reflectionData_offset, k, "reflectionData", java_lang_ref_SoftReference_signature, false); \ >>> 1503: macro(_signers_offset, k, "signers", object_array_signature, false); \ >>> 1504: macro(_modifiers_offset, k, vmSymbols::modifiers_name(), int_signature, false) >> >> Do we need a trailing semicolon here? > > yes. it is needed. This is **C++**, so yes. Suggestion: macro(_modifiers_offset, k, vmSymbols::modifiers_name(), int_signature, false); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1876794006 From coleenp at openjdk.org Mon Feb 3 17:49:20 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Feb 2025 17:49:20 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <1kHVpYCOExfkn8UHTZNZT6zwjRj3MCXJD2LVcY0NTrg=.0644323b-5f40-4441-8c19-763105aaf08d@github.com> Message-ID: <44DPWzTGxPDoyWwZFbAxE74-KrXChIvfusVws1N-uN0=.f346731b-c61e-468f-9f58-4dc6e2df35d2@github.com> On Mon, 9 Dec 2024 21:35:42 GMT, ExE Boss wrote: >> yes. it is needed. > > This is **C++**, so yes.
> Suggestion: > > macro(_modifiers_offset, k, vmSymbols::modifiers_name(), int_signature, false); I see, there's a trailing semi somewhere in the expansion of this macro so it compiles, but I added one in. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1878263513 From heidinga at openjdk.org Mon Feb 3 17:49:20 2025 From: heidinga at openjdk.org (Dan Heidinga) Date: Mon, 3 Feb 2025 17:49:20 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: <5bMxhTRPqj-dMhr3FoSrym2ttWuzjWwtXAEcQHbF9Vg=.859ae29f-2530-4130-b108-d47c100ac19f@github.com> On Mon, 9 Dec 2024 19:26:53 GMT, Coleen Phillimore wrote: > The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. > > There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. > > Tested with tier1-8. 
src/java.base/share/classes/java/lang/Class.java line 244: > 242: classLoader = loader; > 243: componentType = arrayComponentType; > 244: modifiers = 0; The comment above about assigning a parameter to the field to prevent the JIT from assuming an incorrect default also should apply to the new `modifiers` field. I think the constructor, which is never called, should also pass in a `dummyModifiers` value rather than using 0 directly ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1880689835 From coleenp at openjdk.org Mon Feb 3 17:49:20 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Feb 2025 17:49:20 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: <5bMxhTRPqj-dMhr3FoSrym2ttWuzjWwtXAEcQHbF9Vg=.859ae29f-2530-4130-b108-d47c100ac19f@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <5bMxhTRPqj-dMhr3FoSrym2ttWuzjWwtXAEcQHbF9Vg=.859ae29f-2530-4130-b108-d47c100ac19f@github.com> Message-ID: On Wed, 11 Dec 2024 18:15:57 GMT, Dan Heidinga wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. 
> > src/java.base/share/classes/java/lang/Class.java line 244: > >> 242: classLoader = loader; >> 243: componentType = arrayComponentType; >> 244: modifiers = 0; > > The comment above about assigning a parameter to the field to prevent the JIT from assuming an incorrect default also should apply to the new `modifiers` field. I think the constructor, which is never called, should also pass in a `dummyModifiers` value rather than using 0 directly Yes, definitely, didn't see that this is the right way to do this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1887157349 From duke at openjdk.org Mon Feb 3 17:49:21 2025 From: duke at openjdk.org (ExE Boss) Date: Mon, 3 Feb 2025 17:49:21 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: <-GEiPPAhFzy-uaUwIACYA7fZVCT3wkuVd-gtf9rrlnw=.de130f97-59bd-4581-a568-05d6238cf90a@github.com> On Mon, 9 Dec 2024 19:26:53 GMT, Coleen Phillimore wrote: > The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. > > There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. 
> > Tested with tier1-8. src/java.base/share/classes/java/lang/Class.java line 1005: > 1003: private transient Object[] signers; // Read by VM, mutable > 1004: > 1005: @Stable The `modifiers` field doesn't need to be `@Stable`: Suggestion: test/micro/org/openjdk/bench/java/lang/reflect/Clazz.java line 65: > 63: */ > 64: @Benchmark > 65: public int getModifiers() throws NoSuchMethodException { The only `Throwable`s that can be thrown by calling `Class::getModifiers()` are `Error`s (e.g. `StackOverflowError`) and `RuntimeException`s (e.g. `NullPointerException`): Suggestion: public int getModifiers() { test/micro/org/openjdk/bench/java/lang/reflect/Clazz.java line 71: > 69: Clazz[] clazzArray = new Clazz[1]; > 70: @Benchmark > 71: public int getAppArrayModifiers() throws NoSuchMethodException { Suggestion: public int getAppArrayModifiers() { test/micro/org/openjdk/bench/java/lang/reflect/Clazz.java line 81: > 79: */ > 80: @Benchmark > 81: public int getArrayModifiers() throws NoSuchMethodException { Suggestion: public int getArrayModifiers() { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1888757754 PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1888760732 PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1888760967 PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1888761412 From coleenp at openjdk.org Mon Feb 3 17:49:21 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Feb 2025 17:49:21 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: <-GEiPPAhFzy-uaUwIACYA7fZVCT3wkuVd-gtf9rrlnw=.de130f97-59bd-4581-a568-05d6238cf90a@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <-GEiPPAhFzy-uaUwIACYA7fZVCT3wkuVd-gtf9rrlnw=.de130f97-59bd-4581-a568-05d6238cf90a@github.com> Message-ID: On Tue, 17 Dec 2024 15:54:48 GMT, ExE Boss wrote: >> The Class.getModifiers()
method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. > > src/java.base/share/classes/java/lang/Class.java line 1005: > >> 1003: private transient Object[] signers; // Read by VM, mutable >> 1004: >> 1005: @Stable > > The `modifiers` field doesn't need to be `@Stable`: > Suggestion: I now don't know whether we want @Stable here or not. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1890866329 From vklang at openjdk.org Mon Feb 3 17:49:21 2025 From: vklang at openjdk.org (Viktor Klang) Date: Mon, 3 Feb 2025 17:49:21 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Mon, 9 Dec 2024 19:26:53 GMT, Coleen Phillimore wrote: > The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it.
This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. > > There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. > > Tested with tier1-8. src/java.base/share/classes/java/lang/Class.java line 1006: > 1004: private final transient int modifiers; // Set by the VM > 1005: > 1006: // package-private @coleenp Could this field be @Stable, or does that only apply to `putfield`s? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1879797327 From liach at openjdk.org Mon Feb 3 17:49:21 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 3 Feb 2025 17:49:21 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Wed, 11 Dec 2024 10:24:03 GMT, Viktor Klang wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. 
I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. > > src/java.base/share/classes/java/lang/Class.java line 1006: > >> 1004: private final transient int modifiers; // Set by the VM >> 1005: >> 1006: // package-private > > @coleenp Could this field be @Stable, or does that only apply to `putfield`s? I don't think this needs to be stable - finals in java.lang is trusted by the JIT compiler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1880350790 From vklang at openjdk.org Mon Feb 3 17:49:24 2025 From: vklang at openjdk.org (Viktor Klang) Date: Mon, 3 Feb 2025 17:49:24 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Wed, 11 Dec 2024 14:52:48 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/Class.java line 1006: >> >>> 1004: private final transient int modifiers; // Set by the VM >>> 1005: >>> 1006: // package-private >> >> @coleenp Could this field be @Stable, or does that only apply to `putfield`s? > > I don't think this needs to be stable - finals in java.lang is trusted by the JIT compiler. Yeah, I was just thinking whether something set from inside the VM which is marked @Stable is constant-folded :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1880374750 From coleenp at openjdk.org Mon Feb 3 17:49:24 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Feb 2025 17:49:24 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Wed, 11 Dec 2024 15:06:54 GMT, Viktor Klang wrote: >> I don't think this needs to be stable - finals in java.lang is trusted by the JIT compiler. 
> > Yeah, I was just thinking whether something set from inside the VM which is marked @Stable is constant-folded :) I don't think @Stable would hurt but final should provide the same guarantee. It's set internally by the VM so there's no late setting. I don't know if this field implementation can constant fold in the case of Arrays which are (JVM_ACC_ABSTRACT | JVM_ACC_FINAL | JVM_ACC_PUBLIC). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1880663099 From heidinga at openjdk.org Mon Feb 3 17:49:25 2025 From: heidinga at openjdk.org (Dan Heidinga) Date: Mon, 3 Feb 2025 17:49:25 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Wed, 11 Dec 2024 15:06:54 GMT, Viktor Klang wrote: >> I don't think this needs to be stable - finals in java.lang is trusted by the JIT compiler. > > Yeah, I was just thinking whether something set from inside the VM which is marked @Stable is constant-folded :) @viktorklang-ora `@Stable` is not about how the field was set, but about the JIT observing a non-default value at compile time. If it observes a non-default value, it can treat it as a compile time constant. 
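Coleen's remark about arrays can be checked from plain Java. For an array class, `Class.getModifiers()` always reports `abstract` and `final`, never reports `interface`, and takes its `public`/`protected`/`private` bits from the component type. Primitive component types count as `public`. The small program below uses only the public reflection API. It does not show whether the JIT constant-folds the new field. `ArrayModifiers` and `describe` are illustrative names.

```java
import java.lang.reflect.Modifier;

// Illustrative check of the modifier bits reported for array classes.
final class ArrayModifiers {
    /** Renders a class's modifier bits as a source-style modifier string. */
    static String describe(Class<?> c) {
        return Modifier.toString(c.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(describe(int[].class));    // public abstract final
        System.out.println(describe(String[].class)); // public abstract final
        // An array of an interface type is not itself an interface.
        System.out.println(Modifier.isInterface(Runnable[].class.getModifiers())); // false
    }
}
```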
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1880692608 From vklang at openjdk.org Mon Feb 3 17:49:25 2025 From: vklang at openjdk.org (Viktor Klang) Date: Mon, 3 Feb 2025 17:49:25 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: <8Wx3xbbOnPXS5n1RuNaesqHbhKV3iLwrCVF0s6uWOrA=.cb20728e-e13c-4667-822b-3ba424cbc12f@github.com> On Wed, 11 Dec 2024 18:17:43 GMT, Dan Heidinga wrote: >> Yeah, I was just thinking whether something set from inside the VM which is marked @Stable is constant-folded :) > > @viktorklang-ora `@Stable` is not about how the field was set, but about the JIT observing a non-default value at compile time. If it observes a non-default value, it can treat it as a compile time constant. @DanHeidinga Great explanation, thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1881782322 From duke at openjdk.org Mon Feb 3 18:14:54 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 3 Feb 2025 18:14:54 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v2] In-Reply-To: <7UgNYEuTu6rj7queOgM9xIy-6kQMdACrZiDLtlniMYw=.dff6f18b-1236-43b1-8280-2bce9160f32a@github.com> References: <7UgNYEuTu6rj7queOgM9xIy-6kQMdACrZiDLtlniMYw=.dff6f18b-1236-43b1-8280-2bce9160f32a@github.com> Message-ID: On Thu, 30 Jan 2025 16:23:56 GMT, Andrew Dinn wrote: > @ferakocz I'm afraid you lucked out on getting your change committed before my reorganization of the stub generation code. If you are unsure of how to do the merge so your new stub is declared and generated following the new model (see the doc comments in stubDeclarations.hpp for details) let me know and I'll be happy to help you sort it out. @adinn I think I managed to figure it out. Please take a look at the PR and let me know if I should have done anything differently. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2631720583 From jbhateja at openjdk.org Mon Feb 3 18:14:56 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 3 Feb 2025 18:14:56 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v16] In-Reply-To: References: Message-ID: On Thu, 30 Jan 2025 11:03:43 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 9. X86 backend implementation for all supported intrinsics. >> 10. Functional and Performance validation tests. >> >> Kindly review the patch and share your feedback. 
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java > > Co-authored-by: Emanuel Peter Hi @PaulSandoz , @eme64 , All outstanding comments have been addressed, please let me know if there are other comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2631719276 From doug.simon at oracle.com Mon Feb 3 19:09:26 2025 From: doug.simon at oracle.com (Douglas Simon) Date: Mon, 3 Feb 2025 19:09:26 +0000 Subject: Proposal: Remove EnableJVMCI flag In-Reply-To: <611affa4-09c6-41af-a853-1106e12dfbb9@oracle.com> References: <611affa4-09c6-41af-a853-1106e12dfbb9@oracle.com> Message-ID: Hi Vladimir, On 3 Feb 2025, at 18:45, Vladimir Kozlov wrote: Hi Doug, My concern is that some code (stubs, blobs, Interpreter) are generated before we are loading any modules. How do you handle JVMCI specific code there if you have it? If you don't have such code then we can discuss. You mean what would we do with generated code that currently tests EnableJVMCI? We have these 2 options as far as I can see: 1. Always generate the JVMCI part of the code (example). 2. Instead of testing EnableJVMCI, we instead test a JVMCI::_is_enabled bool which would be initialized during argument parsing (i.e. before any code is generated). JVMCI::_is_enabled would be set to true if jdk.internal.vm.ci is in the root module set or if any other JVMCI flags such as UseGraalJIT or UseJVMCICompiler are true. I suspect this option is the one to go with as it's pretty much equivalent to the current semantics (i.e. JVMCI conditional VM is only executed/generated) if JVMCI is enabled. I am definitely against adding runtime checks for JVMCI presence into executed (assembler) code. I agree that we do not want that.
Would be nice if/when command line is parsed we can detect presence of `--add-modules=jdk.internal.vm.ci` (or others related) flag and enable JVMCI flag. I am fine to keep `EnableJVMCI` but make it ergonomic. I'd like EnableJVMCI to become purely an alias for --add-modules=jdk.internal.vm.ci. You may still want to disable JVMCI from command line even if somewhere in start script you have `--add-modules=jdk.internal.vm.ci`. I don't think we need to support such a contradiction - if the launcher has been asked to load jdk.internal.vm.ci as part of the root module set, then it wants JVMCI enabled. Either that or we make -EnableJVMCI undo any preceding --add-modules=jdk.internal.vm.ci (if that's even possible). -Doug On 2/1/25 12:03 AM, Douglas Simon wrote: Hi, https://bugs.openjdk.org/browse/JDK-8345826 was filed to make libgraal and new CDS optimizations more compatible: Since JDK 483, many more CDS optimizations are enabled when -XX:+AOTClassLinking is specified (see numbers in https://bugs.openjdk.org/browse/JDK-8342279). However, these optimizations require the archived module graph to be used. Today, if you enable UseGraalJIT, the archived module graph will be disabled. As a result, the *entire* CDS archive will be disabled. This will result in slower start-up time when UseGraalJIT is enabled. Further internal discussion resulted in the proposal to remove all use of EnableJVMCI in the VM code. This will mean -XX:+EnableJVMCI only applies to the Java code (i.e. adds jdk.internal.vm.ci to the root module set). However, further reflection suggests something more aggressive is worth considering: remove the EnableJVMCI flag altogether. This option was implemented to make the use of JVMCI opt-in. However, JVMCI is effectively opt-in anyway without this option. There are two ways in which JVMCI can be used: as a JIT compiler by the CompileBroker and as a compiler for "guest" code (e.g., Truffle use case). 1. JVMCI as JIT.
To enable JVMCI as JIT, flags such as UseJVMCICompiler, UseGraalJIT or EnableJVMCIProduct must be specified to the java launcher. Each of these flags sets EnableJVMCI to true as a side-effect. That is, use of JVMCI as JIT is already opt-in due to needing these other flags - specifying EnableJVMCI is redundant. 2. JVMCI as guest code compiler In this mode, the jdk.internal.vm.ci module must be loaded (i.e. EnableJVMCI currently has the side-effect of `--add-modules=jdk.internal.vm.ci`). This module has no unqualified exports (as seen in its module descriptor) so using it requires specifying at least one instance of --add-exports to the Java launcher. That is, once again EnableJVMCI alone is not sufficient for opting-in to JVMCI. In light of the above, I propose removing EnableJVMCI altogether. This will require using --add-modules=jdk.internal.vm.ci when you actually want to use the JVMCI module. It will also require modifying JDK code guarded by this flag. It guards both VM code and use of the `jdk.internal.vm.ci` module and I consider them separately below. #### VM code All uses of EnableJVMCI to guard VM code would be adapted with one of the following strategies: 1. Remove the guard and make the code unconditional. 2. Replace EnableJVMCI with something else such as UseJVMCICompiler or test of a global variable set to true as soon as JVMCI compiled code is about to be installed in the code cache (example). 3. Replace EnableJVMCI with a test of whether the jdk.internal.vm.ci module has been resolved (example). Of course, this change almost certainly needs a CSR as well but I'd like to get feedback on the primary change before worrying about that. -Doug
From vladimir.kozlov at oracle.com Mon Feb 3 19:14:08 2025 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 3 Feb 2025 11:14:08 -0800 Subject: Proposal: Remove EnableJVMCI flag In-Reply-To: References: <611affa4-09c6-41af-a853-1106e12dfbb9@oracle.com> Message-ID: <888c013b-482e-4269-972e-078b8517485e@oracle.com> On 2/3/25 11:09 AM, Douglas Simon wrote: > Hi Vladimir, > >> On 3 Feb 2025, at 18:45, Vladimir Kozlov wrote: >> >> Hi Doug, >> >> My concern is that some code (stubs, blobs, Interpreter) are generated before we are loading any modules. >> How do you handle JVMCI specific code there if you have it? If you don't have such code then we can discuss. > > You mean what would we do with generated code that currently tests EnableJVMCI? We have these 2 options as far as I can see: > 1. Always generate the JVMCI part of the code (example files#diff-524c9e019cb83916aa3db772fb33acbbe3e7465867a8d2f7e6376be3c8260eddL606>). > 2. Instead of testing EnableJVMCI, we instead test a JVMCI::_is_enabled bool which would be initialized during argument > parsing (i.e. before any code is generated). JVMCI::_is_enabled would be set to true if jdk.internal.vm.ci is in the > root module set or if any other JVMCI flags such as UseGraalJIT or UseJVMCICompiler are true. I suspect this option is > the one to go with as it's pretty much equivalent to the current semantics (i.e. JVMCI conditional VM is only executed/generated) > if JVMCI is enabled. I agree with option 2. This looks like the most reasonable approach. Thanks, Vladimir K > >> I am definitely against adding runtime checks for JVMCI presence into executed (assembler) code. > > I agree that we do not want that. > >> Would be nice if/when command line is parsed we can detect presence of `--add-modules=jdk.internal.vm.ci` (or others >> related) flag and enable JVMCI flag. I am fine to keep `EnableJVMCI` but make it ergonomic. > > I'd like EnableJVMCI to become purely an alias for --add-modules=jdk.internal.vm.ci.
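Doug's option 2, which Vladimir endorses here, comes down to computing one enablement bit during argument parsing, before any stubs or interpreter code are generated. The Java sketch below models only that decision. The real logic would live in HotSpot's C++ argument processing. The class and method names are hypothetical, and the list of implying flags is taken from this thread rather than from HotSpot. The sketch also ignores "last flag wins" cases such as a later `-XX:-UseJVMCICompiler`, and it detects only `--add-modules` as a way of adding the module to the root set.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "option 2": decide once, while parsing arguments,
// whether JVMCI is enabled. Not HotSpot code; all names are illustrative.
public final class JvmciEnablementSketch {
    static final String JVMCI_MODULE = "jdk.internal.vm.ci";

    // Flags assumed (per this thread) to imply JVMCI; -XX:+EnableJVMCI is
    // treated as a plain alias for adding the module to the root set.
    static final List<String> IMPLYING_FLAGS = List.of(
            "-XX:+EnableJVMCI", "-XX:+UseJVMCICompiler",
            "-XX:+UseGraalJIT", "-XX:+EnableJVMCIProduct");

    static boolean isJvmciEnabled(List<String> args) {
        for (int i = 0; i < args.size(); i++) {
            String arg = args.get(i);
            String modules = null;
            if (arg.startsWith("--add-modules=")) {
                modules = arg.substring("--add-modules=".length());
            } else if (arg.equals("--add-modules") && i + 1 < args.size()) {
                modules = args.get(++i);   // two-token form: --add-modules <list>
            } else if (IMPLYING_FLAGS.contains(arg)) {
                return true;
            }
            if (modules != null && Arrays.asList(modules.split(",")).contains(JVMCI_MODULE)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isJvmciEnabled(List.of("--add-modules=java.sql,jdk.internal.vm.ci"))); // true
        System.out.println(isJvmciEnabled(List.of("-XX:+UseGraalJIT")));                          // true
        System.out.println(isJvmciEnabled(List.of("--add-modules", "java.sql")));                 // false
    }
}
```

Because the bit is fixed before code generation, generated code can test a constant instead of emitting runtime checks, which both participants want to avoid.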
> >> You may still want to disable JVMCI from command line even if somewhere in start script you have `--add-modules=jdk.internal.vm.ci`. > > I don't think we need to support such a contradiction - if the launcher has been asked to load jdk.internal.vm.ci as > part of the root module set, then it wants JVMCI enabled. Either that or we make -EnableJVMCI undo any preceding --add-modules=jdk.internal.vm.ci (if that's even possible). > > -Doug > >> On 2/1/25 12:03 AM, Douglas Simon wrote: >>> Hi, >>> https://bugs.openjdk.org/browse/JDK-8345826 was filed to make libgraal >>> and new CDS optimizations more compatible: >>>> Since JDK 483, many more CDS optimizations are enabled when -XX:+AOTClassLinking is specified (see numbers >>>> in https://bugs.openjdk.org/browse/JDK-8342279). However, these optimizations require the archived module graph to >>>> be used. Today, if you enable UseGraalJIT, the archived module graph will be disabled. As a result, the *entire* CDS >>>> archive will be disabled. This will result in slower start-up time when UseGraalJIT is enabled. >>>> >>> Further internal discussion >> focusedId=14736369&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14736369> resulted >>> in the proposal to remove all use of EnableJVMCI in the VM code. This will mean -XX:+EnableJVMCI only applies to the >>> Java code (i.e. adds jdk.internal.vm.ci to the root module set). >>> However, further reflection suggests something more aggressive is worth considering: remove the EnableJVMCI flag >>> altogether. >>> This option was implemented to make the use of JVMCI opt-in. However, JVMCI is effectively opt-in anyway without this >>> option. There are two ways in which JVMCI can be used: as a JIT compiler by the CompileBroker and as a compiler for >>> "guest" code (e.g., Truffle use case). >>> 1. JVMCI as JIT.
>>> To enable JVMCI as JIT, flags such as UseJVMCICompiler, UseGraalJIT or EnableJVMCIProduct must be specified to the >>> java launcher. Each of these flags sets EnableJVMCI to true as a side-effect. That is, use of JVMCI as JIT is already >>> opt-in due to needing these other flags - specifying EnableJVMCI is redundant. >>> 2. JVMCI as guest code compiler >>> In this mode, the jdk.internal.vm.ci module must be loaded (i.e. EnableJVMCI currently has the side-effect of `--add-modules=jdk.internal.vm.ci`). This module has no unqualified exports (as seen in its module descriptor >> github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/module-info.java>) so using it requires >>> specifying at least one instance of --add-exports to the Java launcher. That is, once again EnableJVMCI alone is not >>> sufficient for opting-in to JVMCI. >>> In light of the above, I propose removing EnableJVMCI altogether. This will require using --add-modules=jdk.internal.vm.ci when you actually want to use the JVMCI module. It will also require modifying JDK code >>> guarded by this flag. It guards both VM code and use of the `jdk.internal.vm.ci` module and I consider them >>> separately below. >>> #### VM code >>> All uses of EnableJVMCI to guard VM code would be adapted with one of the following strategies: >>> 1. Remove the guard and make the code unconditional. >>> 2. Replace EnableJVMCI with something else such as UseJVMCICompiler or test of a global variable set to true as soon >>> as JVMCI compiled code is about to be installed in the code cache (example >> pull/23408/files#diff-ee8337800ed1d1b84e3e49a2481809a6affac5d70ca23934a44497c9c758092fR456>). >>> 3. Replace EnableJVMCI with a test of whether the jdk.internal.vm.ci module has been resolved (example >> github.com/openjdk/jdk/pull/23408/files#diff-4e6668d768f7d67417cbac39bcb723552cc0b80ad218709cfa0e6e31f32b69f0R518>).
>>> Of course, this change almost certainly needs a CSR as well but I'd like to get feedback on the primary change before >>> worrying about that. >>> -Doug >> > From jsjolen at openjdk.org Mon Feb 3 20:54:17 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 3 Feb 2025 20:54:17 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 11:20:49 GMT, Casper Norrbin wrote: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support of intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
> > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns! Thanks, A few comments. Still looking at this. src/hotspot/share/utilities/rbTree.hpp line 33: > 31: #include > 32: > 33: struct Empty {}; This will export the name `Empty` to everyone, is it possible to move it to inside of the class? src/hotspot/share/utilities/rbTree.hpp line 71: > 69: const K& key() const { return _key; } > 70: V& val() { return _value; } > 71: V& val() const { return _value; } Hmm, this doesn't seem quite right. Why can't we have a `const` method returning a `const` value anymore? test/hotspot/gtest/utilities/test_rbtree.cpp line 751: > 749: } > 750: { // Make a very large tree and verify at the end > 751: struct Nothing {}; Now can use `Empty` instead. 
------------- PR Review: https://git.openjdk.org/jdk/pull/23416#pullrequestreview-2590838361 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1939927105 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1939928307 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1940006437 From dholmes at openjdk.org Mon Feb 3 21:06:15 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 3 Feb 2025 21:06:15 GMT Subject: RFR: 8334320: Replace vmTestbase/metaspace/share/TriggerUnloadingWithWhiteBox.java with ClassUnloadCommon from testlibrary In-Reply-To: References: Message-ID: On Fri, 17 Jan 2025 13:43:35 GMT, Coleen Phillimore wrote: > Rename and make TriggerUnloadingWithWhiteBox call ClassUnloadCommon.triggerUnloading() instead of WB.fullGC(). Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23174#pullrequestreview-2590998615 From vpaprotski at openjdk.org Mon Feb 3 21:43:56 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 3 Feb 2025 21:43:56 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v5] In-Reply-To: References: Message-ID: > (Also see `8319429: Resetting MXCSR flags degrades ecore`) > > This PR fixes two issues: > - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only > - This issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warning on HelloWorld (on both Linux _and_ windows): > > OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall > > > First, the crash. Caused when FXRSTOR is attempting to write reserved bits into MXCSR. If those bits happen to be set, crash. (Hence the crash isn't deterministic. But frequent enough if `__ warn` is used). 
It is caused by the binding not reserving stack space for register parameters > ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) > The prolog of the warn function then proceeds to store the four argument registers onto the stack, overwriting the fxstore save area. (See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) > > Fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer. > > --- > > I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` is set (65k+ lines of warnings) Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22673/files - new: https://git.openjdk.org/jdk/pull/22673/files/b1a712bf..2e372f29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22673&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22673&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22673.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22673/head:pull/22673 PR: https://git.openjdk.org/jdk/pull/22673 From coleenp at openjdk.org Mon Feb 3 22:52:16 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Feb 2025 22:52:16 GMT Subject: RFR: 8334320: Replace vmTestbase/metaspace/share/TriggerUnloadingWithWhiteBox.java with ClassUnloadCommon from testlibrary In-Reply-To: References: Message-ID: On Fri, 17 Jan 2025 13:43:35 GMT, Coleen Phillimore wrote: > Rename and make TriggerUnloadingWithWhiteBox call ClassUnloadCommon.triggerUnloading() instead of WB.fullGC(). Thanks for the reviews Leonid and David.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23174#issuecomment-2632354077 From coleenp at openjdk.org Mon Feb 3 22:52:17 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Feb 2025 22:52:17 GMT Subject: Integrated: 8334320: Replace vmTestbase/metaspace/share/TriggerUnloadingWithWhiteBox.java with ClassUnloadCommon from testlibrary In-Reply-To: References: Message-ID: <73lPbNXqg9tbkN1Ju02t6829vpNJDAUDacDV-Mx6dhQ=.1c47ebf7-31e1-4d90-8794-68cd1c61d4c9@github.com> On Fri, 17 Jan 2025 13:43:35 GMT, Coleen Phillimore wrote: > Rename and make TriggerUnloadingWithWhiteBox call ClassUnloadCommon.triggerUnloading() instead of WB.fullGC(). This pull request has now been integrated. Changeset: 9b495972 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/9b49597244f898400222cfc252f50a2401ca3e2f Stats: 80 lines in 4 files changed: 36 ins; 38 del; 6 mod 8334320: Replace vmTestbase/metaspace/share/TriggerUnloadingWithWhiteBox.java with ClassUnloadCommon from testlibrary Reviewed-by: dholmes, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/23174 From iklam at openjdk.org Tue Feb 4 05:48:29 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 4 Feb 2025 05:48:29 GMT Subject: RFR: 8348349: Refactor CDSConfig::is_dumping_heap() [v3] In-Reply-To: References: Message-ID: > Please review this small clean up: > > `HeapShared::can_write()` and `CDSConfig::is_dumping_heap()` are both for deciding whether CDS should dump heap objects. I removed the former and consolidated all the logic to the latter. > > I also updated the logging message in case heap objects cannot be dumped. > > I also updated VMProps to clarify what `vmCDSCanWriteArchivedJavaHeap()` means. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' into 8348349-refactor-heap-shared-can-write - @matias9927 comment - Fixed whitespace - Fixed 32-bit build - 8348349: Refactor HeapShared::can_write() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23249/files - new: https://git.openjdk.org/jdk/pull/23249/files/a6a77ca7..846e99cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23249&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23249&range=01-02 Stats: 29621 lines in 1223 files changed: 14643 ins; 7811 del; 7167 mod Patch: https://git.openjdk.org/jdk/pull/23249.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23249/head:pull/23249 PR: https://git.openjdk.org/jdk/pull/23249 From stuefe at openjdk.org Tue Feb 4 06:11:20 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 4 Feb 2025 06:11:20 GMT Subject: RFR: 8334320: Replace vmTestbase/metaspace/share/TriggerUnloadingWithWhiteBox.java with ClassUnloadCommon from testlibrary In-Reply-To: References: Message-ID: On Fri, 24 Jan 2025 12:48:37 GMT, Coleen Phillimore wrote: > @tstuefe Can you review and comment since you added these tests? Thank you! Sorry, @coleenp, I saw this too late. Your changes look good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23174#issuecomment-2632964787 From stuefe at openjdk.org Tue Feb 4 07:20:18 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 4 Feb 2025 07:20:18 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree In-Reply-To: References: Message-ID: <2HDwij0XI1F3WeoFTk_I9xub8uLrxMDidW9I5JZZbOM=.d4b416c7-5019-4cfe-8e5d-09a1d6db32b3@github.com> On Mon, 3 Feb 2025 11:20:49 GMT, Casper Norrbin wrote: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support of intrusive trees. 
In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. > > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns!
Ping @stooke ------------- PR Comment: https://git.openjdk.org/jdk/pull/23416#issuecomment-2633068509 From stuefe at openjdk.org Tue Feb 4 07:27:09 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 4 Feb 2025 07:27:09 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 11:20:49 GMT, Casper Norrbin wrote: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support of intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
> > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns! @caspernorrbin Thank you for this. Will make the RB tree much more useful to me. About the example: Cursor insert_cursor = my_intrusive_tree.cursor_find(0); Node insert_node = Node(0); // Custom allocation here is just malloc MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); new (place) MyIntrusiveStructure(0, insert_node); Why not skip the useless insert_node creation? Let insert_at_cursor just accept uninitialized Node* memory, since it will initialize all Node members anyway? I will do a full review after I get home. 
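The following Java analogue illustrates the cursor idea without HotSpot's templates. It makes two simplifications: it is an unbalanced binary search tree, not a red-black tree, and every name in it is hypothetical. The element type extends the node type, so the links live inside the caller's object and the tree never allocates. `cursorFind` records the slot where a key is, or where it would go. `insertAtCursor` links a caller-created node into that slot without searching again. The tree resets the link fields itself, in the spirit of Thomas's point that no separate temporary node should be needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an intrusive, UNBALANCED binary search tree with cursors.
// It illustrates the API shape only; it performs no red-black rebalancing.
public final class IntrusiveTreeSketch {

    // Link fields live inside the element itself: the "intrusive" part.
    static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // A user structure that *is* a node, so the tree never allocates anything.
    static final class MyStructure extends Node {
        final String data;
        MyStructure(int key, String data) { super(key); this.data = data; }
    }

    // Where a key is, or where it would be inserted. Only valid until the tree changes.
    static final class Cursor {
        final Node parent;     // null means the slot is the root
        final boolean isLeft;  // which child slot of parent
        final Node node;       // null if the key is absent
        Cursor(Node parent, boolean isLeft, Node node) {
            this.parent = parent; this.isLeft = isLeft; this.node = node;
        }
        boolean found() { return node != null; }
    }

    private Node root;

    Cursor cursorFind(int key) {
        Node parent = null, cur = root;
        boolean isLeft = false;
        while (cur != null && cur.key != key) {
            parent = cur;
            isLeft = key < cur.key;
            cur = isLeft ? cur.left : cur.right;
        }
        return new Cursor(parent, isLeft, cur);
    }

    // Links a caller-allocated node into the empty slot found by cursorFind(node.key).
    void insertAtCursor(Node node, Cursor c) {
        if (c.found()) throw new IllegalStateException("key already present");
        node.left = node.right = null;   // the tree owns the link fields
        if (c.parent == null) root = node;
        else if (c.isLeft) c.parent.left = node;
        else c.parent.right = node;
    }

    // In-order traversal, collecting keys in ascending order.
    List<Integer> keys() {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node n, List<Integer> out) {
        if (n == null) return;
        collect(n.left, out);
        out.add(n.key);
        collect(n.right, out);
    }

    public static void main(String[] args) {
        IntrusiveTreeSketch tree = new IntrusiveTreeSketch();
        for (int k : new int[] {5, 2, 8, 1}) {
            tree.insertAtCursor(new MyStructure(k, "v" + k), tree.cursorFind(k));
        }
        Cursor c = tree.cursorFind(2);
        System.out.println(((MyStructure) c.node).data); // v2
        System.out.println(tree.keys());                 // [1, 2, 5, 8]
    }
}
```

The downcast `(MyStructure) c.node` plays the role of `cast_to_self` in the C++ example. In Java it is checked at run time, whereas the C++ pointer cast relies on `node` being the first member.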
------------- PR Comment: https://git.openjdk.org/jdk/pull/23416#issuecomment-2633077634 From epeter at openjdk.org Tue Feb 4 08:52:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Feb 2025 08:52:15 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v16] In-Reply-To: References: Message-ID: <_5bwBRKG8Zu7iywOJZ6WgUb6N4so1sAO6Ua8S0zQU94=.3200ef74-4e50-424b-a3da-637be63e3f0c@github.com> On Mon, 3 Feb 2025 18:11:11 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java >> >> Co-authored-by: Emanuel Peter > > Hi @PaulSandoz , @eme64 , All outstanding comments haven been addressed, please let me know if there are other comments. @jatin-bhateja Testing is all green :green_circle: Doing a last pass over the code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2633248273 From epeter at openjdk.org Tue Feb 4 09:03:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Feb 2025 09:03:17 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v16] In-Reply-To: References: Message-ID: On Thu, 30 Jan 2025 11:03:43 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. 
>> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 9. X86 backend implementation for all supported intrinsics. >> 10. Functional and Performance validation tests. >> >> Kindly review the patch and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java > > Co-authored-by: Emanuel Peter src/hotspot/share/opto/convertnode.hpp line 222: > 220: class ReinterpretS2HFNode : public Node { > 221: public: > 222: ReinterpretS2HFNode(Node* in1) : Node(0, in1) {} Suggestion: ReinterpretS2HFNode(Node* in1) : Node(nullptr, in1) {} Oh, just caught this. I think you should not use `0` here any more, check all other uses. 
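Item 8 of the quoted summary (folding redundant reinterpretation chains, "HF2S + S2HF = HF") can be illustrated with a toy IR. This is a hypothetical sketch, not C2 code: the `Node` base class, the `Op` enum, and the `identity()` method are simplified stand-ins invented here, and only the round-trip fold is modeled. The constructors also follow Emanuel's suggestion and pass `nullptr`, not `0`, for the absent control input.

```c++
#include <cassert>

// Hypothetical miniature IR; not C2's Node class.
enum class Op { Con, ReinterpretS2HF, ReinterpretHF2S };

struct Node {
  Op op;
  Node* ctrl;  // control input; nullptr when the node has none
  Node* in1;   // data input
  Node(Op o, Node* c, Node* i) : op(o), ctrl(c), in1(i) {}
  virtual ~Node() = default;
  // Returns an equivalent, simpler node if one exists, otherwise this.
  virtual Node* identity() { return this; }
};

// short bits -> half-float register value
struct ReinterpretS2HFNode : Node {
  explicit ReinterpretS2HFNode(Node* in) : Node(Op::ReinterpretS2HF, nullptr, in) {}
  Node* identity() override {
    // S2HF(HF2S(x)) == x: the bits make a lossless round trip.
    return (in1 != nullptr && in1->op == Op::ReinterpretHF2S) ? in1->in1 : this;
  }
};

// half-float register value -> short bits
struct ReinterpretHF2SNode : Node {
  explicit ReinterpretHF2SNode(Node* in) : Node(Op::ReinterpretHF2S, nullptr, in) {}
  Node* identity() override {
    // HF2S(S2HF(x)) == x
    return (in1 != nullptr && in1->op == Op::ReinterpretS2HF) ? in1->in1 : this;
  }
};
```

A real compiler applies such folds repeatedly while optimizing the graph; the sketch only shows the single local check.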
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1940762320 From epeter at openjdk.org Tue Feb 4 09:16:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Feb 2025 09:16:17 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v16] In-Reply-To: References: Message-ID: On Thu, 30 Jan 2025 11:03:43 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 9. X86 backend implementation for all supported intrinsics. >> 10. Functional and Performance validation tests. >> >> Kindly review the patch and share your feedback. 
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java > > Co-authored-by: Emanuel Peter Ooops, I found a few more details. But the C++ VM changes look really good now. The Java changes I leave to @PaulSandoz src/hotspot/share/opto/convertnode.cpp line 971: > 969: return true; > 970: default: > 971: return false; Does this cover all cases? What about `FmaHF`? src/hotspot/share/opto/convertnode.hpp line 234: > 232: class ReinterpretHF2SNode : public Node { > 233: public: > 234: ReinterpretHF2SNode(Node* in1) : Node(0, in1) {} Suggestion: ReinterpretHF2SNode(Node* in1) : Node(nullptr, in1) {} src/hotspot/share/opto/divnode.cpp line 866: > 864: // Dividing by self is 1. > 865: // IF the divisor is 1, we are an identity on the dividend. > 866: Node* DivHFNode::Identity(PhaseGVN* phase) { Remove line with `isA_Copy`. src/hotspot/share/opto/type.cpp line 1106: > 1104: if (_base == FloatBot || _base == FloatTop) return FLOAT; > 1105: if (_base == HalfFloatTop || _base == HalfFloatBot) return Type::BOTTOM; > 1106: if (_base == DoubleTop || _base == DoubleBot) return Type::BOTTOM; If you already fixing the style, you should use curly braces as I said above ;) src/hotspot/share/opto/type.cpp line 1472: > 1470: //------------------------------meet------------------------------------------- > 1471: // Compute the MEET of two types. It returns a new Type object. > 1472: const Type* TypeH::xmeet(const Type* t) const { Suggestion: //------------------------------xmeet------------------------------------------- // Compute the MEET of two types. It returns a new Type object. 
const Type* TypeH::xmeet(const Type* t) const { ------------- PR Review: https://git.openjdk.org/jdk/pull/22754#pullrequestreview-2592155651 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1940766035 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1940763403 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1940766624 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1940771256 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1940771662 From tschatzl at openjdk.org Tue Feb 4 09:50:16 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Feb 2025 09:50:16 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME In-Reply-To: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Thu, 30 Jan 2025 12:12:29 GMT, Albert Mingkun Yang wrote: > Here is an attempt to simplify GCLocker implementation for Serial and Parallel. > > GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. > > The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. 
The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. > > Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. > > Test: tier1-8 * I don't know whether the GCLocker JFR events need to stay in metadata.xml if the VM never actually sends them. I think they do not. Maybe they are used to decode old recordings; it may be worth asking e.g. @egahlin. * The bot reports a failure because this PR's CR number still appears in the problem list; that line needs to be deleted as well. Further, it would be interesting to see how many retries there are in the allocation loop with these jnilock* stress tests. * Another issue, probably a TODO: Parallel GC has an emergency bailout via the GC overhead limit after excessive retries, but Serial does not. This means Serial might retry for a long time, which isn't good (previously it did bail out once the number of retries due to the GC locker exceeded that threshold). src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 323: > 321: } > 322: > 323: if (result == nullptr) { Pre-existing: is it actually possible that `result` is not `nullptr` here? The code above always returns with a non-null result. Maybe assert this instead.
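The synchronization described in the PR summary is a Dekker-style handshake. Each mutator thread publishes a per-thread critical-region count and issues a store-load barrier. The GC side takes a lock, publishes a pending request, and waits for every thread to leave its critical region. The sketch below models this with standard C++ atomics. It is an illustration under stated assumptions, not the patch: `SketchGCLocker`, `ThreadState`, and the yield-based spin-wait are hypothetical, `std::mutex` stands in for `Heap_lock`, and HotSpot's `Atomic`/`OrderAccess` primitives and safepoint machinery are not modeled.

```c++
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-thread state; in the patch this role is played by an
// existing thread-local critical-region count.
struct ThreadState {
  std::atomic<int> critical_count{0};
};

class SketchGCLocker {
  std::mutex _lock;                      // stands in for Heap_lock
  std::atomic<bool> _gc_pending{false};  // "a GC wants to run"
  std::vector<ThreadState*> _threads;    // guarded by _lock

 public:
  void register_thread(ThreadState* t) {
    std::lock_guard<std::mutex> guard(_lock);
    _threads.push_back(t);
  }

  // Mutator side: enter a critical region.
  void enter(ThreadState* t) {
    if (t->critical_count.load(std::memory_order_relaxed) > 0) {
      // Nested entry: block() cannot complete while we are inside, so just
      // bump the count (taking the lock here could deadlock against the GC).
      t->critical_count.fetch_add(1, std::memory_order_relaxed);
      return;
    }
    t->critical_count.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // store-load barrier
    if (_gc_pending.load(std::memory_order_acquire)) {
      // Slow path: back out, wait for the GC to finish, re-enter under the lock.
      t->critical_count.store(0, std::memory_order_relaxed);
      std::lock_guard<std::mutex> guard(_lock);
      t->critical_count.store(1, std::memory_order_relaxed);
    }
  }

  void exit(ThreadState* t) {
    t->critical_count.fetch_sub(1, std::memory_order_release);
  }

  // GC side: once block() returns, no thread is inside a critical region,
  // and none can complete an entry until unblock().
  void block() {
    _lock.lock();
    _gc_pending.store(true, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // pairs with the fence in enter()
    for (ThreadState* t : _threads) {
      while (t->critical_count.load(std::memory_order_acquire) != 0) {
        std::this_thread::yield();
      }
    }
  }

  void unblock() {
    _gc_pending.store(false, std::memory_order_release);
    _lock.unlock();
  }
};
```

The paired seq_cst fences guarantee that a racing mutator and GC cannot both miss each other's store: either the mutator sees the pending request and backs out, or the GC sees the non-zero count and waits.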
src/hotspot/share/gc/shared/gcLocker.cpp line 86: > 84: void GCLocker::block() { > 85: assert(_lock->is_locked(), "precondition"); > 86: assert(Atomic::load(&_is_gc_request_pending) == false, "precondition"); Suggestion: assert(!Atomic::load(&_is_gc_request_pending), "precondition"); src/hotspot/share/gc/shared/gcLocker.cpp line 106: > 104: > 105: #ifdef ASSERT > 106: // Matching the storestore in GCLocker::exit Suggestion: // Matching the storestore in GCLocker::exit. src/hotspot/share/gc/shared/gcLocker.cpp line 114: > 112: void GCLocker::unblock() { > 113: assert(_lock->is_locked(), "precondition"); > 114: assert(Atomic::load(&_is_gc_request_pending) == true, "precondition"); Suggestion: assert(Atomic::load(&_is_gc_request_pending), "precondition"); src/hotspot/share/gc/shared/gcLocker.hpp line 31: > 29: #include "memory/allStatic.hpp" > 30: #include "runtime/mutex.hpp" > 31: Documentation how GCLocker works/is supposed to work is missing here. It's not exactly trivial. src/hotspot/share/gc/shared/gcLocker.hpp line 33: > 31: > 32: class GCLocker: public AllStatic { > 33: static Monitor* _lock; Not sure if having this copy/reference to `Heap_lock` makes the code more clear than referencing `Heap_lock` directly. It needs to be `Heap_lock` anyway. src/hotspot/share/gc/shared/gcLocker.hpp line 37: > 35: > 36: #ifdef ASSERT > 37: static uint64_t _debug_count; Maybe the variable could be named something less generic, indicating what it is counting. Or add a comment. src/hotspot/share/gc/shared/gcLocker.inline.hpp line 40: > 38: if (Atomic::load(&_is_gc_request_pending)) { > 39: thread->exit_critical(); > 40: // slow-path Suggestion: Not sure what this `slow-path` comment helps with. Maybe it is describing the next method (but it is named very similarly), or this is an attempt to describe the true-block of the if. 
In the latter case, it would maybe be better to put this comment at the start of the true-block of the if, and say something more descriptive like `// Another thread is requesting gc, enter slow path.` Not sure, feel free to ignore; it's just that to me the comment should either be removed or moved up a line. src/hotspot/share/gc/shared/gcLocker.inline.hpp line 56: > 54: if (thread->in_last_critical()) { > 55: Atomic::add(&_debug_count, (uint64_t)-1); > 56: // Matching the loadload in GCLocker::block Suggestion: // Matching the loadload in GCLocker::block. src/hotspot/share/gc/shared/gcTraceSend.cpp line 364: > 362: #if INCLUDE_JFR > 363: > 364: #endif Please remove this empty `#if/#endif` block. src/hotspot/share/gc/shared/gc_globals.hpp line 162: > 160: "blocked by the GC locker") \ > 161: range(0, max_uintx) \ > 162: \ This removal warrants a release note; while it's a diagnostic option that we can remove at will, it is used to work around issues. src/hotspot/share/prims/whitebox.cpp line 48: > 46: #include "gc/shared/concurrentGCBreakpoints.hpp" > 47: #include "gc/shared/gcConfig.hpp" > 48: #include "gc/shared/gcLocker.hpp" Suggestion: The file does not seem to use the `GCLocker` class anymore; please remove this line as well. ------------- Changes requested by tschatzl (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2592106484 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940732531 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940775211 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940813063 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940779840 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940770235 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940769765 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940796501 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940793704 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940812598 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940746077 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940748992 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940752118 From jbhateja at openjdk.org Tue Feb 4 10:05:09 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 4 Feb 2025 10:05:09 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v17] In-Reply-To: References: Message-ID: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. 
> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 9. X86 backend implementation for all supported intrinsics. > 10. Functional and Performance validation tests. > > Kindly review the patch and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Fixing typos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22754/files - new: https://git.openjdk.org/jdk/pull/22754/files/8207c9ff..82a42213 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=15-16 Stats: 13 lines in 3 files changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22754/head:pull/22754 PR: https://git.openjdk.org/jdk/pull/22754 From jbhateja at openjdk.org Tue Feb 4 10:05:11 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 4 Feb 2025 10:05:11 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v16] In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 18:11:11 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull 
request incrementally with one additional commit since the last revision: >> >> Update test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java >> >> Co-authored-by: Emanuel Peter > > Hi @PaulSandoz, @eme64, all outstanding comments have been addressed; please let me know if there are other comments. > @jatin-bhateja Testing is all green :green_circle: Doing a last pass over the code. Thanks @eme64, looking forward to your approval :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2633414710 From jbhateja at openjdk.org Tue Feb 4 10:05:11 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 4 Feb 2025 10:05:11 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v16] In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 09:03:09 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java >> >> Co-authored-by: Emanuel Peter > > src/hotspot/share/opto/convertnode.cpp line 971: > >> 969: return true; >> 970: default: >> 971: return false; > > Does this cover all cases? What about `FmaHF`? FmaHF is a ternary operation and is intrinsified.
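The PR summary quoted earlier in the thread notes that the Float16 intrinsics receive `short` arguments carrying raw IEEE 754 binary16 bits. This is why reinterpretation nodes are needed to move those bits between general-purpose and floating-point registers. As a reminder of what those 16 bits encode, here is a small, hypothetical decoder. It is not part of the patch or of `Float16`, and it follows the standard binary16 layout: 1 sign bit, 5 exponent bits with bias 15, and 10 fraction bits.

```c++
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decodes an IEEE 754 binary16 bit pattern (as carried in a Java short)
// into a float. Every binary16 value is exactly representable as a float.
float binary16_to_float(uint16_t bits) {
  const bool negative = (bits & 0x8000) != 0;
  const int exponent = (bits >> 10) & 0x1F;  // biased by 15
  const int fraction = bits & 0x3FF;         // 10 explicit fraction bits
  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: fraction * 2^-24
    magnitude = std::ldexp(static_cast<float>(fraction), -24);
  } else if (exponent == 0x1F) {
    // Infinity or NaN (the NaN payload is not preserved in this sketch)
    magnitude = (fraction == 0) ? std::numeric_limits<float>::infinity()
                                : std::numeric_limits<float>::quiet_NaN();
  } else {
    // Normal: (1024 + fraction) * 2^(exponent - 15 - 10)
    magnitude = std::ldexp(static_cast<float>(fraction | 0x400), exponent - 25);
  }
  return negative ? -magnitude : magnitude;
}
```

For example, 0x3C00 decodes to 1.0 and 0x7BFF to 65504, the largest finite binary16 value.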
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1940855109 From dholmes at openjdk.org Tue Feb 4 10:40:12 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Feb 2025 10:40:12 GMT Subject: RFR: 8349083: Factor out filename handling code from logging In-Reply-To: <0A0RqOvBYkQ-6PV1nKsRSrVfjjdewmy69KWoUlEC29U=.28cd12b0-cb71-48d5-91d8-0c94ce7e3f53@github.com> References: <0A0RqOvBYkQ-6PV1nKsRSrVfjjdewmy69KWoUlEC29U=.28cd12b0-cb71-48d5-91d8-0c94ce7e3f53@github.com> Message-ID: On Mon, 3 Feb 2025 13:59:48 GMT, Zhengyu Gu wrote: >> src/hotspot/share/utilities/filenameUtil.hpp line 41: >> >>> 39: // Expand wildcards in filename: >>> 40: // %p -> PID >>> 41: // %t -> timestamp in YY-MM-DD_HH_MM_SS format >> >> Not just a "timestamp" though it is specifically the VM start time. > > The "timestamp" comes from `os::javaTimeMillis()`, it is not the VM start time. > > Unified logging captures the VM start time and uses it from its filename substitution. > > > output = new LogFileOutput(name, _vm_start_time); Sorry then I have misunderstood the refactoring of this. I was expecting "%t" to always mean the VM start time - otherwise what time is it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23410#discussion_r1940918335 From dholmes at openjdk.org Tue Feb 4 10:40:12 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Feb 2025 10:40:12 GMT Subject: RFR: 8349083: Factor out filename handling code from logging In-Reply-To: References: <0A0RqOvBYkQ-6PV1nKsRSrVfjjdewmy69KWoUlEC29U=.28cd12b0-cb71-48d5-91d8-0c94ce7e3f53@github.com> Message-ID: On Tue, 4 Feb 2025 10:37:06 GMT, David Holmes wrote: >> The "timestamp" comes from `os::javaTimeMillis()`, it is not the VM start time. >> >> Unified logging captures the VM start time and uses it from its filename substitution. >> >> >> output = new LogFileOutput(name, _vm_start_time); > > Sorry then I have misunderstood the refactoring of this. 
I was expecting "%t" to always mean the VM start time - otherwise what time is it? The time the file was created? The time the file name was "constructed"? Or something else? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23410#discussion_r1940919876 From stuefe at openjdk.org Tue Feb 4 10:42:09 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 4 Feb 2025 10:42:09 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 11:20:49 GMT, Casper Norrbin wrote: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
@caspernorrbin could you massage this patch a bit to reduce the delta to the last version? That is a good idea in general (I usually do a manual minimize-delta sweep before I undraft a PR for review). A lot seems to be code movement at first glance.
If you are unsure of how to do the merge so your new stub is declared and generated following the new model (see the doc comments in stubDeclarations.hpp for details) let me know and I'll be happy to help you sort it out. > >> @ferakocz I'm afraid you lucked out on getting your change committed before my reorganization of the stub generation code. If you are unsure of how to do the merge so your new stub is declared and generated following the new model (see the doc comments in stubDeclarations.hpp for details) let me know and I'll be happy to help you sort it out. > > @adinn I think I managed to figure it out. Please take a look at the PR and let me know if I should have done anything differently. @ferakocz Yes, the stub declaration part of it looks to be correct. The rest of the patch will need at least two reviewers (@theRealAph? @martinuy? @franferrax) and may take some time to review, given that they will probably need to read up on the maths and algorithms. As an aid for reviewers and maintainers it would be good to insert a comment into the generator file linking the implementations to the relevant maths and algorithm. I found the FIPS-204 spec and the CRYSTALS-Dilithium Algorithm Speci?cations and Supporting Documentation paper, Shi Bai, L?o Ducas et al, 2021 - are they the best ones to look at? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2633666753 From thomas.stuefe at gmail.com Tue Feb 4 11:51:11 2025 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Tue, 4 Feb 2025 12:51:11 +0100 Subject: Deprecate -UseCompressedClassPointers? Message-ID: Hi all, I would like to get rid of the `-UseCompressedClassPointers` case since it would cut down the number of configurations we need to support and test from three to two (`-UseCompressedClassPointers`, `+UseCompressedClassPointers`, `+UseCompactObjectHeaders`). 
This would leave us with the now-default case, `+UseCompressedClassPointers`, as the sole supported CCP case, thereby removing the need for the switch, which we therefore should deprecate and eventually remove. Apart from the significant reduction in code complexity and testing effort, there is another reason: `-UseCompressedClassPointers` does not seem to be tested that well, especially on 64-bit platforms. See e.g. https://github.com/openjdk/jdk/pull/23053; Roman suspects there are many more such issues. It increases memory usage by quite a bit ("Alias for -XX:WasteMemory" - Erik Österlund), and any historical connection to UseCompressedOops has long been removed. Why would we still need `-UseCompressedClassPointers`? Two reasons: 1) To support 32-bit, where, at the moment, it is the only implemented mode. But I am confident that I can find some low-effort, low-code way to "fake" compressed Klass* pointers, since after all the 32-bit address space could be seen as a 4GB class space. There is also the bigger question of the future of 32-bit: we discussed this at the FOSDEM OpenJDK workshop, with mixed results, but it seems likely that 32-bit will go away at some point; the only question is when.
If not, barring any objections, my plan is to deprecate UseCompressedClassPointers for JDK25, find an alternative for 32-bit platforms in JDK26, and remove the uncompressed case in JDK 26 or later. What do people think? -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefank at openjdk.org Tue Feb 4 12:34:11 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 4 Feb 2025 12:34:11 GMT Subject: RFR: 8323158: HotSpot Style Guide should specify more include ordering In-Reply-To: <1gOEDAkU1eGcUnYqPawR3g5OqAseBiVIVPUIi4O9GYc=.321e1b86-3c6f-4c92-8036-27240a778659@github.com> References: <9RwWVrZpTqsRO5srrrT0jOt4CGc7oF5FEm06Pzjf2yI=.a5fc3070-226c-4292-9802-426c9cab1672@github.com> <1gOEDAkU1eGcUnYqPawR3g5OqAseBiVIVPUIi4O9GYc=.321e1b86-3c6f-4c92-8036-27240a778659@github.com> Message-ID: On Mon, 3 Feb 2025 12:14:53 GMT, Doug Simon wrote: > > I haven't felt the urge to write such a script, but I know that others have scripts to sort the includes > > Ok, it was just a suggestion. My experience is that while clearly written conventions/rules are important, the more they can be automated, the less hassle for everyone. I agree, that a script is beneficial. Do you know of anyone that would be willing to help out and write such script? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23388#issuecomment-2633768991 From alanb at openjdk.org Tue Feb 4 13:39:14 2025 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 4 Feb 2025 13:39:14 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Mon, 9 Dec 2024 19:26:53 GMT, Coleen Phillimore wrote: > The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. 
The field is final after that point. The VM doesn't need it anymore, so there's no real need for the JDK code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short, though. > > There are a couple of JMH benchmarks added with this change. One does show that, for array classes whose class loader is not the bootstrap loader, this results in one extra load, which is observable in a long loop that does nothing else. I don't think this is real-life code. The other benchmarks added show no regression. > > Tested with tier1-8. src/hotspot/share/oops/arrayKlass.hpp line 2: > 1: /* > 2: * Copyright (c) 1997, 2025, Oracle and/or its affiliates. All rights reserved. arrayKlass.hpp isn't changed; is this left over from a previous iteration?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23410#discussion_r1941210801 From jsjolen at openjdk.org Tue Feb 4 13:53:26 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 4 Feb 2025 13:53:26 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v20] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: <0I2mUN4F0QBOYVFxIX7UE1aYwJIEUNAOucOjCuxmS58=.091c86ae-399c-4786-b2e9-20593a4e4425@github.com> On Thu, 30 Jan 2025 12:55:35 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. >> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. >> - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` >> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. >> - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. >> - Adding a runtime flag for selecting the old or new version can be added later. >> - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > fix in shendoahCardTable Some issues found from misses made during merging. test/hotspot/gtest/nmt/test_vmatree.cpp line 831: > 829: tty->cr(); > 830: } > 831: } This looks like a test which was added when you merged in an earlier version of https://github.com/openjdk/jdk/pull/20994 This should be removed. test/hotspot/jtreg/compiler/stringopts/TestFluidAndNonFluid.java line 1: > 1: /* Incorrect merge resolution. 
test/hotspot/jtreg/runtime/NMT/VirtualAllocCommitMerge.java line 325: > 323: output.shouldMatch("\\[0x[0]*" + Long.toHexString(addr) + " - 0x[0]*" > 324: + Long.toHexString(addr + size) > 325: + "\\] committed " + sizeString); Not sure why this is changed. test/hotspot/jtreg/runtime/Thread/TestAlwaysPreTouchStacks.java line 182: > 180: throw new RuntimeException("Expected a higher delta between stack committed of with and without pretouch." + > 181: "Expected: " + expected_delta + " Actual: " + actual_delta); > 182: } Is this meant to be part of this PR? test/micro/org/openjdk/bench/vm/compiler/FluidSBBench.java line 1: > 1: /* This must be the result of an incorrect merge conflict resolution. It should be deleted. ------------- PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2550042933 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1941207799 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1941201897 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1941199974 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1941201150 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1941198899 From jsjolen at openjdk.org Tue Feb 4 14:01:31 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 4 Feb 2025 14:01:31 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v20] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 30 Jan 2025 12:55:35 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. >> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. >> - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` >> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs.
>> - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. >> - Adding a runtime flag for selecting the old or new version can be added later. >> - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > fix in shendoahCardTable src/hotspot/share/nmt/virtualMemoryTracker.cpp line 2: > 1: /* > 2: * Copyright (c) 2013, 2025, Oracle and/or its affiliates. All rights reserved. Weird, another copyright issue here src/hotspot/share/nmt/virtualMemoryTracker.hpp line 3: > 1: /* > 2: * Copyright (c) 2013, 2024, Oracle and/or its affiliates. All rights reserved. > 3: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. Incorrect copyright addition. src/hotspot/share/nmt/virtualMemoryTracker.hpp line 27: > 25: > 26: #ifndef NMT_VIRTUALMEMORYTRACKER_HPP > 27: #define NMT_VIRTUALMEMORYTRACKER_HPP This shouldn't be changed??? src/hotspot/share/nmt/vmatree.cpp line 81: > 79: stA.out.set_tag(tag); > 80: LEQ_A.state.out.set_tag(tag); > 81: stB.in.set_tag(tag); Commented out assert and an addition I'm trying to wrap my head around. Does this fix a pre-existing bug? If so, this should be a separate PR for mainline before this PR is integrated. 
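The quoted `vmatree.cpp` snippet sets `in` and `out` states on nodes. That suggests the tree stores state changes at region boundaries instead of a sorted list of whole regions. The Java below is a greatly simplified, hedged sketch of that boundary-map idea. It assumes only three states and ignores memory tags and call stacks. `RegionMapSketch` and its methods are hypothetical and do not mirror the real `VMATree` API.

```java
import java.util.Map;
import java.util.TreeMap;

public final class RegionMapSketch {
    // RESERVED here means "reserved but not committed"; COMMITTED implies reserved.
    enum State { RELEASED, RESERVED, COMMITTED }

    // Key = address where a run starts; value = state from that address up to the next key.
    // Addresses below the first key, and at or above the last key, are RELEASED.
    private final TreeMap<Long, State> runs = new TreeMap<>();

    private State stateAt(long addr) {
        Map.Entry<Long, State> e = runs.floorEntry(addr);
        return e == null ? State.RELEASED : e.getValue();
    }

    /** Sets [start, end) to state s, keeping only boundaries where the state changes. */
    void set(long start, long end, State s) {
        if (start >= end) return;
        State after = stateAt(end);                   // state that must resume at 'end'
        runs.subMap(start, true, end, true).clear();  // drop boundaries inside [start, end]
        runs.put(start, s);
        runs.put(end, after);
        coalesce(start);
        coalesce(end);
    }

    // Removes the boundary at 'key' if it does not actually change the state.
    private void coalesce(long key) {
        State here = runs.get(key);
        if (here == null) return;
        Map.Entry<Long, State> prev = runs.lowerEntry(key);
        State before = prev == null ? State.RELEASED : prev.getValue();
        if (before == here) runs.remove(key);
    }

    /** Total bytes currently in state s. */
    long bytesIn(State s) {
        long total = 0;
        Map.Entry<Long, State> cur = runs.firstEntry();
        while (cur != null) {
            Map.Entry<Long, State> next = runs.higherEntry(cur.getKey());
            if (next != null && cur.getValue() == s) total += next.getKey() - cur.getKey();
            cur = next;
        }
        return total;
    }

    int runCount() { return runs.size(); }

    public static void main(String[] args) {
        RegionMapSketch m = new RegionMapSketch();
        m.set(0x1000, 0x5000, State.RESERVED);   // reserve 16 KiB
        m.set(0x2000, 0x3000, State.COMMITTED);  // commit 4 KiB in the middle
        System.out.println("committed=" + m.bytesIn(State.COMMITTED)
                + " reserved-only=" + m.bytesIn(State.RESERVED)
                + " boundaries=" + m.runCount());
    }
}
```

Because each operation touches only the boundaries inside its range, lookups and updates are logarithmic in the number of boundaries rather than linear in the number of regions. That property is the kind that could underlie the speedup reported in the PR description.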
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1941220423 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1941219903 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1941219525 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1941214505 From alanb at openjdk.org Tue Feb 4 14:00:13 2025 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 4 Feb 2025 14:00:13 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Mon, 9 Dec 2024 19:26:53 GMT, Coleen Phillimore wrote: > The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. > > There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. > > Tested with tier1-8. Good cleanup. 
src/java.base/share/classes/java/lang/Class.java line 244: > 242: classLoader = loader; > 243: componentType = arrayComponentType; > 244: modifiers = dummyModifiers; I realize this ctor isn't used but "dummyModifiers" looks very strange as a parameter name when compared to the others, renaming it to something like "mods" would make it less confusing for anyone reading through this code. ------------- Marked as reviewed by alanb (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22652#pullrequestreview-2592938860 PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1941220263 From stuefe at openjdk.org Tue Feb 4 14:08:10 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 4 Feb 2025 14:08:10 GMT Subject: RFR: 8323158: HotSpot Style Guide should specify more include ordering [v2] In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 12:14:35 GMT, Stefan Karlsson wrote: >> The HotSpot Style Guide has a section about source files and includes. The style used for includes has mostly been introduced by scripts when includeDB was replaced, but also when various other enhancements to our includes were made. Some of the introduced styles were never written down in the style guide. >> >> I propose a couple of changes to the HotSpot Style Guide to reflect some of these implicit styles that we have. While updating the text I also took the liberty to order the items in an order that I felt was good. >> >> Note that JDK-8323158 contains a few more suggestions, but I've only addressed the items that I think can be accepted without much contention. Either I extract the items that have not been addressed into a new RFE, or I create a new RFE for this PR. >> >> There are some small whitespace tweaks that I made so that the .md and .html had a similar layout. > > Stefan Karlsson has updated the pull request incrementally with two additional commits since the last revision: > > - Update hotspot-style.md > - Update hotspot-style.html This looks good to me.
Thank you. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23388#pullrequestreview-2592970726 From alanb at openjdk.org Tue Feb 4 14:09:16 2025 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 4 Feb 2025 14:09:16 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v4] In-Reply-To: References: Message-ID: On Fri, 31 Jan 2025 17:16:11 GMT, Coleen Phillimore wrote: >> src/java.base/share/classes/java/lang/System.java line 2150: >> >>> 2148: } >>> 2149: >>> 2150: public ProtectionDomain protectionDomain(Class c) { >> >> This accessor has become pointless since the functional removal of SM, but it would be out of scope for this PR. > > Thanks for looking at this. I didn't want to change this with this change. There may need to be some follow-on cleanup, e.g. I'm wondering if Lookup.cachedProtectionDomain is needed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1941241449 From coleenp at openjdk.org Tue Feb 4 14:43:51 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 4 Feb 2025 14:43:51 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: > The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. > > There's a couple of JMH benchmarks added with this change.
One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. > > Tested with tier1-8. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix copyright and param name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22652/files - new: https://git.openjdk.org/jdk/pull/22652/files/8854fcc6..ff693418 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22652&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22652&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22652.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22652/head:pull/22652 PR: https://git.openjdk.org/jdk/pull/22652 From coleenp at openjdk.org Tue Feb 4 14:43:51 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 4 Feb 2025 14:43:51 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Tue, 4 Feb 2025 14:40:47 GMT, Coleen Phillimore wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. 
One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright and param name Thank you for your comments, Alan. ------------- PR Review: https://git.openjdk.org/jdk/pull/22652#pullrequestreview-2593075666 From coleenp at openjdk.org Tue Feb 4 14:43:51 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 4 Feb 2025 14:43:51 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Tue, 4 Feb 2025 13:36:44 GMT, Alan Bateman wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix copyright and param name > > src/hotspot/share/oops/arrayKlass.hpp line 2: > >> 1: /* >> 2: * Copyright (c) 1997, 2025, Oracle and/or its affiliates. All rights reserved. > > arrayKlass.hpp isn't changed, is this left over from a previous iteration? yes, it was something that my copyright script thought I changed from merging some previous changes. > src/java.base/share/classes/java/lang/Class.java line 244: > >> 242: classLoader = loader; >> 243: componentType = arrayComponentType; >> 244: modifiers = dummyModifiers; > > I realize this ctor isn't used but "dummyModifiers" looks very strange as parameter name when compared to the others, renaming it to something like "mods" would make it less confusing for anyone reading through this code. I changed it to mods. Thanks for the suggestion. 
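Moving `modifiers` into a Java field must not change what `Class.getModifiers()` reports, including for the array classes mentioned in the benchmark discussion. The public reflection API alone is enough to check that contract. Below is a small sketch, assuming JDK 17 or later. The expected values follow the `Class.getModifiers()` Javadoc: array classes are always `abstract` and `final`, never `interface`, and take their access modifier from the component type.

```java
import java.lang.reflect.Modifier;

public final class ModifiersDemo {
    public static void main(String[] args) {
        Class<?>[] classes = { String.class, int[].class, String[][].class, Runnable.class };
        for (Class<?> c : classes) {
            int mods = c.getModifiers();
            // Print the raw bit set alongside its decoded form.
            System.out.printf("%-20s %5d  %s%n", c.getTypeName(), mods, Modifier.toString(mods));
        }
    }
}
```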
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1941301152 PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1941302820 From coleenp at openjdk.org Tue Feb 4 15:02:11 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 4 Feb 2025 15:02:11 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v4] In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 14:06:07 GMT, Alan Bateman wrote: >> Thanks for looking at this. I didn't want to change this with this change. > > There may need to be some follow-on cleanup, e.g. I'm wondering if Lookup.cachedProtectionDomain is needed now. One of the reasons I wanted to move this out of Hotspot as a native call is that it might make further work with ProtectionDomain easier to do all in Java, except there's still a bit of coupling in the JVM with the name of the class and that it's passed through defineClass (resolve_from_stream) and initialized in the mirror. So I guess that's still a lot. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1941345990 From phh at openjdk.org Tue Feb 4 15:23:17 2025 From: phh at openjdk.org (Paul Hohensee) Date: Tue, 4 Feb 2025 15:23:17 GMT Subject: RFR: 8348610: GenShen: TestShenandoahEvacuationInformationEvent failed with setRegions >= regionsFreed: expected 1 >= 57 In-Reply-To: References: Message-ID: On Thu, 30 Jan 2025 04:43:59 GMT, Satyen Subramaniam wrote: > Renaming `ShenandoahEvacuationInformation.freedRegions` to `ShenandoahEvacuationInformation.freeRegions` for clarity, and fixing incorrect assertion in TestShenandoahEvacuationInformationEvent.cpp > > Tested with tier 1, tier 2, and tier 3 tests. Marked as reviewed by phh (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23362#pullrequestreview-2593210592 From iklam at openjdk.org Tue Feb 4 16:11:19 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 4 Feb 2025 16:11:19 GMT Subject: RFR: 8348349: Refactor CDSConfig::is_dumping_heap() [v3] In-Reply-To: References: Message-ID: On Thu, 23 Jan 2025 22:23:45 GMT, Matias Saavedra Silva wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into 8348349-refactor-heap-shared-can-write >> - @matias9927 comment >> - Fixed whitespace >> - Fixed 32-bit build >> - 8348349: Refactor HeapShared::can_write() > > Change looks good, thanks for the cleanup! I have some small comments that you can address if you think it's valuable: Thanks @matias9927 and @calvinccheung for the review ------------- PR Comment: https://git.openjdk.org/jdk/pull/23249#issuecomment-2634421807 From iklam at openjdk.org Tue Feb 4 16:11:20 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 4 Feb 2025 16:11:20 GMT Subject: Integrated: 8348349: Refactor CDSConfig::is_dumping_heap() In-Reply-To: References: Message-ID: <9Yej4TVhAwFpeV9H5YclHN56xkJEzY-myw2ELQw-uQY=.2086dfcf-a5ea-48bc-9f57-350bd666710c@github.com> On Thu, 23 Jan 2025 02:35:06 GMT, Ioi Lam wrote: > Please review this small clean up: > > `HeapShared::can_write()` and `CDSConfig::is_dumping_heap()` are both for deciding whether CDS should dump heap objects. I removed the former and consolidated all the logic to the latter. > > I also updated the logging message in case heap objects cannot be dumped. > > I also updated VMProps to clarify what `vmCDSCanWriteArchivedJavaHeap()` means. This pull request has now been integrated. 
Changeset: b985347c Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/b985347c2383a7a637ffa9a4a8687f7f7cde1369 Stats: 111 lines in 12 files changed: 52 ins; 36 del; 23 mod 8348349: Refactor CDSConfig::is_dumping_heap() Reviewed-by: ccheung, matsaave ------------- PR: https://git.openjdk.org/jdk/pull/23249 From alanb at openjdk.org Tue Feb 4 16:40:10 2025 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 4 Feb 2025 16:40:10 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v4] In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 14:59:46 GMT, Coleen Phillimore wrote: >> There may need to be some follow-on cleanup, e.g. I'm wondering if Lookup.cachedProtectionDomain is needed now. > > One of the reasons I wanted to move this out of Hotspot as a native call is that it might make further work with ProtectionDomain easier to do all in Java, except there's still a bit of coupling in the JVM with the name of the class and that it's passed through defineClass (resolve_from_stream) and initialized in the mirror. So I guess that's still a lot. Aside from JVMTI (CFLH for example), is there anything left in the VM that needs this? The last param to JVM_DefineClassWithSource has the location from the code source if available. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1941523958 From duke at openjdk.org Tue Feb 4 16:52:10 2025 From: duke at openjdk.org (duke) Date: Tue, 4 Feb 2025 16:52:10 GMT Subject: RFR: 8348610: GenShen: TestShenandoahEvacuationInformationEvent failed with setRegions >= regionsFreed: expected 1 >= 57 In-Reply-To: References: Message-ID: On Thu, 30 Jan 2025 04:43:59 GMT, Satyen Subramaniam wrote: > Renaming `ShenandoahEvacuationInformation.freedRegions` to `ShenandoahEvacuationInformation.freeRegions` for clarity, and fixing incorrect assertion in TestShenandoahEvacuationInformationEvent.cpp > > Tested with tier 1, tier 2, and tier 3 tests. 
@satyenme Your change (at version 923a29d84a06315bfde7d3d1d8b48ff27fef8f9e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23362#issuecomment-2634531314 From egahlin at openjdk.org Tue Feb 4 16:56:17 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 4 Feb 2025 16:56:17 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 4 Feb 2025 09:47:20 GMT, Thomas Schatzl wrote: > * Idk if GCLocker JFR events need to be available in metadata.xml if the VM does not actually ever send it. I think it does not. > Maybe it is used to decode from old recordings, may be worth asking e.g. @egahlin . If the event is not used and the metric is not interesting to have anymore, remove it from metadata.xml, default.jfc, profile.jfc, EventNames.java and delete the TestGCLockerEvent.java file. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2634538626 From coleenp at openjdk.org Tue Feb 4 16:59:10 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 4 Feb 2025 16:59:10 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v4] In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 16:37:10 GMT, Alan Bateman wrote: >> One of the reasons I wanted to move this out of Hotspot as a native call is that it might make further work with ProtectionDomain easier to do all in Java, except there's still a bit of coupling in the JVM with the name of the class and that it's passed through defineClass (resolve_from_stream) and initialized in the mirror. So I guess that's still a lot. > > Aside from JVMTI (CFLH for example), is there anything left in the VM that needs this? The last param to JVM_DefineClassWithSource has the location from the code source if available. 
The VM doesn't need this but it carries it around because it's a parameter to JVM_DefineClass and DefineClassWithSource (second to last parameter). CFLH and CDS from what I can tell have it for the same purpose - ultimately to assign it into the mirror. There is some remaining code in the compilers (ci). Not sure if it is needed without studying it more. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1941554835 From ssubramaniam at openjdk.org Tue Feb 4 17:22:25 2025 From: ssubramaniam at openjdk.org (Satyen Subramaniam) Date: Tue, 4 Feb 2025 17:22:25 GMT Subject: Integrated: 8348610: GenShen: TestShenandoahEvacuationInformationEvent failed with setRegions >= regionsFreed: expected 1 >= 57 In-Reply-To: References: Message-ID: On Thu, 30 Jan 2025 04:43:59 GMT, Satyen Subramaniam wrote: > Renaming `ShenandoahEvacuationInformation.freedRegions` to `ShenandoahEvacuationInformation.freeRegions` for clarity, and fixing incorrect assertion in TestShenandoahEvacuationInformationEvent.cpp > > Tested with tier 1, tier 2, and tier 3 tests. This pull request has now been integrated.
Changeset: bad39b6d Author: Satyen Subramaniam Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/bad39b6d8892ba9b86bc81bf01108a1df617defb Stats: 15 lines in 5 files changed: 2 ins; 1 del; 12 mod 8348610: GenShen: TestShenandoahEvacuationInformationEvent failed with setRegions >= regionsFreed: expected 1 >= 57 Reviewed-by: wkemper, phh ------------- PR: https://git.openjdk.org/jdk/pull/23362 From aph-open at littlepinkcloud.com Tue Feb 4 17:49:28 2025 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Tue, 4 Feb 2025 17:49:28 +0000 Subject: RFD: The Cost of Profiling in the HotSpot Virtual Machine In-Reply-To: <837ed36c-463b-4599-9d8a-f40f5fb0d011@oracle.com> References: <06c95e20-4d11-4598-910d-ef75b4d06d22@littlepinkcloud.com> <837ed36c-463b-4599-9d8a-f40f5fb0d011@oracle.com> Message-ID: <7f9fe53d-d1b9-41d2-b888-37d3a9fd5f09@littlepinkcloud.com> On 2/3/25 17:26, Vladimir Kozlov wrote: > It is known issue [1]. We could use randomized profiling counters update as suggested in RFE. > I think we had such implementation in our JAOTC work back in days. But it may not such simple on not x86 platforms. Hmm, OK. > But I think we should go this way > since we are already working on it. Yes, I see. OK, I'll mark my bug report as a dupe of 8134940. Maybe I can assist Igor Veresov with some ideas for AArch64 if it helps, but I won't butt in unless I'm asked when he's working on the problem. I'm glad someone is. Thanks, -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From duke at openjdk.org Tue Feb 4 18:26:03 2025 From: duke at openjdk.org (Robert Toyonaga) Date: Tue, 4 Feb 2025 18:26:03 GMT Subject: RFR: 8276995: Bug in jdk.jfr.event.gc.collection.TestSystemGC Message-ID: <-29UVi0_-B2NGs3H9qXBFJg80IbwC2saJB9jU-cSRQc=.27eb2890-6938-41d5-9511-bb995739ee2a@github.com> Just a tiny test case fix. The test was previously checking the wrong events. 
------------- Commit messages: - trivial testcase fix Changes: https://git.openjdk.org/jdk/pull/23445/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23445&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8276995 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23445.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23445/head:pull/23445 PR: https://git.openjdk.org/jdk/pull/23445 From duke at openjdk.org Tue Feb 4 19:00:33 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 4 Feb 2025 19:00:33 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v2] In-Reply-To: References: <7UgNYEuTu6rj7queOgM9xIy-6kQMdACrZiDLtlniMYw=.dff6f18b-1236-43b1-8280-2bce9160f32a@github.com> Message-ID: On Mon, 3 Feb 2025 18:11:51 GMT, Ferenc Rakoczi wrote: >> @ferakocz I'm afraid you lucked out on getting your change committed before my reorganization of the stub generation code. If you are unsure of how to do the merge so your new stub is declared and generated following the new model (see the doc comments in stubDeclarations.hpp for details) let me know and I'll be happy to help you sort it out. > >> @ferakocz I'm afraid you lucked out on getting your change committed before my reorganization of the stub generation code. If you are unsure of how to do the merge so your new stub is declared and generated following the new model (see the doc comments in stubDeclarations.hpp for details) let me know and I'll be happy to help you sort it out. > > @adinn I think I managed to figure it out. Please take a look at the PR and let me know if I should have done anything differently. > @ferakocz Yes, the stub declaration part of it looks to be correct. > > The rest of the patch will need at least two reviewers (@theRealAph? @martinuy? @franferrax) and may take some time to review, given that they will probably need to read up on the maths and algorithms. 
As an aid for reviewers and maintainers it would be good to insert a comment into the generator file linking the implementations to the relevant maths and algorithm. I found the FIPS-204 spec and the CRYSTALS-Dilithium Algorithm Specifications and Supporting Documentation paper, Shi Bai, Léo Ducas et al, 2021 - are they the best ones to look at? The Java implementation of ML-DSA is based on the FIPS-204 standard and the intrinsics' implementations are based on the corresponding Java methods, except that the montMul() calls in them are inlined. The rest of the transformation from Java code to intrinsic code is pretty straightforward, so a reviewer need not necessarily understand the whole mathematics of the ML-DSA algorithms, just that the Java and the corresponding intrinsic code do the same thing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2634810518 From epeter at openjdk.org Tue Feb 4 19:09:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Feb 2025 19:09:35 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v17] In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 10:05:09 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5.
New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577) for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 9. X86 backend implementation for all supported intrinsics. >> 10. Functional and Performance validation tests. >> >> Kindly review the patch and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Fixing typos Thanks @jatin-bhateja for all your patience, this really took a while. It looks good to me - again I'm only reviewing the C++ VM changes, so someone else has to review the Java changes. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22754#pullrequestreview-2593800414 From epeter at openjdk.org Tue Feb 4 19:09:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Feb 2025 19:09:36 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v16] In-Reply-To: References: Message-ID: <7WobCDj_e4Sw1CEYr3EVfgHTxJoxBfiFR63WwrzDDzs=.27e926d0-23e6-4231-a677-fdfd683083be@github.com> On Tue, 4 Feb 2025 09:56:15 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/convertnode.cpp line 971: >> >>> 969: return true; >>> 970: default: >>> 971: return false; >> >> Does this cover all cases? What about `FmaHF`? > > FmaHF is a ternary operation and is intrinsified.
> > FmaHF is a ternary operation and is intrinsified. Ah, right. My bad ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1941748224 From liach at openjdk.org Tue Feb 4 19:21:44 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 4 Feb 2025 19:21:44 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v17] In-Reply-To: References: Message-ID: <7oq7j2pYG9ToDNcGyVWrphH_wFyvPRX2kl3qxgQYBss=.449139d7-e3a8-4587-b5ce-a5f7f9f5b613@github.com> On Tue, 4 Feb 2025 10:05:09 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 9. 
X86 backend implementation for all supported intrinsics. >> 10. Functional and Performance validation tests. >> >> Kindly review the patch and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Fixing typos src/java.base/share/classes/jdk/internal/vm/vector/Float16Math.java line 42: > 40: } > 41: > 42: public interface Float16TernaryMathOp { Is there a reason we don't write the default impl explicitly in this class, but ask for a lambda for an implementation? Each intrinsified method only has one default impl, so I think we can just inline that into the method body here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1941764924 From gziemski at openjdk.org Tue Feb 4 21:05:44 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 4 Feb 2025 21:05:44 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v12] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. 
> > For memory logger those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. > > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are need if we want to calculate the memory overhead > > To use, you first need to record the pattern of operations, ex: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/... 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: disable debug prints ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/3d299c31..2f591c51 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=10-11 Stats: 81 lines in 2 files changed: 8 ins; 16 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From dean.long at oracle.com Tue Feb 4 22:29:39 2025 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 4 Feb 2025 14:29:39 -0800 Subject: RFD: The Cost of Profiling in the HotSpot Virtual Machine In-Reply-To: <7f9fe53d-d1b9-41d2-b888-37d3a9fd5f09@littlepinkcloud.com> References: <06c95e20-4d11-4598-910d-ef75b4d06d22@littlepinkcloud.com> <837ed36c-463b-4599-9d8a-f40f5fb0d011@oracle.com> <7f9fe53d-d1b9-41d2-b888-37d3a9fd5f09@littlepinkcloud.com> Message-ID: On 2/4/25 9:49 AM, Andrew Haley wrote: > On 2/3/25 17:26, Vladimir Kozlov wrote: >> It is known issue [1]. We could use randomized profiling counters >> update as suggested in RFE. >> I think we had such implementation in our JAOTC work back in days. >> But it may not such simple on not x86 platforms. > > Hmm, OK. > >> But I think we should go this way >> since we are already working on it. > > Yes, I see. OK, I'll mark my bug report as a dupe of 8134940. > > Maybe I can assist Igor Veresov with some ideas for AArch64 if it helps, > but I won't butt in unless I'm asked when he's working on the problem. > I'm glad someone is. > > Thanks, > When we talked about this today, it sounded like Igor was busy with Leyden and not actively working on 8134940, so he may be happy to give it to you. Please check with him.
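Randomized profile-counter updates, mentioned above as the approach suggested in the RFE, can be illustrated with a small Java sketch. It assumes one common scheme: each event bumps the counter by 2^k with probability 2^-k, so the expected counter value still equals the true event count while writes to the shared counter become roughly 2^k times rarer. The class and parameter names are hypothetical, this is not the design of JDK-8134940, and generated code would use a much cheaper random source than `SplittableRandom`.

```java
import java.util.SplittableRandom;

// Hypothetical sketch: a probabilistically updated profile counter.
// Each event adds 2^k with probability 2^-k, so the expected value of the
// counter equals the true number of events, while the counter itself is
// written only about once every 2^k events.
final class RandomizedCounterSketch {
    private final int log2Step;         // k
    private final long mask;            // 2^k - 1
    private final SplittableRandom rng; // stand-in for a cheap per-thread RNG
    private long counter;               // the shared profile counter

    RandomizedCounterSketch(int log2Step, long seed) {
        this.log2Step = log2Step;
        this.mask = (1L << log2Step) - 1;
        this.rng = new SplittableRandom(seed);
    }

    void onEvent() {
        // Update only when the low k random bits are all zero (probability 2^-k).
        if ((rng.nextLong() & mask) == 0) {
            counter += 1L << log2Step; // scale the increment so the estimate stays unbiased
        }
    }

    long estimate() {
        return counter;
    }

    public static void main(String[] args) {
        RandomizedCounterSketch c = new RandomizedCounterSketch(4, 42L);
        for (int i = 0; i < 1_000_000; i++) {
            c.onEvent();
        }
        // An estimate near 1,000,000 that is always a multiple of 16.
        System.out.println(c.estimate());
    }
}
```

The price of fewer writes is extra variance in the estimate, which grows with k.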
dl From dlong at openjdk.org Wed Feb 5 01:13:20 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 5 Feb 2025 01:13:20 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Tue, 4 Feb 2025 14:43:51 GMT, Coleen Phillimore wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright and param name test/micro/org/openjdk/bench/java/lang/reflect/Clazz.java line 73: > 71: public int getAppArrayModifiers() { > 72: return clazzArray.getClass().getModifiers(); > 73: } I'm guessing this is the benchmark that shows an extra load. How about adding a benchmark that makes the Clazz[] final or @Stable, and see if that makes the extra load go away? 
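For reference, the value the array-class benchmark reads is described in the `Class.getModifiers()` javadoc: an array class takes its public/protected/private bits from its component type, is `final`, and is never an interface. On HotSpot, array and primitive classes are additionally reported as `abstract`. The small program below (class name hypothetical) decodes those bits with `java.lang.reflect.Modifier`.

```java
import java.lang.reflect.Modifier;

// Prints the modifier bits Class.getModifiers() reports for an array class
// and for a primitive class, decoded with java.lang.reflect.Modifier.
final class ArrayModifiersSketch {
    static String describe(Class<?> c) {
        return c.getName() + " -> " + Modifier.toString(c.getModifiers());
    }

    public static void main(String[] args) {
        // Array class: public/protected/private come from the component type.
        System.out.println(describe(String[].class));
        // Primitive class.
        System.out.println(describe(int.class));
        // On HotSpot both lines end in "public abstract final".
    }
}
```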
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1942114565 From dholmes at openjdk.org Wed Feb 5 01:48:23 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 5 Feb 2025 01:48:23 GMT Subject: RFR: 8323158: HotSpot Style Guide should specify more include ordering [v2] In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 12:14:35 GMT, Stefan Karlsson wrote: >> The HotSpot Style Guide has a section about source files and includes. The style used for includes have mostly been introduced by scripts when includeDB was replaced, but also when various other enhancements to our includes were made. Some of the introduced styles were never written down in the style guide. >> >> I propose a couple of changes to the HotSpot Style Guide to reflect some of these implicit styles that we have. While updating the text I also took the liberty to order the items in an order that I felt was good. >> >> Note that JDK-8323158 contains a few more suggestions, but I've only addressed the items that I think can be accepted without much contention. Either I extract the items that have not been address into a new RFE, or I create a new RFE for this PR. >> >> There a some small whitespace tweaks that I made so that the .md and .html had a similar layout. > > Stefan Karlsson has updated the pull request incrementally with two additional commits since the last revision: > > - Update hotspot-style.md > - Update hotspot-style.html Generally seems fine. Thanks. doc/hotspot-style.html line 213: > 211:
Put conditional inclusions (`#if ...`) at the end of the section of HotSpot > 212: include lines. This also applies to macro-expanded includes of platform > 213: dependent files.

What is the order for the conditional sections? Alphabetic on the include guard? ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23388#pullrequestreview-2594441092 PR Review Comment: https://git.openjdk.org/jdk/pull/23388#discussion_r1942135021 From dholmes at openjdk.org Wed Feb 5 05:39:14 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 5 Feb 2025 05:39:14 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v4] In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 16:11:06 GMT, Coleen Phillimore wrote: >> This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix test that knows which fields are hidden from reflection in jvmci. src/java.base/share/classes/java/lang/Class.java line 239: > 237: * generated. > 238: */ > 239: private Class(ClassLoader loader, Class arrayComponentType, ProtectionDomain pd) { If this constructor is not used then why do we need to add the PD argument, rather than just set it to null? For that matter why do we even need the field if nothing is ever setting it? I'm missing something here. src/java.base/share/classes/java/lang/Class.java line 2701: > 2699: > 2700: @Stable > 2701: private transient final ProtectionDomain protectionDomain; Isn't `@Stable` superfluous with a final field? src/java.base/share/classes/java/lang/Class.java line 2722: > 2720: */ > 2721: public ProtectionDomain getProtectionDomain() { > 2722: if (protectionDomain == null) { Does this imply the class is a primitive class? test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java line 205: > 203: > 204: /** > 205: * Test that some Class fields cannot be make accessible.
Suggestion: * Test that some Class fields cannot be made accessible. test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java line 216: > 214: > 215: assertTrue(pd == null); > 216: } Suggestion:

    public void testJavaLangClassFields() throws Exception {
        try {
            // This field is explicitly hidden from reflection.
            Class.class.getDeclaredField("protectionDomain");
            assertTrue(false);
        } catch (NoSuchFieldException expected) { }
    }

The above is more in-style with the other test cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1942271834 PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1942261857 PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1942270930 PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1942265829 PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1942268434 From jbhateja at openjdk.org Wed Feb 5 07:09:15 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 5 Feb 2025 07:09:15 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v17] In-Reply-To: <7oq7j2pYG9ToDNcGyVWrphH_wFyvPRX2kl3qxgQYBss=.449139d7-e3a8-4587-b5ce-a5f7f9f5b613@github.com> References: <7oq7j2pYG9ToDNcGyVWrphH_wFyvPRX2kl3qxgQYBss=.449139d7-e3a8-4587-b5ce-a5f7f9f5b613@github.com> Message-ID: On Tue, 4 Feb 2025 19:18:39 GMT, Chen Liang wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixing typos > > src/java.base/share/classes/jdk/internal/vm/vector/Float16Math.java line 42: > >> 40: } >> 41: >> 42: public interface Float16TernaryMathOp { > > Is there a reason we don't write the default impl explicitly in this class, but ask for a lambda for an implementation? Each intrinsified method only has one default impl, so I think we can just inline that into the method body here.
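The two designs under discussion can be contrasted in a minimal sketch. All names are hypothetical, and binary16 negation (flipping the sign bit) stands in for a real operation so that the fallback is trivially exact. In the first style the entry point only forwards to a caller-supplied lambda, so the fallback logic and its comments can stay in the public API class; in the second, the default implementation is written inline in the entry point.

```java
// Hypothetical sketch contrasting two ways to write an intrinsic entry point
// that operates on raw IEEE 754 binary16 bits stored in a short.
final class Float16EntryPointSketch {

    @FunctionalInterface
    interface Float16UnaryOp {
        short apply(short bits);
    }

    // Style 1: forward to a caller-supplied fallback. If a JIT intrinsified
    // this method it would replace the whole body; the fallback, and the
    // comments explaining it, stay with the public API that passes the lambda.
    static short negate(short bits, Float16UnaryOp defaultImpl) {
        return defaultImpl.apply(bits);
    }

    // Style 2: the default implementation is written directly in the body.
    static short negateInlined(short bits) {
        return (short) (bits ^ 0x8000); // flip the binary16 sign bit
    }

    public static void main(String[] args) {
        short one = (short) 0x3C00; // binary16 encoding of 1.0
        short viaLambda = negate(one, b -> (short) (b ^ 0x8000));
        short inlined = negateInlined(one);
        // Both print bc00, the binary16 encoding of -1.0.
        System.out.printf("%04x %04x%n", viaLambda & 0xFFFF, inlined & 0xFFFF);
    }
}
```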
This wrapper class is part of the java.base module and only contains intrinsic entry points for APIs defined in the Float16 class, which is part of an incubator module. Thus, exposing the intrinsic fallback code through a lambda keeps the interface clean, while the actual API logic and the comments around it remain intact in the Float16 class. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1942344948 From sroy at openjdk.org Wed Feb 5 08:27:52 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Wed, 5 Feb 2025 08:27:52 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v18] In-Reply-To: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: > JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) > > Currently acceleration code for GHASH is missing for PPC64. > > The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. Suchismith Roy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 40 commits: - Merge branch 'openjdk:master' into ghash_processblocks - restore chnges - restore chnges - permute vHigh,vLow - indentation - comments - vsx logic change - spaces - spaces - update references - ...
and 30 more: https://git.openjdk.org/jdk/compare/40603a5b...6388d4e1 ------------- Changes: https://git.openjdk.org/jdk/pull/20235/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=17 Stats: 167 lines in 2 files changed: 163 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235 PR: https://git.openjdk.org/jdk/pull/20235 From sroy at openjdk.org Wed Feb 5 08:38:58 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Wed, 5 Feb 2025 08:38:58 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v19] In-Reply-To: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: > JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) > > Currently acceleration code for GHASH is missing for PPC64. > > The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result.
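The Karatsuba step mentioned in the description works in GF(2)[x], where addition is XOR, so a 128x128-bit carry-less product needs only three 64x64-bit carry-less multiplies instead of four. The Java sketch below, with hypothetical names, checks that identity against the schoolbook product. It covers only the multiplication: GHASH's bit-reflected ordering and its reduction modulo x^128 + x^7 + x^2 + x + 1 are omitted, and it makes no claim about how the PPC64 stub is written.

```java
import java.util.Arrays;
import java.util.Random;

// Karatsuba over GF(2)[x]: a 128x128-bit carry-less product from three
// 64x64-bit carry-less multiplies. Multiplication step only, no reduction.
final class KaratsubaClmulSketch {

    // Carry-less 64x64 -> 128-bit multiply; returns {hi, lo}.
    static long[] clmul64(long a, long b) {
        long hi = 0, lo = 0;
        for (int i = 0; i < 64; i++) {
            if (((b >>> i) & 1L) != 0) {
                lo ^= a << i;
                if (i != 0) {
                    hi ^= a >>> (64 - i); // bits shifted out of the low word
                }
            }
        }
        return new long[] {hi, lo};
    }

    // Schoolbook: four partial products. Result words are {w3, w2, w1, w0}.
    static long[] mul128Schoolbook(long a1, long a0, long b1, long b0) {
        long[] hh = clmul64(a1, b1), hl = clmul64(a1, b0);
        long[] lh = clmul64(a0, b1), ll = clmul64(a0, b0);
        long m1 = hl[0] ^ lh[0], m0 = hl[1] ^ lh[1]; // middle term, at x^64
        return new long[] {hh[0], hh[1] ^ m1, ll[0] ^ m0, ll[1]};
    }

    // Karatsuba: middle term = (a1^a0)(b1^b0) ^ H ^ L, because addition and
    // subtraction are both XOR in GF(2)[x].
    static long[] mul128Karatsuba(long a1, long a0, long b1, long b0) {
        long[] h = clmul64(a1, b1);
        long[] l = clmul64(a0, b0);
        long[] m = clmul64(a1 ^ a0, b1 ^ b0);
        long m1 = m[0] ^ h[0] ^ l[0], m0 = m[1] ^ h[1] ^ l[1];
        return new long[] {h[0], h[1] ^ m1, l[0] ^ m0, l[1]};
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        for (int i = 0; i < 1000; i++) {
            long a1 = rnd.nextLong(), a0 = rnd.nextLong();
            long b1 = rnd.nextLong(), b0 = rnd.nextLong();
            if (!Arrays.equals(mul128Schoolbook(a1, a0, b1, b0),
                               mul128Karatsuba(a1, a0, b1, b0))) {
                throw new AssertionError("mismatch");
            }
        }
        System.out.println("ok");
    }
}
```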
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: adapt Condition registers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20235/files - new: https://git.openjdk.org/jdk/pull/20235/files/6388d4e1..068a248c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=17-18 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235 PR: https://git.openjdk.org/jdk/pull/20235 From stefank at openjdk.org Wed Feb 5 08:53:20 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 5 Feb 2025 08:53:20 GMT Subject: RFR: 8323158: HotSpot Style Guide should specify more include ordering [v2] In-Reply-To: References: Message-ID: <2rPnjlXTdPQIsaQNDq7sA69p8YySeLVN896QgU9ABWY=.e2ac8773-7e8d-45bc-85ef-c8bc7d878ae7@github.com> On Wed, 5 Feb 2025 01:42:36 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update hotspot-style.md >> - Update hotspot-style.html > > doc/hotspot-style.html line 213: > >> 211:
Put conditional inclusions (`#if ...`) at the end of the section of HotSpot >> 212: include lines. This also applies to macro-expanded includes of platform >> 213: dependent files.

> > What is the order for the conditional sections? Alphabetic on the include guard? I don't think there's a set order. I wouldn't mind making it alphabetic with platforms includes coming before the other conditional includes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23388#discussion_r1942464083 From stefank at openjdk.org Wed Feb 5 08:53:20 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 5 Feb 2025 08:53:20 GMT Subject: RFR: 8323158: HotSpot Style Guide should specify more include ordering [v2] In-Reply-To: <2rPnjlXTdPQIsaQNDq7sA69p8YySeLVN896QgU9ABWY=.e2ac8773-7e8d-45bc-85ef-c8bc7d878ae7@github.com> References: <2rPnjlXTdPQIsaQNDq7sA69p8YySeLVN896QgU9ABWY=.e2ac8773-7e8d-45bc-85ef-c8bc7d878ae7@github.com> Message-ID: On Wed, 5 Feb 2025 08:49:59 GMT, Stefan Karlsson wrote: >> doc/hotspot-style.html line 213: >> >>> 211:
Put conditional inclusions (`#if ...`) at the end of the section of HotSpot >>> 212: include lines. This also applies to macro-expanded includes of platform >>> 213: dependent files.

>> >> What is the order for the conditional sections? Alphabetic on the include guard? > > I don't think there's a set order. I wouldn't mind making it alphabetic with platforms includes coming before the other conditional includes. FWIW, I also tend to sort the forward declarations but that's also not something that everyone does. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23388#discussion_r1942464777 From azafari at openjdk.org Wed Feb 5 09:45:20 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 5 Feb 2025 09:45:20 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v20] In-Reply-To: <0I2mUN4F0QBOYVFxIX7UE1aYwJIEUNAOucOjCuxmS58=.091c86ae-399c-4786-b2e9-20593a4e4425@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <0I2mUN4F0QBOYVFxIX7UE1aYwJIEUNAOucOjCuxmS58=.091c86ae-399c-4786-b2e9-20593a4e4425@github.com> Message-ID: On Tue, 4 Feb 2025 13:43:46 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fix in shendoahCardTable > > test/micro/org/openjdk/bench/vm/compiler/FluidSBBench.java line 1: > >> 1: /* > > This must be the result of an incorrect merge conflict resolution. It should be deleted.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1942544444 From azafari at openjdk.org Wed Feb 5 09:53:25 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 5 Feb 2025 09:53:25 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v20] In-Reply-To: <0I2mUN4F0QBOYVFxIX7UE1aYwJIEUNAOucOjCuxmS58=.091c86ae-399c-4786-b2e9-20593a4e4425@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <0I2mUN4F0QBOYVFxIX7UE1aYwJIEUNAOucOjCuxmS58=.091c86ae-399c-4786-b2e9-20593a4e4425@github.com> Message-ID: On Tue, 4 Feb 2025 13:48:58 GMT, Johan Sj?len wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fix in shendoahCardTable > > test/hotspot/gtest/nmt/test_vmatree.cpp line 831: > >> 829: tty->cr(); >> 830: } >> 831: } > > This looks like a test which was added when you merged in an earlier version of https://github.com/openjdk/jdk/pull/20994 > > This should be removed. Done. > test/hotspot/jtreg/compiler/stringopts/TestFluidAndNonFluid.java line 1: > >> 1: /* > > Incorrect merge resolution. Removed the file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1942555378 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1942556962 From adinn at openjdk.org Wed Feb 5 10:35:10 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 5 Feb 2025 10:35:10 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v2] In-Reply-To: References: <7UgNYEuTu6rj7queOgM9xIy-6kQMdACrZiDLtlniMYw=.dff6f18b-1236-43b1-8280-2bce9160f32a@github.com> Message-ID: On Tue, 4 Feb 2025 18:57:28 GMT, Ferenc Rakoczi wrote: >>> @ferakocz I'm afraid you lucked out on getting your change committed before my reorganization of the stub generation code. 
If you are unsure of how to do the merge so your new stub is declared and generated following the new model (see the doc comments in stubDeclarations.hpp for details) let me know and I'll be happy to help you sort it out. >> >> @adinn I think I managed to figure it out. Please take a look at the PR and let me know if I should have done anything differently. > >> @ferakocz Yes, the stub declaration part of it looks to be correct. >> >> The rest of the patch will need at least two reviewers (@theRealAph? @martinuy? @franferrax) and may take some time to review, given that they will probably need to read up on the maths and algorithms. As an aid for reviewers and maintainers it would be good to insert a comment into the generator file linking the implementations to the relevant maths and algorithm. I found the FIPS-204 spec and the CRYSTALS-Dilithium Algorithm Specifications and Supporting Documentation paper, Shi Bai, Léo Ducas et al, 2021 - are they the best ones to look at? > > The Java implementation of ML-DSA is based on the FIPS-204 standard and the intrinsics' implementations are based on the corresponding Java methods, except that the montMul() calls in them are inlined. The rest of the transformation from Java code to intrinsic code is pretty straightforward, so a reviewer need not necessarily understand the whole mathematics of the ML-DSA algorithms, just that the Java and the corresponding intrinsic code do the same thing. @ferakocz > The Java implementation of ML-DSA is based on the FIPS-204 standard and the intrinsics' implementations are based on the corresponding Java methods, except that the montMul() calls in them are inlined. The rest of the transformation from Java code to intrinsic code is pretty straightforward, so a reviewer need not necessarily understand the whole mathematics of the ML-DSA algorithms, just that the Java and the corresponding intrinsic code do the same thing.
Yes, I located the relevant Java implementations in SHA3.java (keccak) and ML_DSA.java (dilithiumXXX) plus also SHA3Parallel.java (doubleKeccak). The first file does at least mention FIPS-202. The second does not include any reference, in particular does not mention FIPS-204. I still think it would be helpful for reviewers and maintainers if you were to add a comment in front of the generator routines that 1) notes that these routines are based on the relevant Java sources and 2) mentions that the Java code is in turn based on the FIPS-202 and FIPS-204 standards. While I agree that a reviewer or maintainer could simply check the generated code against the Java code I believe access to the underlying theory will be of aid when it comes to understanding what each variant is doing and verifying the equivalence of the two. That's why I'd also prefer to have two reviews to be sure that more than one of us who may be tasked with maintaining this code can be happy that we understand, at least, the equivalence in question. 
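Since the intrinsics inline `montMul()`, a reviewer comparing them with the Java code mostly needs the Montgomery reduction identity r * 2^32 = a * b (mod q). As a hedged illustration, here is signed Montgomery multiplication modulo q = 8380417 with R = 2^32, in the style of the CRYSTALS-Dilithium reference implementation. The class name is hypothetical and it is not claimed to match `ML_DSA.java` line for line.

```java
// Signed Montgomery multiplication modulo the ML-DSA prime q, with R = 2^32,
// in the style of the CRYSTALS-Dilithium reference code. Hypothetical class;
// not copied from the JDK's ML_DSA.java.
final class MontMulSketch {
    static final int Q = 8380417;     // 2^23 - 2^13 + 1
    static final int QINV = 58728449; // q^-1 mod 2^32, so Q * QINV wraps to 1

    // For |a|, |b| < q returns r with -q < r < q and r * 2^32 == a * b (mod q).
    static int montMul(int a, int b) {
        long t = (long) a * b;
        int m = (int) t * QINV;                  // t * q^-1 mod 2^32, as a signed int
        return (int) ((t - (long) m * Q) >> 32); // exact: the low 32 bits are zero
    }

    public static void main(String[] args) {
        int a = 1234567;
        int b = 7654321;
        int r = montMul(a, b);
        long lhs = Math.floorMod((long) r << 32, (long) Q);
        long rhs = Math.floorMod((long) a * b, (long) Q);
        System.out.println(lhs == rhs); // prints "true"
    }
}
```

Operands are usually kept in Montgomery form (multiplied by R mod q), so that `montMul` of two such values yields the Montgomery form of their product without a division by q.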
------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2636346476 From azafari at openjdk.org Wed Feb 5 10:38:21 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 5 Feb 2025 10:38:21 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v20] In-Reply-To: <0I2mUN4F0QBOYVFxIX7UE1aYwJIEUNAOucOjCuxmS58=.091c86ae-399c-4786-b2e9-20593a4e4425@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <0I2mUN4F0QBOYVFxIX7UE1aYwJIEUNAOucOjCuxmS58=.091c86ae-399c-4786-b2e9-20593a4e4425@github.com> Message-ID: On Tue, 4 Feb 2025 13:44:52 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fix in shendoahCardTable > > test/hotspot/jtreg/runtime/Thread/TestAlwaysPreTouchStacks.java line 182: > >> 180: throw new RuntimeException("Expected a higher delta between stack committed of with and without pretouch." + >> 181: "Expected: " + expected_delta + " Actual: " + actual_delta); >> 182: } > > Is this meant to be part of this PR? It is better, and correct, for it not to be part of this PR. The changes are removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1942629076 From stuefe at openjdk.org Wed Feb 5 12:07:11 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 5 Feb 2025 12:07:11 GMT Subject: RFR: 8323158: HotSpot Style Guide should specify more include ordering [v2] In-Reply-To: References: <2rPnjlXTdPQIsaQNDq7sA69p8YySeLVN896QgU9ABWY=.e2ac8773-7e8d-45bc-85ef-c8bc7d878ae7@github.com> Message-ID: On Wed, 5 Feb 2025 08:50:29 GMT, Stefan Karlsson wrote: > FWIW, I also tend to sort the forward declarations but that's also not something that everyone does. I, too, am an obsessive sorter. Clean code is good code.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23388#discussion_r1942751804 From jsjolen at openjdk.org Wed Feb 5 12:42:13 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 5 Feb 2025 12:42:13 GMT Subject: RFR: 8323158: HotSpot Style Guide should specify more include ordering In-Reply-To: <1gOEDAkU1eGcUnYqPawR3g5OqAseBiVIVPUIi4O9GYc=.321e1b86-3c6f-4c92-8036-27240a778659@github.com> References: <9RwWVrZpTqsRO5srrrT0jOt4CGc7oF5FEm06Pzjf2yI=.a5fc3070-226c-4292-9802-426c9cab1672@github.com> <1gOEDAkU1eGcUnYqPawR3g5OqAseBiVIVPUIi4O9GYc=.321e1b86-3c6f-4c92-8036-27240a778659@github.com> Message-ID: On Mon, 3 Feb 2025 12:14:53 GMT, Doug Simon wrote: >>> A lot of these rules looks like they could be checked with some simple scripting or additions to jcheck. Have you considered that? >> >> I haven't felt the urge to write such a script, but I know that others have scripts to sort the includes. > >> I haven't felt the urge to write such a script, but I know that others have scripts to sort the includes > > Ok, it was just a suggestion. My experience is that while clearly written conventions/rules are important, the more they can be automated, the less hassle for everyone. @dougxc, We could use clang-format for these specific include rules. For example, when we still had `precompiled.hpp` I had this in my `.clang-format` file:

    IncludeCategories:
      # precompiled.hpp ALWAYS first
      - Regex: 'precompiled.hpp'
        Priority: 0
        SortPriority: 0

Implementing all of the rules for Clang format is left as an exercise to the reader, and the author would be very appreciative if someone posted their solution here :-).
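Following up on the suggestion to automate these rules, here is a minimal sketch of one such check in Java. It assumes the rule that include lines within a block are kept sorted, treats any other non-blank line (such as `#if` or `#endif`) as ending a block, and reports out-of-order lines. The class name and the exact rule are illustrative assumptions, not an existing jcheck or HotSpot script.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical include-order check. Within each block of consecutive
// #include "..." lines (blank lines allowed), paths must be sorted; any other
// non-blank line, such as #if or #endif, starts a new block.
final class IncludeOrderCheck {
    private static final Pattern INCLUDE =
        Pattern.compile("^\\s*#\\s*include\\s+\"([^\"]+)\"");

    // Returns the 1-based line numbers of includes that are out of order.
    static List<Integer> outOfOrder(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        String prev = null;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            Matcher m = INCLUDE.matcher(line);
            if (m.find()) {
                String path = m.group(1);
                if (prev != null && prev.compareTo(path) > 0) {
                    bad.add(i + 1);
                }
                prev = path;
            } else if (!line.trim().isEmpty()) {
                prev = null; // a non-include line ends the current block
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        for (String file : args) {
            List<String> lines = Files.readAllLines(Paths.get(file));
            for (int n : outOfOrder(lines)) {
                System.out.println(file + ":" + n + ": include out of order");
            }
        }
    }
}
```

Pass the source files to check as command-line arguments. A real check would also need to encode the other rules in the style guide, such as placing conditional includes last.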
------------- PR Comment: https://git.openjdk.org/jdk/pull/23388#issuecomment-2636674949 From cnorrbin at openjdk.org Wed Feb 5 14:28:06 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 5 Feb 2025 14:28:06 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v2] In-Reply-To: References: Message-ID: <5RUqIzL1mnILs0gYOsgmW6ibVa0dAKiBRPu-EzP7VUo=.3b519fe4-c7c2-4a97-a8d3-b426628883a6@github.com> > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
> > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns! Casper Norrbin has updated the pull request incrementally with two additional commits since the last revision: - reduced diff - 0-sized value ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23416/files - new: https://git.openjdk.org/jdk/pull/23416/files/6d92ab6c..633d2a2f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=00-01 Stats: 383 lines in 3 files changed: 199 ins; 161 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From cnorrbin at openjdk.org Wed Feb 5 14:35:13 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 5 Feb 2025 14:35:13 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v2] In-Reply-To: References: Message-ID: 
<0pkHVYU3dzzQlTvzdTT9rgRpzgmcSg1mzeCK1PvFnsA=.e452fcdd-27f2-4f12-8946-e43f08527866@github.com> On Mon, 3 Feb 2025 19:39:36 GMT, Johan Sjölen wrote: >> Casper Norrbin has updated the pull request incrementally with two additional commits since the last revision: >> >> - reduced diff >> - 0-sized value > > src/hotspot/share/utilities/rbTree.hpp line 71: > >> 69: const K& key() const { return _key; } >> 70: V& val() { return _value; } >> 71: V& val() const { return _value; } > > Hmm, this doesn't seem quite right. Why can't we have a `const` method returning a `const` value anymore? Oops, fixed now :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1943052390 From cnorrbin at openjdk.org Wed Feb 5 14:35:12 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 5 Feb 2025 14:35:12 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 07:24:06 GMT, Thomas Stuefe wrote: > Why not skip the useless insert_node creation? Let insert_at_cursor just accept uninitialized Node* memory, since it will initialize all Node members anyway? > As of now, the cursor doesn't track what key it's pointing to. All `Node` members _but_ the key are initialized, so the key still needs to be specified. That's what the `0` in `Node(0)` is. I could change this to have that also stored inside the cursor, so we can avoid creating the node first. > could you massage this patch a bit to reduce the delta to the last version? That is a good idea in general (I usually do a manual minimize-delta sweep before I undraft a PR for review). A lot seems to be code movement at first glance. > A lot is new or rewritten so git diff had a hard time picking up related changes. I tried my best to order and group the changes so functionality overlaps.
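The intrusive-tree-with-cursor idea can be sketched in Java, where "intrusive" means the user's object extends the node type so the tree itself never allocates. The sketch is a plain unbalanced binary search tree with hypothetical names, not the red-black tree under review. Like the cursor described above, its cursor records only the position (parent and child slot), so the node passed to `insertAtCursor` must carry the key itself.

```java
// Hypothetical sketch of an intrusive binary search tree with cursors.
// User objects extend Node, so the tree itself never allocates memory.
// No red-black rebalancing is done here.
final class IntrusiveTreeSketch {

    static class Node {
        final int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    // A position in the tree: the node holding the key if present, plus the
    // parent and child slot where such a node is (or would be) linked.
    static final class Cursor {
        final Node parent;    // null means the root slot
        final boolean isLeft; // which child slot of parent
        final Node node;      // null if the key is absent

        Cursor(Node parent, boolean isLeft, Node node) {
            this.parent = parent;
            this.isLeft = isLeft;
            this.node = node;
        }

        boolean found() {
            return node != null;
        }
    }

    // Example user structure: the node is embedded by extension.
    static final class MyEntry extends Node {
        final String data;

        MyEntry(int key, String data) {
            super(key);
            this.data = data;
        }
    }

    private Node root;

    Cursor cursorFind(int key) {
        Node parent = null;
        Node cur = root;
        boolean isLeft = false;
        while (cur != null && cur.key != key) {
            parent = cur;
            isLeft = key < cur.key;
            cur = isLeft ? cur.left : cur.right;
        }
        return new Cursor(parent, isLeft, cur);
    }

    // Links a caller-allocated node into the empty slot the cursor points at.
    // The cursor must come from cursorFind(n.key) with no modification since.
    void insertAtCursor(Node n, Cursor c) {
        if (c.found()) {
            throw new IllegalStateException("key already present");
        }
        if (c.parent == null) {
            root = n;
        } else if (c.isLeft) {
            c.parent.left = n;
        } else {
            c.parent.right = n;
        }
    }

    public static void main(String[] args) {
        IntrusiveTreeSketch tree = new IntrusiveTreeSketch();
        for (int k : new int[] {5, 2, 8}) {
            tree.insertAtCursor(new MyEntry(k, "v" + k), tree.cursorFind(k));
        }
        Cursor c = tree.cursorFind(8);
        System.out.println(((MyEntry) c.node).data); // prints "v8"
    }
}
```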
------------- PR Comment: https://git.openjdk.org/jdk/pull/23416#issuecomment-2637014828 From ayang at openjdk.org Wed Feb 5 14:41:39 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 5 Feb 2025 14:41:39 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: > Here is an attempt to simplify GCLocker implementation for Serial and Parallel. > > GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. > > The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. > > Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. 
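The reader-writer discipline described above can be modelled with `java.util.concurrent.locks.ReentrantReadWriteLock`: threads inside a JNI critical region hold the read lock and a GC takes the write lock, so once the GC holds it no thread is in a critical region and the collection can always run. This is a simplified sketch with hypothetical names; the patch itself uses an existing thread-local variable and a store-load barrier rather than a shared lock object.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified model of the GCLocker protocol described above: JNI critical
// regions are readers and a GC is the writer. Not the patch's implementation.
final class GCLockerModel {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void enterCritical() { lock.readLock().lock(); }
    void exitCritical() { lock.readLock().unlock(); }

    // Waits until no thread is in a critical region; after it returns a GC
    // can always run, so callers need no "GC was locked out, retry" path.
    void beginGC() { lock.writeLock().lock(); }
    boolean tryBeginGC() { return lock.writeLock().tryLock(); }
    void endGC() { lock.writeLock().unlock(); }

    // Attempts a GC from a separate thread and reports whether it could start.
    static boolean tryGCFromOtherThread(GCLockerModel gcl) {
        boolean[] started = new boolean[1];
        Thread gc = new Thread(() -> {
            started[0] = gcl.tryBeginGC();
            if (started[0]) {
                gcl.endGC();
            }
        });
        gc.start();
        try {
            gc.join(); // join() also makes started[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return started[0];
    }

    public static void main(String[] args) {
        GCLockerModel gcl = new GCLockerModel();
        gcl.enterCritical();
        System.out.println(tryGCFromOtherThread(gcl)); // false: a thread is in a critical region
        gcl.exitCritical();
        System.out.println(tryGCFromOtherThread(gcl)); // true
    }
}
```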
> > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into gclocker - review - Merge branch 'master' into gclocker - gclocker ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23367/files - new: https://git.openjdk.org/jdk/pull/23367/files/6283a19c..1b6f908b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23367&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23367&range=00-01 Stats: 20456 lines in 569 files changed: 9369 ins; 6708 del; 4379 mod Patch: https://git.openjdk.org/jdk/pull/23367.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23367/head:pull/23367 PR: https://git.openjdk.org/jdk/pull/23367 From ayang at openjdk.org Wed Feb 5 14:41:39 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 5 Feb 2025 14:41:39 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 4 Feb 2025 09:05:35 GMT, Thomas Schatzl wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into gclocker >> - review >> - Merge branch 'master' into gclocker >> - gclocker > > src/hotspot/share/gc/shared/gcLocker.hpp line 33: > >> 31: >> 32: class GCLocker: public AllStatic { >> 33: static Monitor* _lock; > > Not sure if having this copy/reference to `Heap_lock` makes the code more clear than referencing `Heap_lock` directly. It needs to be `Heap_lock` anyway. 
`GCLocker` itself doesn't mandate that the lock must be `Heap_lock`; it's the interaction with the rest of the VM that shows that `Heap_lock` is a good candidate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1943040719 From ayang at openjdk.org Wed Feb 5 14:41:39 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 5 Feb 2025 14:41:39 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: <82w1_VjrsxtrpA7921QmHsA0kh9_J0kBtOCxp6sL7F4=.0b0d0698-b3d2-43a0-85b4-6b7e530e3a7a@github.com> On Wed, 5 Feb 2025 14:38:45 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575).
Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker > Further it would be interesting to see how many retries there are in the allocation loop with these jnilock* stress tests. I added `QueuedAllocationWarningCount=1` to `test/hotspot/jtreg/vmTestbase/nsk/stress/jni/gclocker/gcl001.java` and saw that the retry count never exceeded 10 for Serial/Parallel. > Which means that it might retry for a long time That occurs only when another java thread successfully triggers a gc, advancing the gc-counter, i.e. there is some system-wide progress. Per-thread progress is hard to guarantee, IMO. ------------- PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2595944041 From mdoerr at openjdk.org Wed Feb 5 14:46:21 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 5 Feb 2025 14:46:21 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v19] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Wed, 5 Feb 2025 08:38:58 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently acceleration code for GHASH is missing for PPC64.
>> >> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > adapt Condition registers Changes requested by mdoerr (Reviewer). src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 626: > 624: #ifdef ASSERT > 625: __ cmpwi(CR0, blocks, 0); > 626: __ beq(CR0, L_error); I suggest using `asm_assert_eq`, which is simpler. You can get rid of the Label and the extra code at the end, too. src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 655: > 653: // https://web.archive.org/web/20110609115824/https://software.intel.com/file/24918 > 654: // > 655: Label loop; Please check whether aligning the loop entry improves performance. I'd insert `__ align(32);` here. src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 5085: > 5083: StubRoutines::_data_cache_writeback = generate_data_cache_writeback(); > 5084: StubRoutines::_data_cache_writeback_sync = generate_data_cache_writeback_sync(); > 5085: } Please add an empty line. src/hotspot/cpu/ppc/vm_version_ppc.cpp line 284: > 282: // The AES intrinsic stubs require AES instruction support. > 283: if (has_vcipher()) { > 284: if (FLAG_IS_DEFAULT(UseAES)) { Please revert this change.
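The Karatsuba step mentioned in the GHASH PR description can be illustrated portably. This is a sketch, not the PPC64 stub: it computes the 256-bit carry-less product of two 128-bit operands with three 64-bit carry-less multiplies instead of four, and uses a bit-by-bit `clmul64` as a stand-in for a hardware carry-less multiply (for example x86 PCLMULQDQ). The names `U128`, `U256`, `clmul64` and `combine` are illustrative. A real GHASH additionally needs bit reflection and reduction modulo x^128 + x^7 + x^2 + x + 1, both omitted here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct U128 { uint64_t hi, lo; };
using U256 = std::array<uint64_t, 4>;  // 64-bit words, least significant first

// Portable carry-less (GF(2)[x]) 64x64 -> 128-bit multiply.
U128 clmul64(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; i++) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i != 0) r.hi ^= a >> (64 - i);
    }
  }
  return r;
}

// result = high * x^128 + mid * x^64 + low, where + is XOR.
static U256 combine(U128 low, U128 mid, U128 high) {
  return U256{low.lo, low.hi ^ mid.lo, high.lo ^ mid.hi, high.hi};
}

// Schoolbook: four 64-bit carry-less products.
U256 clmul128_schoolbook(U128 a, U128 b) {
  U128 low = clmul64(a.lo, b.lo);
  U128 high = clmul64(a.hi, b.hi);
  U128 m1 = clmul64(a.lo, b.hi);
  U128 m2 = clmul64(a.hi, b.lo);
  return combine(low, U128{m1.hi ^ m2.hi, m1.lo ^ m2.lo}, high);
}

// Karatsuba: three products. Over GF(2) addition and subtraction are both XOR, so
// (a0 + a1)(b0 + b1) + a0*b0 + a1*b1 = a0*b1 + a1*b0.
U256 clmul128_karatsuba(U128 a, U128 b) {
  U128 low = clmul64(a.lo, b.lo);
  U128 high = clmul64(a.hi, b.hi);
  U128 cross = clmul64(a.lo ^ a.hi, b.lo ^ b.hi);
  U128 mid{cross.hi ^ low.hi ^ high.hi, cross.lo ^ low.lo ^ high.lo};
  return combine(low, mid, high);
}
```

Saving one of the four 64-bit multiplies at the cost of a few XORs is the usual motivation for Karatsuba in GHASH kernels.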
------------- PR Review: https://git.openjdk.org/jdk/pull/20235#pullrequestreview-2595990712 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1943067976 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1943070142 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1943071278 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1943071762 From cnorrbin at openjdk.org Wed Feb 5 15:06:40 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 5 Feb 2025 15:06:40 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v3] In-Reply-To: References: Message-ID: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
> > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns! Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: build fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23416/files - new: https://git.openjdk.org/jdk/pull/23416/files/633d2a2f..321d6225 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=01-02 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From cnorrbin at openjdk.org Wed Feb 5 15:06:40 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 5 Feb 2025 15:06:40 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v3] In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 19:38:28 GMT, Johan Sjölen wrote: >> Casper Norrbin has updated the pull request incrementally
with one additional commit since the last revision: >> >> build fix > > src/hotspot/share/utilities/rbTree.hpp line 33: > >> 31: #include >> 32: >> 33: struct Empty {}; > > This will export the name `Empty` to everyone, is it possible to move it inside the class? `Empty` was used as the value type for the intrusive tree, but I discovered that it didn't quite work as expected, because `sizeof(Empty) == 1` due to it requiring a unique address. This means that we would waste space in a lot of cases. For example, 8-byte keys would lead to 40-byte `RBNode`s despite storing only three 8-byte pointers and one 8-byte key. I explored an alternative approach by adding the option to use void as the value type instead, and removing the need for a value member altogether. By using a base class containing only the value and using conditional inheritance from either it or `Empty` (which benefits from [empty base optimization](https://en.cppreference.com/w/cpp/language/ebo)), we can have zero size overhead. Code-wise, this solution doesn't look as clean. We need additional templating and `ENABLE_IF`s for functions with value references (since `void&` doesn't work). Another positive, however, is that this enables key-only red-black trees for scenarios where values aren't necessary. Let me know what you think :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1943110358 From cnorrbin at openjdk.org Wed Feb 5 15:49:48 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 5 Feb 2025 15:49:48 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v4] In-Reply-To: References: Message-ID: <81hkwax0ggd4Guleb-PV72ALuF1YU58kF7qnLfrlr5I=.b3f08801-e2ec-48e1-93af-38262f003bfb@github.com> > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support for intrusive trees.
In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. > > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns!
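A sketch of the layout difference Casper describes in the `Empty` discussion, using hypothetical names (`NodeWithMember`, `NodeWithBase`, `ValueHolder`) rather than the real `RBNode`. Storing an `Empty` value as a member costs at least one byte plus padding. Selecting the value-holding base with `std::conditional_t` lets a `void` value type fall back to an empty base, which the empty base optimization lays out with no storage. Putting `V& val()` and `const V& val() const` in the base is one way to keep `void&` out of the node without `ENABLE_IF`; that is a choice made for this sketch, not necessarily what the PR does.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

struct Empty {};

// Layout A: the value is an ordinary member. An Empty member still needs a unique
// address, so it takes at least one byte and padding rounds the node up.
template <typename K, typename V>
struct NodeWithMember {
  NodeWithMember* parent;
  NodeWithMember* left;
  NodeWithMember* right;
  K key;
  V value;
};

// Holds the value and provides a const-correct accessor pair.
template <typename V>
struct ValueHolder {
  V _value;
  V& val() { return _value; }
  const V& val() const { return _value; }
};

// Layout B: the value comes from a base class chosen at compile time. For V = void
// the base is Empty; ValueHolder<void> is only named, never instantiated.
template <typename K, typename V>
struct NodeWithBase
    : std::conditional_t<std::is_void<V>::value, Empty, ValueHolder<V>> {
  NodeWithBase* parent;
  NodeWithBase* left;
  NodeWithBase* right;
  K key;
};
```

On a typical 64-bit target with a `uint64_t` key, layout A is 40 bytes and layout B with `void` is 32, matching the figures in the review comment; exact sizes depend on the ABI.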
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: windows build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23416/files - new: https://git.openjdk.org/jdk/pull/23416/files/321d6225..ddf9de0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From nbenalla at openjdk.org Wed Feb 5 17:28:34 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Wed, 5 Feb 2025 17:28:34 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding Message-ID: Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. It scans files in those directories, filtering out certain files as well as all `.c`, `.java` and `.jar` files in test sources. Before adding line 86 and excluding `os_windows.cpp`, the test failed with: Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: HMODULE hModule = NULL; Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. 
at TestNoNULL.main(TestNoNULL.java:73) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) at java.base/java.lang.Thread.run(Thread.java:1447) ------------- Commit messages: - Add a test to prevent NULL backsliding Changes: https://git.openjdk.org/jdk/pull/23466/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23466&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343802 Stats: 151 lines in 2 files changed: 150 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23466.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23466/head:pull/23466 PR: https://git.openjdk.org/jdk/pull/23466 From jwaters at openjdk.org Wed Feb 5 17:35:36 2025 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 5 Feb 2025 17:35:36 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 15:39:32 GMT, Nizar Benalla wrote: > Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. > It scans files in those directories, filtering out certain files as well as all `.c`, `.java` and `.jar` files in test sources. > > Before adding line 86 and excluding `os_windows.cpp`, the test failed with: > > > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: > HMODULE hModule = NULL; > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: > GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); > java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. 
> at TestNoNULL.main(TestNoNULL.java:73) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1447) Seems like this solution would require someone working on the JDK to run the proper test, something which not everyone is guaranteed to do when working on HotSpot. Is there a way to make this part of the build process so trying to compile HotSpot will fail if there are NULLs in the source code? EDIT: Removed the quoting of the original Pull Request body ------------- PR Comment: https://git.openjdk.org/jdk/pull/23466#issuecomment-2637585100 From coleenp at openjdk.org Wed Feb 5 17:57:29 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 5 Feb 2025 17:57:29 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v5] In-Reply-To: References: Message-ID: > This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. > Tested with tier1-4. Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> - Remove @Stable annotation for final field. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23396/files - new: https://git.openjdk.org/jdk/pull/23396/files/d04b808f..6bb7fe6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23396&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23396&range=03-04 Stats: 9 lines in 2 files changed: 1 ins; 5 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23396.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23396/head:pull/23396 PR: https://git.openjdk.org/jdk/pull/23396 From coleenp at openjdk.org Wed Feb 5 17:57:31 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 5 Feb 2025 17:57:31 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v4] In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 05:35:58 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test that knows which fields are hidden from reflection in jvmci. > > src/java.base/share/classes/java/lang/Class.java line 239: > >> 237: * generated. >> 238: */ >> 239: private Class(ClassLoader loader, Class arrayComponentType, ProtectionDomain pd) { > > If this constructor is not used then why do we need to add the PD argument, rather than just set it to null? For that matter why do we even need the field if nothing is ever setting it? I'm missing something here. @DanHeidinga suggested this for my other PR as a convention that's used for the j.l.Class constructor. > src/java.base/share/classes/java/lang/Class.java line 2701: > >> 2699: >> 2700: @Stable >> 2701: private transient final ProtectionDomain protectionDomain; > > Isn't `@Stable` superfluous with a final field? Yes, I thought I removed it but that was probably the other PR. 
> src/java.base/share/classes/java/lang/Class.java line 2722: > >> 2720: */ >> 2721: public ProtectionDomain getProtectionDomain() { >> 2722: if (protectionDomain == null) { > > Does this imply the class is a primitive class? No, I believe bootstrap classes don't generally have a protection domain. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1943394960 PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1943393526 PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1943394193 From liach at openjdk.org Wed Feb 5 18:22:11 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 5 Feb 2025 18:22:11 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v5] In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 17:57:29 GMT, Coleen Phillimore wrote: >> This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: > > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Remove @Stable annotation for final field. The core library changes look good. Another reviewer needs to verify the VM changes. ------------- Marked as reviewed by liach (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23396#pullrequestreview-2596620084 From coleenp at openjdk.org Wed Feb 5 19:06:17 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 5 Feb 2025 19:06:17 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v5] In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 17:57:29 GMT, Coleen Phillimore wrote: >> This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: > > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Remove @Stable annotation for final field. Thanks, Chen, for the review. HotSpot generally also requires two reviewers, so I await reviews from David and others. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23396#issuecomment-2637788306 From kbarrett at openjdk.org Wed Feb 5 19:09:14 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 5 Feb 2025 19:09:14 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 15:39:32 GMT, Nizar Benalla wrote: > Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. > It scans files in those directories, filtering out certain files as well as all `.c`, `.java` and `.jar` files in test sources.
> > Before adding line 86 and excluding `os_windows.cpp`, the test failed with: > > > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: > HMODULE hModule = NULL; > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: > GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); > java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. > at TestNoNULL.main(TestNoNULL.java:73) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1447) Changes requested by kbarrett (Reviewer). test/hotspot/jtreg/TEST.groups line 47: > 45: gc > 46: > 47: hotspot_null_check = \ Where is this used? test/hotspot/jtreg/TEST.groups line 228: > 226: -compiler/loopopts/Test7052494.java \ > 227: -compiler/runtime/Test6826736.java \ > 228: sources This seems weirdly placed. And I think we want the NULL check to be done in tier1. test/hotspot/jtreg/sources/TestNoNULL.java line 47: > 45: private static final Set excludedTestFiles = new HashSet<>(); > 46: private static final Set excludedTestExtensions = Set.of(".c", ".java", ".jar"); > 47: private static final Pattern NULL_PATTERN = Pattern.compile("\\bNULL\\b"); I don't think this is the right pattern to use. See the description in JDK-8343802 for the pattern I think should be used. test/hotspot/jtreg/sources/TestNoNULL.java line 85: > 83: "src/hotspot/share/prims/jvmti.xsl", > 84: "src/hotspot/share/utilities/globalDefinitions_visCPP.hpp", > 85: "src/hotspot/share/utilities/globalDefinitions_gcc.hpp", globalDefinitions_gcc.hpp no longer needs to be excluded, since JDK-8343800 has been fixed. 
test/hotspot/jtreg/sources/TestNoNULL.java line 103: > 101: public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) { > 102: if (isIncluded(file, excludedFiles, excludeExtensions)) { > 103: files.add(file); Why collect files to be checked, rather than just checking them here? test/hotspot/jtreg/sources/TestNoNULL.java line 135: > 133: private static boolean checkForNull(Path path) throws IOException { > 134: boolean found = false; > 135: List lines = Files.readAllLines(path, StandardCharsets.UTF_8); I would have thought it would be better to read and check a line at a time. Though it probably doesn't really matter all that much. ------------- PR Review: https://git.openjdk.org/jdk/pull/23466#pullrequestreview-2596672636 PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1943484063 PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1943485458 PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1943489367 PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1943482428 PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1943522447 PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1943492846 From nbenalla at openjdk.org Wed Feb 5 19:19:12 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Wed, 5 Feb 2025 19:19:12 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding In-Reply-To: References: Message-ID: <4MSlN3i1LOZYEAS7E5ld_aEaRyeRFFnYzVydneu6RFk=.562f090d-cd7c-44af-b673-9de2c4cd362f@github.com> On Wed, 5 Feb 2025 17:33:21 GMT, Julian Waters wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java` and `.jar` files in test sources. 
>> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. >> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Seems like this solution would require someone working on the JDK to run the proper test, something which not everyone is guaranteed to do when working on HotSpot. Is there a way to make this part of the build process so trying to compile HotSpot will fail if there are NULLs in the source code? > > EDIT: Removed the quoting of the original Pull Request body @TheShermanTanker I ran GHA in a separate branch and intentionally [triggered a failure](https://productionresultssa0.blob.core.windows.net/actions-results/a48b32b8-8741-4c53-bd4d-791ea6525f8c/workflow-job-run-31d5d6cf-dbfa-5459-b435-eff5465595d4/logs/job/job-logs.txt?rsct=text%2Fplain&se=2025-02-05T19%3A23%3A48Z&sig=BDGTWHIHY%2BTGhMPBLnMGqoKOSfZmRQidkoZgQ%2BTwjKA%3D&ske=2025-02-06T04%3A21%3A13Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2025-02-05T16%3A21%3A13Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2025-01-05&sp=r&spr=https&sr=b&st=2025-02-05T19%3A13%3A43Z&sv=2025-01-05). Use Control Find and look for `Error: 'NULL' found in`. GHA should fail and alert developers that there is an issue. 
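A sketch of the line-at-a-time approach suggested in the review: each file is checked while the directory tree is walked, without first collecting the paths or reading whole files into memory. It is written in C++ rather than as the jtreg Java test. The regex is a parameter because the review objects to `\bNULL\b` and refers to JDK-8343802 for the intended pattern, which is not reproduced here. The function name `count_null_lines` and applying the `.c`/`.java`/`.jar` exclusion to every file (the PR applies it only under test sources) are simplifying assumptions.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <regex>
#include <set>
#include <string>

namespace fs = std::filesystem;

// Extensions skipped in this sketch (taken from the PR description).
static const std::set<std::string> kExcludedExtensions = {".c", ".java", ".jar"};

// Walks `root`, checks every eligible regular file one line at a time, and reports
// each line matching `pattern`. Returns the number of offending lines.
int count_null_lines(const fs::path& root, const std::regex& pattern) {
  int hits = 0;
  for (const fs::directory_entry& entry : fs::recursive_directory_iterator(root)) {
    if (!entry.is_regular_file()) continue;
    if (kExcludedExtensions.count(entry.path().extension().string()) != 0) continue;
    std::ifstream in(entry.path());
    std::string line;
    for (int lineno = 1; std::getline(in, line); ++lineno) {
      if (std::regex_search(line, pattern)) {
        std::cerr << "Error: 'NULL' found in " << entry.path().string()
                  << " at line " << lineno << ":\n" << line << '\n';
        ++hits;
      }
    }
  }
  return hits;
}
```

Reporting every match before failing, as the PR's test does, lets one run list all offenders instead of stopping at the first.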
------------- PR Comment: https://git.openjdk.org/jdk/pull/23466#issuecomment-2637819950 From coleenp at openjdk.org Wed Feb 5 19:51:06 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 5 Feb 2025 19:51:06 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: <0ZM_vg_dAmbdbeoIeZ8ylBUDj_4_jxM-aE6IKoH6ykM=.69c7554f-5e2b-40b9-8d1a-abe147548dbb@github.com> On Wed, 5 Feb 2025 01:10:39 GMT, Dean Long wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix copyright and param name > > test/micro/org/openjdk/bench/java/lang/reflect/Clazz.java line 73: > >> 71: public int getAppArrayModifiers() { >> 72: return clazzArray.getClass().getModifiers(); >> 73: } > > I'm guessing this is the benchmark that shows an extra load. How about adding a benchmark that makes the Clazz[] final or @Stable, and see if that makes the extra load go away?

Name                       Cnt   Base    Error   Test    Error   Unit   Change
getAppArrayModifiers        30   0.923 ± 0.004   1.260 ± 0.001   ns/op  0.73x (p = 0.000*)
getAppArrayModifiersFinal   30   0.922 ± 0.000   1.260 ± 0.001   ns/op  0.73x (p = 0.000*)

No, it doesn't really help. There's still an extra load. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1943569183 From dlong at openjdk.org Wed Feb 5 19:51:02 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 5 Feb 2025 19:51:02 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel.
>> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker I like the direction this is taking us, but I think we could go even further and eventually fold the JNI critical region into the existing safepoint mechanism. 
To me, the safepoint mechanism already implements a readers-writer lock, with thread states like _thread_in_Java/_thread_in_vm already being "critical regions". With this change, we have two nested readers-writer locks that a GC needs to acquire. I think if we made entering and exiting a JNI critical region change the thread state (probably by introducing a new thread state), then we wouldn't need a separate readers-writer lock for JNI critical regions. However, maybe we don't want to go that far, as the current solution allows GC-specific implementations and lets each GC VMOp decide whether it needs to block for JNI critical regions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2637865869 From egahlin at openjdk.org Wed Feb 5 19:59:13 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 5 Feb 2025 19:59:13 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint.
This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker JFR changes look good. ------------- Marked as reviewed by egahlin (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2596847997 From nbenalla at openjdk.org Wed Feb 5 20:16:57 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Wed, 5 Feb 2025 20:16:57 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: References: Message-ID: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> > Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. > It scans files in those directories, filtering out certain files as well as all `.c`, `.java` and `.jar` files in test sources. 
> > Before adding line 86 and excluding `os_windows.cpp`, the test failed with: > > > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: > HMODULE hModule = NULL; > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: > GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); > java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. > at TestNoNULL.main(TestNoNULL.java:73) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1447) Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: update based on feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23466/files - new: https://git.openjdk.org/jdk/pull/23466/files/25b8610d..ae3c9eba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23466&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23466&range=00-01 Stats: 34 lines in 2 files changed: 7 ins; 15 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/23466.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23466/head:pull/23466 PR: https://git.openjdk.org/jdk/pull/23466 From nbenalla at openjdk.org Wed Feb 5 20:16:58 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Wed, 5 Feb 2025 20:16:58 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 18:42:51 GMT, Kim Barrett wrote: >> Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: >> >> update based on feedback > > test/hotspot/jtreg/TEST.groups line 47: > >> 45: gc >> 46: >> 47: hotspot_null_check = \ > > Where is this 
used? It was unused, I've fixed it. > test/hotspot/jtreg/sources/TestNoNULL.java line 47: > >> 45: private static final Set excludedTestFiles = new HashSet<>(); >> 46: private static final Set excludedTestExtensions = Set.of(".c", ".java", ".jar"); >> 47: private static final Pattern NULL_PATTERN = Pattern.compile("\\bNULL\\b"); > > I don't think this is the right pattern to use. See the description in JDK-8343802 for the pattern I think > should be used. I remember running into some false positives when using it, but I don't see them anymore so it may have been an issue on my end. Fixed in [ae3c9eb](https://github.com/openjdk/jdk/pull/23466/commits/ae3c9ebaef6fbbacdebbc1687bd41e1f3ac07df7) > test/hotspot/jtreg/sources/TestNoNULL.java line 103: > >> 101: public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) { >> 102: if (isIncluded(file, excludedFiles, excludeExtensions)) { >> 103: files.add(file); > > Why collect files to be checked, rather than just checking them here? Simply a matter of preference, I prefer to split programs into smaller bits. 
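For comparison, a minimal sketch of the alternative raised in review: check each file inside `visitFile` instead of collecting the paths first. The `NullScanner` class and its `scan` signature are hypothetical, not the code in PR 23466. It reuses the `\bNULL\b` pattern quoted above and decodes files as ISO-8859-1 so that stray non-UTF-8 bytes cannot abort the walk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: scans files as they are visited, no intermediate file list.
final class NullScanner {
    private static final Pattern NULL_PATTERN = Pattern.compile("\\bNULL\\b");

    /** Returns "path:line" for every line under root that uses NULL. */
    static List<String> scan(Path root, Set<String> excludedExtensions) throws IOException {
        List<String> hits = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                String name = file.getFileName().toString();
                if (excludedExtensions.stream().anyMatch(name::endsWith)) {
                    return FileVisitResult.CONTINUE;   // e.g. .c, .java, .jar in test sources
                }
                // ISO-8859-1 maps every byte to a char, so decoding never fails.
                List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
                for (int i = 0; i < lines.size(); i++) {
                    if (NULL_PATTERN.matcher(lines.get(i)).find()) {
                        hits.add(file + ":" + (i + 1)); // 1-based line numbers, as in the error output
                    }
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return hits;
    }
}
```

Checking inside the visitor keeps memory use independent of the number of files; splitting into smaller methods is still possible by moving the loop into a helper called from `visitFile`.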
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1943605306 PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1943604757 PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1943605848 From dlong at openjdk.org Wed Feb 5 20:26:16 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 5 Feb 2025 20:26:16 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: <0ZM_vg_dAmbdbeoIeZ8ylBUDj_4_jxM-aE6IKoH6ykM=.69c7554f-5e2b-40b9-8d1a-abe147548dbb@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <0ZM_vg_dAmbdbeoIeZ8ylBUDj_4_jxM-aE6IKoH6ykM=.69c7554f-5e2b-40b9-8d1a-abe147548dbb@github.com> Message-ID: <0efX7bcHNl5p1RoF3VnqZIabdavsGosuMI14cZPDzbQ=.2bde6bbf-a59b-4f5b-9c68-7a8a258b2ee5@github.com> On Wed, 5 Feb 2025 19:42:02 GMT, Coleen Phillimore wrote: >> test/micro/org/openjdk/bench/java/lang/reflect/Clazz.java line 73: >> >>> 71: public int getAppArrayModifiers() { >>> 72: return clazzArray.getClass().getModifiers(); >>> 73: } >> >> I'm guessing this is the benchmark that shows an extra load. How about adding a benchmark that makes the Clazz[] final or @Stable, and see if that makes the extra load go away?
>
> Name                       Cnt   Base    Error    Test    Error   Unit    Change
> getAppArrayModifiers        30   0.923 ± 0.004    1.260 ± 0.001   ns/op   0.73x (p = 0.000*)
> getAppArrayModifiersFinal   30   0.922 ± 0.000    1.260 ± 0.001   ns/op   0.73x (p = 0.000*)
>
> No it doesn't really help. There's still an extra load.

OK, if the extra load turns out to be a problem in the future, we could look into why the compilers are generating the load when the Class is known/constant. If the old intrinsic was able to pull the constant out of the Klass, then surely we can do the same and pull the value from the Class field.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1943616021 From dlong at openjdk.org Wed Feb 5 21:29:12 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 5 Feb 2025 21:29:12 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: <1yPHOj_hANp7ZvMfmgi6lRkpokgNNaUSc09FJfZvWk8=.bfcf2780-4afe-4253-ae0b-e3bc6ab7ee86@github.com> On Tue, 4 Feb 2025 14:43:51 GMT, Coleen Phillimore wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright and param name src/hotspot/share/compiler/compileLog.cpp line 116: > 114: print(" unloaded='1'"); > 115: } else { > 116: print(" flags='%d'", klass->access_flags()); There may be tools that parse the log file and get confused by this change. Maybe we should also change the label from "flags" to "access flags". 
src/hotspot/share/jfr/recorder/checkpoint/types/jfrTypeSet.cpp line 350: > 348: writer->write(mark_symbol(klass, leakp)); > 349: writer->write(package_id(klass, leakp)); > 350: writer->write(klass->compute_modifier_flags()); Isn't this much more expensive than grabbing the value from the mirror, especially if we have to iterate over inner classes? src/hotspot/share/oops/instanceKlass.hpp line 1128: > 1126: #endif > 1127: > 1128: int compute_modifier_flags() const; I don't see why this can't stay u2. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1943680670 PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1943679056 PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1943682936 From dlong at openjdk.org Wed Feb 5 21:43:14 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 5 Feb 2025 21:43:14 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Tue, 4 Feb 2025 14:43:51 GMT, Coleen Phillimore wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright and param name src/hotspot/share/opto/memnode.cpp line 2458: > 2456: return TypePtr::NULL_PTR; > 2457: } > 2458: // ??? I suspect that we still need this code to support intrinsics like LibraryCallKit::inline_native_classID() and maybe other users of this field, but the comment below no longer makes sense. src/hotspot/share/opto/memnode.cpp line 2459: > 2457: } > 2458: // ??? > 2459: // (Folds up the 1st indirection in aClassConstant.getModifiers().) Suggestion: // Fold up the load of the hidden field ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1943695585 PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1943696867 From dlong at openjdk.org Wed Feb 5 21:47:12 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 5 Feb 2025 21:47:12 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: <8Wx3xbbOnPXS5n1RuNaesqHbhKV3iLwrCVF0s6uWOrA=.cb20728e-e13c-4667-822b-3ba424cbc12f@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <8Wx3xbbOnPXS5n1RuNaesqHbhKV3iLwrCVF0s6uWOrA=.cb20728e-e13c-4667-822b-3ba424cbc12f@github.com> Message-ID: On Thu, 12 Dec 2024 10:16:01 GMT, Viktor Klang wrote: >> @viktorklang-ora `@Stable` is not about how the field was set, but about the JIT observing a non-default value at compile time. If it observes a non-default value, it can treat it as a compile time constant. > > @DanHeidinga Great explanation, thank you! If Class had other fields smaller than `int`, would we consider making this something like `char` to save space (allowing all the sub-word fields to be compacted)?
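On the `char` question: `char` is Java's only unsigned 16-bit primitive, so it can hold a class-file `u2` access-flags value and widens to `int` by zero extension. A `short` would instead sign-extend a value with bit 15 set (for example `ACC_MODULE`, 0x8000) into a negative `int`. The sketch below uses a hypothetical `ModifiersHolder` class, not `java.lang.Class`. Whether such a field actually saves space depends on how the VM lays out `Class`'s fields, which this sketch does not show.

```java
// Hypothetical holder, not java.lang.Class: keeps a u2 access-flags value in 16 bits.
final class ModifiersHolder {
    private final char flagsAsChar;   // unsigned 16-bit, matches a class-file u2
    private final short flagsAsShort; // signed 16-bit, shown only for comparison

    ModifiersHolder(int flags) {
        if ((flags & ~0xFFFF) != 0) {
            throw new IllegalArgumentException("does not fit in a u2: " + flags);
        }
        this.flagsAsChar = (char) flags;
        this.flagsAsShort = (short) flags;
    }

    /** char widens to int by zero extension, so the value round-trips. */
    int fromChar() { return flagsAsChar; }

    /** short widens by sign extension, so bit 15 (0x8000) turns negative. */
    int fromShort() { return flagsAsShort; }

    /** A short also works, but every reader has to remember the mask. */
    int fromShortMasked() { return flagsAsShort & 0xFFFF; }
}
```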
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1943701237 From dlong at openjdk.org Wed Feb 5 21:53:11 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 5 Feb 2025 21:53:11 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Tue, 4 Feb 2025 14:43:51 GMT, Coleen Phillimore wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright and param name Overall looks good to me. Please ask @iwanowww to review compiler changes. ------------- Marked as reviewed by dlong (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22652#pullrequestreview-2597046622 From kbarrett at openjdk.org Wed Feb 5 22:47:10 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 5 Feb 2025 22:47:10 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 20:12:40 GMT, Nizar Benalla wrote: >> test/hotspot/jtreg/sources/TestNoNULL.java line 47: >> >>> 45: private static final Set excludedTestFiles = new HashSet<>(); >>> 46: private static final Set excludedTestExtensions = Set.of(".c", ".java", ".jar"); >>> 47: private static final Pattern NULL_PATTERN = Pattern.compile("\\bNULL\\b"); >> >> I don't think this is the right pattern to use. See the description in JDK-8343802 for the pattern I think >> should be used. > > I remember running into some false positives when using it, but I don't see them anymore so it may have been an issue on my end. > > Fixed in [ae3c9eb](https://github.com/openjdk/jdk/pull/23466/commits/ae3c9ebaef6fbbacdebbc1687bd41e1f3ac07df7) Maybe this is okay. I wasn't familiar with the `\b` command, and after looking it up, it might be sufficient here. 
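For reference, `\b` is a zero-width word-boundary assertion, so `\bNULL\b` matches `NULL` only when it is not part of a longer identifier. For ASCII input, Java treats letters, digits and underscore as word characters. The small demo below is hypothetical (the `WordBoundaryDemo` name is illustrative). Note that the pattern still matches `NULL` inside comments and string literals.

```java
import java.util.regex.Pattern;

// Hypothetical demo of how \bNULL\b behaves on typical HotSpot source lines.
final class WordBoundaryDemo {
    static final Pattern NULL_PATTERN = Pattern.compile("\\bNULL\\b");

    static boolean usesNull(String line) {
        return NULL_PATTERN.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "HMODULE hModule = NULL;",   // match: NULL stands alone
            "foo(NULL, &hModule);",      // match: '(' and ',' are non-word characters
            "#define NULL_WORD 0",       // no match: '_' after NULL is a word character
            "JNI_NULL_CHECK(obj);",      // no match: NULL is inside a longer identifier
            "p = nullptr;",              // no match: the pattern is case-sensitive
            "// NULL in a comment",      // match: comments are not excluded
        };
        for (String s : samples) {
            System.out.println(usesNull(s) + "\t" + s);
        }
    }
}
```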
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1943777654 From liach at openjdk.org Thu Feb 6 04:40:11 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 6 Feb 2025 04:40:11 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: <0efX7bcHNl5p1RoF3VnqZIabdavsGosuMI14cZPDzbQ=.2bde6bbf-a59b-4f5b-9c68-7a8a258b2ee5@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <0ZM_vg_dAmbdbeoIeZ8ylBUDj_4_jxM-aE6IKoH6ykM=.69c7554f-5e2b-40b9-8d1a-abe147548dbb@github.com> <0efX7bcHNl5p1RoF3VnqZIabdavsGosuMI14cZPDzbQ=.2bde6bbf-a59b-4f5b-9c68-7a8a258b2ee5@github.com> Message-ID: <7KdNVSXLx0N027uyQgtUuN82VpXTlyPpPOnBv3sqYRs=.6b549b56-36f9-4ab3-8469-4779d93dd1e7@github.com> On Wed, 5 Feb 2025 20:23:05 GMT, Dean Long wrote:
>> Name                       Cnt   Base    Error    Test    Error   Unit    Change
>> getAppArrayModifiers        30   0.923 ± 0.004    1.260 ± 0.001   ns/op   0.73x (p = 0.000*)
>> getAppArrayModifiersFinal   30   0.922 ± 0.000    1.260 ± 0.001   ns/op   0.73x (p = 0.000*)
>>
>> No it doesn't really help. There's still an extra load.
>
> OK, if the extra load turns out to be a problem in the future, we could look into why the compilers are generating the load when the Class is known/constant. If the old intrinsic was able to pull the constant out of the Klass, then surely we can do the same and pull the value from the Class field.

Does `static final` help here?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1944083490 From liach at openjdk.org Thu Feb 6 05:08:10 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 6 Feb 2025 05:08:10 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: References: Message-ID: <1FHWFKbEJWpXpCj6stQj9k0HIfVcFSkqOcXGEYvBXDo=.08dcf31a-4468-44f1-b5aa-80bd5fe79a14@github.com> On Wed, 5 Feb 2025 19:05:24 GMT, Kim Barrett wrote: >> Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: >> >> update based on feedback > > test/hotspot/jtreg/sources/TestNoNULL.java line 103: > >> 101: public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) { >> 102: if (isIncluded(file, excludedFiles, excludeExtensions)) { >> 103: files.add(file); > > Why collect files to be checked, rather than just checking them here? I agree with @kimbarrett; there is no point creating buffer collections that consume more resources with no real gains. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1944101310 From liach at openjdk.org Thu Feb 6 05:13:25 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 6 Feb 2025 05:13:25 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> References: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> Message-ID: On Wed, 5 Feb 2025 20:16:57 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java` and `.jar` files in test sources. 
>> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. >> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: > > update based on feedback test/hotspot/jtreg/sources/TestNoNULL.java line 56: > 54: } > 55: > 56: if (dir == null) { @sormuras Do you know if the source directory (or directories) of the JDK is passed to jtreg at all? The current approach seems a bit hacky. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1944104148 From dholmes at openjdk.org Thu Feb 6 06:28:13 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Feb 2025 06:28:13 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). 
JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker src/hotspot/share/runtime/javaThread.hpp line 938: > 936: } > 937: > 938: bool in_critical_atomic() { return Atomic::load(&_jni_active_critical) > 0; } If you think you need an atomic load here, then it would be needed for `in_critical()` so just add it there. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1944166423 From dholmes at openjdk.org Thu Feb 6 06:38:13 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Feb 2025 06:38:13 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. 
>> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker > this PR uses an existing thread-local variable with a store-load barrier for synchronization. @albertnetymk can you explain how this protocol is intended to work please. I must be missing some higher-level context that provides additional synchronization because use of the per-thread counters is inherently racy. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2638958442 From dholmes at openjdk.org Thu Feb 6 06:41:21 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Feb 2025 06:41:21 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Wed, 5 Feb 2025 19:39:30 GMT, Dean Long wrote: > I think we could go even further and eventually fold the JNI critical region into the existing safepoint mechanism. @dean-long you seem to be forgetting why it was folded out in the first place. :) This was performance critical JNI code where the thread-state transitions were too heavyweight and expensive to use. So we keep the thread safepoint-safe (`_thread_in_native`) and have a way to tell the GC to pause whilst we are in these critical regions. 
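To illustrate the kind of protocol in question, here is a minimal Java sketch of a Dekker-style handshake between per-thread critical-region counters and a global "GC requested" flag. It is a hypothetical simplification, not the HotSpot patch. The class and method names are invented, a single GC thread is assumed, and Java `volatile` accesses stand in for the explicit store-load barrier: the Java Memory Model totally orders volatile accesses, so a volatile store followed by a volatile load in the same thread cannot be reordered. The race is resolved because each side writes its own variable and then reads the other's, so at least one side must observe the other's write.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of a GC-locker handshake, NOT HotSpot's GCLocker.
// Assumes a single GC thread calls beginGc()/endGc().
final class CriticalRegionLock {
    /** Per-thread state; only the owning thread ever writes criticalDepth. */
    static final class ThreadState {
        volatile int criticalDepth;
    }

    private final List<ThreadState> threads = new CopyOnWriteArrayList<>();
    private final Object monitor = new Object();
    private volatile boolean gcRequested;

    ThreadState register() {
        ThreadState s = new ThreadState();
        threads.add(s);
        return s;
    }

    void enterCritical(ThreadState self) throws InterruptedException {
        if (self.criticalDepth > 0) {       // nested entry: any pending GC already waits for us
            self.criticalDepth++;
            return;
        }
        while (true) {
            self.criticalDepth = 1;         // volatile store ...
            if (!gcRequested) {             // ... then volatile load; never reordered (store-load)
                return;                     // no GC pending, so we are inside the region
            }
            self.criticalDepth = 0;         // a GC is pending: back out,
            synchronized (monitor) {
                monitor.notifyAll();        // wake the GC in case it already saw depth 1,
                while (gcRequested) {
                    monitor.wait();         // and wait for the GC to finish before retrying
                }
            }
        }
    }

    void exitCritical(ThreadState self) {
        int depth = self.criticalDepth - 1; // non-atomic update is fine: single writer
        self.criticalDepth = depth;         // volatile store ...
        if (depth == 0 && gcRequested) {    // ... then volatile load
            synchronized (monitor) {
                monitor.notifyAll();        // let a waiting GC re-check the counters
            }
        }
    }

    /** Blocks until no registered thread is inside a critical region. */
    void beginGc() throws InterruptedException {
        gcRequested = true;                 // volatile store ...
        synchronized (monitor) {
            for (ThreadState s : threads) {
                while (s.criticalDepth > 0) { // ... then volatile loads of every counter
                    monitor.wait();
                }
            }
        }
    }

    void endGc() {
        synchronized (monitor) {
            gcRequested = false;
            monitor.notifyAll();            // release threads that backed out
        }
    }
}
```

A thread that enters after the GC has already checked its counter must see `gcRequested == true` and back out, which is what allows the GC to proceed once its scan completes.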
------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2638962365 From dholmes at openjdk.org Thu Feb 6 07:20:07 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Feb 2025 07:20:07 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> References: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> Message-ID: On Wed, 5 Feb 2025 20:16:57 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java` and `.jar` files in test sources. >> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. 
>> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: > > update based on feedback test/hotspot/jtreg/sources/TestNoNULL.java line 78: > 76: "src/hotspot/share/prims/jvmti.xsl", > 77: "src/hotspot/share/utilities/globalDefinitions_visCPP.hpp", > 78: "src/hotspot/os/windows/os_windows.cpp" //TODO: remove after JDK-8349417 This will be fixed before you are ready to integrate. :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1944214960 From dholmes at openjdk.org Thu Feb 6 07:32:12 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Feb 2025 07:32:12 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v4] In-Reply-To: References: Message-ID: <6TuFE5mx8jXm2donAE_cM3I5UXCaB1eKrpCyp7qk0wM=.1c585567-cf27-4d3e-bca9-4aa7a556942c@github.com> On Wed, 5 Feb 2025 17:41:22 GMT, Coleen Phillimore wrote: >> src/java.base/share/classes/java/lang/Class.java line 239: >> >>> 237: * generated. >>> 238: */ >>> 239: private Class(ClassLoader loader, Class arrayComponentType, ProtectionDomain pd) { >> >> If this constructor is not used then why do we need to add the PD argument, rather than just set it to null? For that matter why do we even need the field if nothing is ever setting it? I'm missing something here. > > @DanHeidinga suggested this for my other PR as a convention that's used for the j.l.Class constructor. I am still missing what can actually set a PD here, sorry.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1944230577 From jbechberger at openjdk.org Thu Feb 6 10:20:34 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 6 Feb 2025 10:20:34 GMT Subject: RFR: 8342818: Implement CPU Time Profiling for JFR [v35] In-Reply-To: References: Message-ID: > This is the code for the [JEP draft: CPU Time based profiling for JFR]. Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20752/files - new: https://git.openjdk.org/jdk/pull/20752/files/d1574a29..0d8dbfee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=33-34 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20752.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20752/head:pull/20752 PR: https://git.openjdk.org/jdk/pull/20752 From jbechberger at openjdk.org Thu Feb 6 11:43:40 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 6 Feb 2025 11:43:40 GMT Subject: RFR: 8342818: Implement CPU Time Profiling for JFR [v36] In-Reply-To: References: Message-ID: > This is the code for the [JEP draft: CPU Time based profiling for JFR]. 
Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Implement NoResourceMark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20752/files - new: https://git.openjdk.org/jdk/pull/20752/files/0d8dbfee..0c6ebbc2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=34-35 Stats: 63 lines in 6 files changed: 58 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20752.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20752/head:pull/20752 PR: https://git.openjdk.org/jdk/pull/20752 From nbenalla at openjdk.org Thu Feb 6 12:07:15 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Thu, 6 Feb 2025 12:07:15 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: References: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> Message-ID: On Thu, 6 Feb 2025 05:07:24 GMT, Chen Liang wrote: >> Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: >> >> update based on feedback > > test/hotspot/jtreg/sources/TestNoNULL.java line 56: > >> 54: } >> 55: >> 56: if (dir == null) { > > @sormuras Do you know if the source directory (or directories) of the JDK is passed to jtreg at all? The current approach seems a bit hacky. Copy-pasting this comment from the JBS issue > We have the full source available when jtreg tests are run in our internal systems. The same should be true in GHA, as well as when developers run tests locally. 
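The check discussed in this thread can be sketched roughly as follows. This is not the code of TestNoNULL.java. The class name `NullScanSketch`, the whole-word regex (chosen so that names like `NULL_WORD` or `nullptr` are not flagged), and the suffix-based exclusion list are illustrative assumptions. The real test also excludes specific files such as os_windows.cpp and applies the `.c`/`.java`/`.jar` exclusion only to test sources.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical sketch of a "no NULL in sources" check; not the real TestNoNULL.
final class NullScanSketch {
    // Whole-word match (assumption): flags NULL but not NULL_WORD or nullptr.
    private static final Pattern NULL_TOKEN = Pattern.compile("\\bNULL\\b");

    // Illustrative suffix exclusions; the real test keeps its own list of files.
    private static final Set<String> EXCLUDED_SUFFIXES = Set.of(".c", ".java", ".jar");

    /** Returns the 1-based numbers of the lines containing a standalone NULL token. */
    static List<Integer> nullLines(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (NULL_TOKEN.matcher(lines.get(i)).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    static boolean isExcluded(Path file) {
        String name = file.getFileName().toString();
        return EXCLUDED_SUFFIXES.stream().anyMatch(name::endsWith);
    }

    /** Walks a source root, prints each offending line, and returns the number of hits. */
    static int scan(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> s = Files.walk(root)) {
            files = s.filter(Files::isRegularFile).sorted().toList();
        }
        int total = 0;
        for (Path f : files) {
            if (isExcluded(f)) {
                continue;
            }
            // Assumes UTF-8 text files; a binary file would make readAllLines throw.
            for (int n : nullLines(Files.readAllLines(f))) {
                System.err.println("Error: 'NULL' found in " + f + " at line " + n);
                total++;
            }
        }
        return total;
    }
}
```

A jtreg-style driver would then throw a RuntimeException whenever `scan` returns a non-zero count. Locating the source root is the open question raised in the review above, so the sketch simply takes it as a parameter.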
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1944606963 From coleenp at openjdk.org Thu Feb 6 12:11:14 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Feb 2025 12:11:14 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <8Wx3xbbOnPXS5n1RuNaesqHbhKV3iLwrCVF0s6uWOrA=.cb20728e-e13c-4667-822b-3ba424cbc12f@github.com> Message-ID: <4aAX8rSEcvkeYteaJUXHfVEzBbNGwGlhDLIz548dFcs=.616fa7dd-d5bf-42d5-aca0-0bea0b5591d0@github.com> On Wed, 5 Feb 2025 21:44:37 GMT, Dean Long wrote: >> @DanHeidinga Great explanation, thank you! > > If Class had other fields smaller than `int`, would we consider making this something like `char` to save space (allowing all the sub-word fields to be compacted)? I thought of doing this since I made modifiers u2 in the Hotspot code just previously, but all the Java code refers to this as an int. And I didn't see other fields to compact it with. Maybe if access_flags are moved we could make them both char (not short, since they're unsigned). To my C++ eyes, it feels weird not to have an unsigned short. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1944613105 From coleenp at openjdk.org Thu Feb 6 12:15:13 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Feb 2025 12:15:13 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v4] In-Reply-To: <6TuFE5mx8jXm2donAE_cM3I5UXCaB1eKrpCyp7qk0wM=.1c585567-cf27-4d3e-bca9-4aa7a556942c@github.com> References: <6TuFE5mx8jXm2donAE_cM3I5UXCaB1eKrpCyp7qk0wM=.1c585567-cf27-4d3e-bca9-4aa7a556942c@github.com> Message-ID: On Thu, 6 Feb 2025 07:29:26 GMT, David Holmes wrote: >> @DanHeidinga suggested this for my other PR as a convention that's used for the j.l.Class constructor. > > I am still missing what can actually set a PD here, sorry. ??
Because the field is final, it has to be initialized in the constructor in Java code. My initial patch for modifiers chose to initialize to zero but that's not quite correct. The constructor cannot be called nor can it be made accessible with setAccessible(). So the constructor for java.lang.Class is essentially the Hotspot code JavaClasses::create_mirror(). This is where the PD is assigned. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1944618468 From stuefe at openjdk.org Thu Feb 6 13:07:42 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Feb 2025 13:07:42 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees Message-ID: For things I currently work on (compilation memory statistic), I need this functionality. Changes: - added leftmost() and rightmost() (pretty self-explanatory) - added print_on(outputStream*) (likewise) - const correctness - other minor cleanups - gtests for all added functions ------------- Commit messages: - fix constness, lp64 issues - RBTree: leftmost, rightmost, print_on Changes: https://git.openjdk.org/jdk/pull/23486/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23486&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349525 Stats: 262 lines in 3 files changed: 187 ins; 9 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/23486.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23486/head:pull/23486 PR: https://git.openjdk.org/jdk/pull/23486 From stuefe at openjdk.org Thu Feb 6 13:07:42 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Feb 2025 13:07:42 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees In-Reply-To: References: Message-ID: <4-wGPRsuA6RZ_iOxLPkL-mbXjfv_gNwzOKp-ZVJ7GLU=.9409fcb5-63f9-42d9-9c6c-9cc8fcfaa4c9@github.com> On Thu, 6 Feb 2025 08:06:04 GMT, Thomas Stuefe wrote: > For things I currently work on (compilation memory statistic), I need this 
functionality. > > Changes: > > - added leftmost() and rightmost() (pretty self-explanatory) > - added print_on(outputStream*) (likewise) > - const correctness > - other minor cleanups > - gtests for all added functions Ping @caspernorrbin @jdksjolen - I'd appreciate it if we could have this before the currently open intrusive tree rework, since I can then use it for an RFR that is almost finished. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23486#issuecomment-2639775689 From jsjolen at openjdk.org Thu Feb 6 13:08:14 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 6 Feb 2025 13:08:14 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v4] In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 15:01:58 GMT, Casper Norrbin wrote: >> src/hotspot/share/utilities/rbTree.hpp line 33: >> >>> 31: #include >>> 32: >>> 33: struct Empty {}; >> >> This will export the name `Empty` to everyone, is it possible to move it to inside of the class? > > `Empty` was used as the value type for the intrusive tree, but I discovered that it didn't quite work as expected, because `sizeof(Empty) == 1` due to it requiring a unique address. This means that we would waste space in a lot of cases. For example, 8-byte keys would lead to 40-byte `RBNode`s despite storing only three 8-byte pointers and one 8-byte key. > > I explored an alternative approach by adding the option to use void as the value type instead, and removing the need for a value member altogether. By using a base class containing only the value and using conditional inheritance from either it or `Empty` (which benefits from [empty base optimization](https://en.cppreference.com/w/cpp/language/ebo)), we can have a zero-size overhead. > > Code-wise, this solution doesn't look as clean. We need added templating and `ENABLE_IF`s for functions with value references (since `void&` doesn't work).
Another positive, however, is that this enables key-only red-black trees for scenarios where values aren't necessary. > > Let me know what you think :) Yeah, it's a bit "ugly", but I'm willing to pay the price for a little bit of complexity in order to have this feature work as expected. Please add "empty base optimization" as a phrase in a comment regarding this, so that people can find out on their own why the code looks as it does. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1944689812 From coleenp at openjdk.org Thu Feb 6 13:13:29 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Feb 2025 13:13:29 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v3] In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: > The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. > > There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. > > Tested with tier1-8.
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/memnode.cpp Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22652/files - new: https://git.openjdk.org/jdk/pull/22652/files/ff693418..f92620eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22652&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22652&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22652.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22652/head:pull/22652 PR: https://git.openjdk.org/jdk/pull/22652 From stuefe at openjdk.org Thu Feb 6 13:13:11 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Feb 2025 13:13:11 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 08:06:04 GMT, Thomas Stuefe wrote: > For things I currently work on (compilation memory statistic), I need this functionality. 
> > Changes: > > - added leftmost() and rightmost() (pretty self-explanatory) > - added print_on(outputStream*) (likewise) > - const correctness > - other minor cleanups > - gtests for all added functions > > Tests: GHA (all clean), manual tests on Linux x64 src/hotspot/share/utilities/rbTree.inline.hpp line 484: > 482: template > 483: inline void RBTree::visit_in_order(F f) const { > 484: const RBNode* to_visit[64]; Note to self or others: This needs at least an assertion, better a visit-stop on release builds ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1944698782 From coleenp at openjdk.org Thu Feb 6 13:13:29 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Feb 2025 13:13:29 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Tue, 4 Feb 2025 14:43:51 GMT, Coleen Phillimore wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright and param name Thank you for the detailed comments. ------------- PR Review: https://git.openjdk.org/jdk/pull/22652#pullrequestreview-2598534835 From coleenp at openjdk.org Thu Feb 6 13:13:29 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Feb 2025 13:13:29 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: <1yPHOj_hANp7ZvMfmgi6lRkpokgNNaUSc09FJfZvWk8=.bfcf2780-4afe-4253-ae0b-e3bc6ab7ee86@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <1yPHOj_hANp7ZvMfmgi6lRkpokgNNaUSc09FJfZvWk8=.bfcf2780-4afe-4253-ae0b-e3bc6ab7ee86@github.com> Message-ID: On Wed, 5 Feb 2025 21:24:25 GMT, Dean Long wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix copyright and param name > > src/hotspot/share/compiler/compileLog.cpp line 116: > >> 114: print(" unloaded='1'"); >> 115: } else { >> 116: print(" flags='%d'", klass->access_flags()); > > There may be tools that parse the log file and get confused by this change. Maybe we should also change the label from "flags" to "access flags". Okay, I wanted to remove the one use of ciKlass::modifier_flags() and the field with this change, but I'll add it back since I added a Klass::modifier_flags() function. > src/hotspot/share/jfr/recorder/checkpoint/types/jfrTypeSet.cpp line 350: > >> 348: writer->write(mark_symbol(klass, leakp)); >> 349: writer->write(package_id(klass, leakp)); >> 350: writer->write(klass->compute_modifier_flags()); > > Isn't this much more expensive than grabbing the value from the mirror, especially if we have to iterate over inner classes? I was trying not to add a Klass::modifier_flags function, but now I have. 
> src/hotspot/share/opto/memnode.cpp line 2458: > >> 2456: return TypePtr::NULL_PTR; >> 2457: } >> 2458: // ??? > > I suspect that we still need this code to support intrinsics like LibraryCallKit::inline_native_classID() and maybe other users of this field, but the comment below no longer makes sense. Thank you for noticing the ??? that I left in and the comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1944651499 PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1944640356 PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1944697467 From coleenp at openjdk.org Thu Feb 6 13:13:30 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Feb 2025 13:13:30 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: <7KdNVSXLx0N027uyQgtUuN82VpXTlyPpPOnBv3sqYRs=.6b549b56-36f9-4ab3-8469-4779d93dd1e7@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <0ZM_vg_dAmbdbeoIeZ8ylBUDj_4_jxM-aE6IKoH6ykM=.69c7554f-5e2b-40b9-8d1a-abe147548dbb@github.com> <0efX7bcHNl5p1RoF3VnqZIabdavsGosuMI14cZPDzbQ=.2bde6bbf-a59b-4f5b-9c68-7a8a258b2ee5@github.com> <7KdNVSXLx0N027uyQgtUuN82VpXTlyPpPOnBv3sqYRs=.6b549b56-36f9-4ab3-8469-4779d93dd1e7@github.com> Message-ID: On Thu, 6 Feb 2025 04:37:17 GMT, Chen Liang wrote: >> OK, if the extra load turns out to be a problem in the future, we could look into why the compilers are generating the load when the Class is known/constant. If the old intrinsic was able to pull the constant out of the Klass, then surely we can do the same and pull the value from the Class field. > > Does `static final` help here? Yes. Yes it does. 
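The sketch below makes the values under discussion concrete. It reads `Class.getModifiers()` through a `static final` constant, which is the pattern Chen Liang asks about. Whether the JIT actually folds the field load is not observable from this code; that claim rests only on the exchange above. The class names are hypothetical. For array classes, the `Class.getModifiers()` specification states that the public/private/protected bits come from the component type and that abstract and final are always set.

```java
import java.lang.reflect.Modifier;

// Illustrative only: shows what Class.getModifiers() reports, not how the JDK stores it.
final class ModifiersSketch {
    // A static final Class constant; a JIT may treat such a value as a constant.
    static final Class<?> STRING_ARRAY = String[].class;

    private static final class Inner {}

    static String describe(Class<?> c) {
        return Modifier.toString(c.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));   // public final
        System.out.println(describe(STRING_ARRAY));   // public abstract final
        System.out.println(describe(Inner[].class));  // visibility bits taken from Inner
    }
}
```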
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1944694824 From coleenp at openjdk.org Thu Feb 6 13:23:54 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Feb 2025 13:23:54 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v4] In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: <4ruwzJXM3Jgy0rbobE3PPNAH4k8c10_4zAi6mCmc4Lw=.ccf7c825-4ffc-49fb-bc42-3c0168c1dcf8@github.com> > The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. > > There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. > > Tested with tier1-8. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add Klass::modifier_flags to look in the mirror, restore ciKlass::modifier_flags, add benchmark. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22652/files - new: https://git.openjdk.org/jdk/pull/22652/files/f92620eb..85026362 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22652&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22652&range=02-03 Stats: 28 lines in 7 files changed: 26 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22652.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22652/head:pull/22652 PR: https://git.openjdk.org/jdk/pull/22652 From eastig at amazon.co.uk Thu Feb 6 13:34:32 2025 From: eastig at amazon.co.uk (Astigeevich, Evgeny) Date: Thu, 6 Feb 2025 13:34:32 +0000 Subject: Deprecate -UseCompressedClassPointers? In-Reply-To: References: Message-ID: <9F5B6EE4-423D-4484-91F4-1BEF2AD8C198@amazon.co.uk> > Why would we still need `-UseCompressedClassPointers`? Two reasons: > ... > 2) To load more than ~5-6 million classes... The CodeCache size limit is 2G on all platforms. If those classes' methods get JIT-compiled, it is unlikely that the generated code will fit into the CodeCache. -Evgeny Astigeevich From: hotspot-dev on behalf of Thomas Stüfe Date: Tuesday 4 February 2025 at 11:52 To: hotspot-dev Cc: "Kennke, Roman" Subject: [EXTERNAL] Deprecate -UseCompressedClassPointers? Hi all, I would like to get rid of the `-UseCompressedClassPointers` case since it would cut down the number of configurations we need to support and test from three to two (`-UseCompressedClassPointers`, `+UseCompressedClassPointers`, `+UseCompactObjectHeaders`). This would leave us with the now default case, `+UseCompressedClassPointers`, as the sole supported CCP case, thereby removing the need for the switch, which we therefore should deprecate and eventually remove.
Apart from significantly reducing code complexity and testing effort, `-UseCompressedClassPointers` does not seem to be tested that well, especially on 64-bit platforms. See e.g. https://github.com/openjdk/jdk/pull/23053, and Roman's suspicion is that there are many more such issues. It increases memory usage by quite a bit ("Alias for -XX:WasteMemory" - Erik Österlund), and any historical connection to UseCompressedOops has long been removed. Why would we still need `-UseCompressedClassPointers`? Two reasons: 1) To support 32-bit, where, atm, it is the only implemented mode. But I am confident that I can find some low-effort, low-code way to "fake" compressed Klass* pointers, since, after all, the 32-bit address space could be seen as a 4GB class space. There is also the bigger question of the future of 32-bit - we discussed this at the FOSDEM OpenJDK workshop, with mixed results, but it seems likely that 32-bit will go away at some point; the only question is when. 2) To load more than ~5-6 million classes. Class space, when maxed out, allows for about 5-6 million classes, given a typical Klass size distribution. I think that number is ridiculous, though. If you load or generate that many classes, you are likely a very patient programmer with a leaky or misdesigned application (just consider for a moment that to fill 4GB class space to the brim with Klass instances, you would typically use up about 5-10 times as much in non-class metaspace). That is for metadata alone. I cannot see a sane application doing that. Is anyone using -UseCompressedClassPointers for any valid reason I am not aware of? If not, barring any objections, my plan is to deprecate UseCompressedClassPointers for JDK 25, find an alternative for 32-bit platforms in JDK 26, and remove the uncompressed case in JDK 26 or later. What do people think? Amazon Development Centre (London) Ltd.
Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom. From thomas.stuefe at gmail.com Thu Feb 6 13:46:05 2025 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 6 Feb 2025 14:46:05 +0100 Subject: Deprecate -UseCompressedClassPointers? In-Reply-To: <9F5B6EE4-423D-4484-91F4-1BEF2AD8C198@amazon.co.uk> References: <9F5B6EE4-423D-4484-91F4-1BEF2AD8C198@amazon.co.uk> Message-ID: Hi Evgeny, you mean that since the code cache is limited to 2g, it is very unlikely to have that many classes since a significant part of those would not be JIT compiled? But the JVM would not fail, or? It would just continuously throw out old methods from the code cache? But our current assumption is that the only cases that are doing this are generator scenarios that create many classes, but that those never get hot enough. But it's still a good point. Cheers, Thomas From jsjolen at openjdk.org Thu Feb 6 13:47:09 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 6 Feb 2025 13:47:09 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 13:10:32 GMT, Thomas Stuefe wrote: >> For things I currently work on (compilation memory statistic), I need this functionality. >> >> Changes: >> >> - added leftmost() and rightmost() (pretty self-explanatory) >> - added print_on(outputStream*) (likewise) >> - const correctness >> - other minor cleanups >> - gtests for all added functions >> >> Tests: GHA (all clean), manual tests on Linux x64 > > src/hotspot/share/utilities/rbTree.inline.hpp line 484: > >> 482: template >> 483: inline void RBTree::visit_in_order(F f) const { >> 484: const RBNode* to_visit[64]; > > Note to self or others: This needs at least an assertion, better a visit-stop on release builds An assertion is nice-to-have.
A "visit-stop": you mean we should break out if we exceed the limit? I was thinking about this in the original PR. If we really do have an RB-tree, then the depth is bounded by 2*log2(n+1), so this limit should be impossible to reach. That is, only a bug that breaks the RB-tree invariants could invalidate that assumption. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1944752215 From stuefe at openjdk.org Thu Feb 6 14:06:14 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Feb 2025 14:06:14 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 13:44:33 GMT, Johan Sjölen wrote: >> src/hotspot/share/utilities/rbTree.inline.hpp line 484: >> >>> 482: template >>> 483: inline void RBTree::visit_in_order(F f) const { >>> 484: const RBNode* to_visit[64]; >> >> Note to self or others: This needs at least an assertion, better a visit-stop on release builds > > An assertion is nice-to-have. A "visit-stop": you mean we should break out if we exceed the limit? > > I was thinking about this in the original PR. If we really do have an RB-tree, then the depth is bounded by 2*log2(n+1), so this limit should be impossible to reach. That is, only a bug that breaks the RB-tree invariants could invalidate that assumption. Our linters will definitely complain about that suspected buffer overrun and are probably not smart enough to figure out the tree limit. Definitely an assert. As long as we don't use the iteration for anything too intensive, either a guarantee or a bailout. If we start using iteration in performance-critical paths, we can rethink.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1944782893 From coleenp at openjdk.org Thu Feb 6 14:31:28 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Feb 2025 14:31:28 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v5] In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: > The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. > > There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. > > Tested with tier1-8. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Make compute_modifiers return u2. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22652/files - new: https://git.openjdk.org/jdk/pull/22652/files/85026362..146e2551 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22652&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22652&range=03-04 Stats: 7 lines in 7 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/22652.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22652/head:pull/22652 PR: https://git.openjdk.org/jdk/pull/22652 From coleenp at openjdk.org Thu Feb 6 14:31:29 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Feb 2025 14:31:29 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: <1yPHOj_hANp7ZvMfmgi6lRkpokgNNaUSc09FJfZvWk8=.bfcf2780-4afe-4253-ae0b-e3bc6ab7ee86@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <1yPHOj_hANp7ZvMfmgi6lRkpokgNNaUSc09FJfZvWk8=.bfcf2780-4afe-4253-ae0b-e3bc6ab7ee86@github.com> Message-ID: <9iIj0xWClD_H4U0MiEUrQGqeIgjyFdC4tuN0sAP9kUo=.1c11d464-4380-4954-9e9f-c40872acff24@github.com> On Wed, 5 Feb 2025 21:26:29 GMT, Dean Long wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix copyright and param name > > src/hotspot/share/oops/instanceKlass.hpp line 1128: > >> 1126: #endif >> 1127: >> 1128: int compute_modifier_flags() const; > > I don't see why this can't stay u2. I had a compilation error about a conversion that has disappeared into the ether with u2, so I've restored them to u2.
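The `char` versus `short` question raised earlier in this PR (a u2 in HotSpot, a 16-bit field in `java.lang.Class`) comes down to one fact: Java's `char` is the only unsigned 16-bit primitive. A value with bit 15 set therefore survives a round trip through `char` but comes back negative through `short`. The helper names below are hypothetical, and the 0x8000 bit is set purely for illustration; it is not a real `Class` modifier.

```java
import java.lang.reflect.Modifier;

// Hypothetical illustration: why a u2 value maps to Java char rather than short.
final class U2FieldSketch {
    // char is unsigned: widening char -> int zero-extends.
    static int roundTripChar(int flags) {
        char c = (char) flags;
        return c;
    }

    // short is signed: widening short -> int sign-extends, so bit 15 turns negative.
    static int roundTripShort(int flags) {
        short s = (short) flags;
        return s;
    }

    public static void main(String[] args) {
        int flags = 0x8000 | Modifier.PUBLIC | Modifier.FINAL; // high bit set for illustration
        System.out.println(roundTripChar(flags));  // 32785
        System.out.println(roundTripShort(flags)); // -32751
    }
}
```

With `short`, every reader would need a `& 0xFFFF` mask to recover the unsigned value. This is presumably why "char (not short since they're unsigned)" came up in the review.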
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1944825437 From alanb at openjdk.org Thu Feb 6 14:36:14 2025 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 6 Feb 2025 14:36:14 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v5] In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 17:57:29 GMT, Coleen Phillimore wrote: >> This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: > > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Remove @Stable annotation for final field. test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java line 213: > 211: assertTrue(false); > 212: } catch (NoSuchFieldException expected) { } > 213: } The test is about accessibility, it's checking for IllegalAccessException and InaccessibleObjectException. So not the right place to test that a field is hidden from core reflection. Can you look at test/jdk/internal/reflect/Reflection/Filtering.java as that is probably the right place to list the protectionDomain field. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1944834044 From eastig at amazon.co.uk Thu Feb 6 15:14:38 2025 From: eastig at amazon.co.uk (Astigeevich, Evgeny) Date: Thu, 6 Feb 2025 15:14:38 +0000 Subject: Deprecate -UseCompressedClassPointers? 
In-Reply-To: References: <9F5B6EE4-423D-4484-91F4-1BEF2AD8C198@amazon.co.uk> Message-ID: > you mean that since the code cache is limited to 2g, it is very unlikely to have that many classes since a significant part of those would not be JIT compiled? > But the JVM would not fail, or? It would just continuously throw out old methods from the code cache? I mean if an application wants to keep in use more than 5-6 million classes, the application should forget about JIT compilation and performance. CodeCache will become a bottleneck. It is an interesting question, whether JVM would fail or not. IMO it would crash. I think some data structures would not accommodate such a stress situation. It is also interesting how many classes JVM can currently keep alive and have JIT compilation working. > It would just continuously throw out old methods from the code cache? This depends on the compilation request rate. The process of flushing code cache depends on GC. If the rate is not high, this will happen. If the rate is high, the compilation will stop. In any case the performance of an application will be hurt a lot. Thanks, Evgeny From: Thomas Stüfe Date: Thursday 6 February 2025 at 13:46 To: "Astigeevich, Evgeny" Cc: hotspot-dev , "Kennke, Roman" Subject: RE: [EXTERNAL] Deprecate -UseCompressedClassPointers? CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi Evgeny, you mean that since the code cache is limited to 2g, it is very unlikely to have that many classes since a significant part of those would not be JIT compiled? But the JVM would not fail, or? It would just continuously throw out old methods from the code cache? But our current assumption is that the only cases that are doing this are generator scenarios that create many classes, but that those never get hot enough. But it's still a good point.
Cheers, Thomas On Thu, Feb 6, 2025 at 2:34 PM Astigeevich, Evgeny > wrote: > Why would we still need `-UseCompressedClassPointers`? Two reasons: > … > 2) To load more than ~5-6 million classes… CodeCache size limit is 2G for all platforms. If those classes' methods get JIT compiled, it is unlikely the generated code will fit into CodeCache. -Evgeny Astigeevich From: hotspot-dev > on behalf of Thomas Stüfe > Date: Tuesday 4 February 2025 at 11:52 To: hotspot-dev > Cc: "Kennke, Roman" > Subject: [EXTERNAL] Deprecate -UseCompressedClassPointers? Hi all, I would like to get rid of the `-UseCompressedClassPointers` case since it would cut down the number of configurations we need to support and test from three to two (`-UseCompressedClassPointers`, `+UseCompressedClassPointers`, `+UseCompactObjectHeaders`). This would leave us with the now-default case, `+UseCompressedClassPointers`, as the sole supported CCP case, thereby removing the need for the switch, which we therefore should deprecate and eventually remove. Apart from significantly reducing code complexity and testing effort, `-UseCompressedClassPointers` does not seem to be tested that well, especially on 64-bit platforms. See e.g. https://github.com/openjdk/jdk/pull/23053, and Roman's suspicion is that there are many more such bugs. It increases memory usage by quite a bit ("Alias for -XX:WasteMemory" - Erik Österlund), and any historical connection to UseCompressedOops has long been removed. Why would we still need `-UseCompressedClassPointers`? Two reasons: 1) To support 32-bit, where, atm, it is the only implemented mode. But I am confident that I can find some low-effort low-code way to "fake" compressed Klass* pointers, since after all the 32-bit address space could be seen as a 4GB class space.
There is also the bigger question of the future of 32-bit - we discussed this at the FOSDEM OpenJDK workshop, with mixed results, but it seems likely that 32-bit will go away at some point, the only question is when. 2) To load more than ~5-6 million classes. Class space, when maxed out, allows for about 5-6 million classes, given a typical Klass size distribution. I think that number is ridiculous, though. If you load or generate that many classes, you are likely a very patient programmer with a leaky or misdesigned application (just consider for a moment that to fill 4GB class space to the brim with Klass instances, you would typically use up about 5-10 times as much in non-class metaspace). That is for metadata alone. I cannot see a sane application doing that. Is anyone using -UseCompressedClassPointers for any valid reason I am not aware of? If not, barring any objections, my plan is to deprecate UseCompressedClassPointers for JDK25, find an alternative for 32-bit platforms in JDK26, and remove the uncompressed case in JDK 26 or later. What do people think? Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom. From cnorrbin at openjdk.org Thu Feb 6 15:27:32 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 6 Feb 2025 15:27:32 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v5] In-Reply-To: References: Message-ID: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support for intrusive trees.
In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. > > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > ``` > > > Please let me know any feedback or concerns!
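To make the intrusive pattern concrete outside HotSpot, here is a minimal, self-contained sketch under stated assumptions. It is not the RBTree API: `IntrusiveNode`, `Cursor`, and `IntrusiveTree` are hypothetical names, the tree is a plain unbalanced binary search tree (no red-black rebalancing), the cursor simply remembers the link slot and the key it was created with, and `cast_to_self` recovers the enclosing structure with `offsetof` instead of assuming the node is the first member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical node embedded in user structures. The tree never allocates
// or frees nodes; it only links memory that the caller provides.
struct IntrusiveNode {
  int key;
  IntrusiveNode* left;
  IntrusiveNode* right;
};

// A cursor remembers the link slot where `key` is, or where it would go.
struct Cursor {
  IntrusiveNode** slot;
  int key;
  bool found() const { return *slot != nullptr; }
  IntrusiveNode* node() const { return *slot; }
};

// Unbalanced BST used only to illustrate the intrusive/cursor pattern.
class IntrusiveTree {
 public:
  Cursor cursor_find(int key) {
    IntrusiveNode** slot = &_root;
    while (*slot != nullptr && (*slot)->key != key) {
      slot = (key < (*slot)->key) ? &(*slot)->left : &(*slot)->right;
    }
    return Cursor{slot, key};
  }

  // Links caller-owned memory at the cursor position. The node is fully
  // initialized here from the cursor's key, so it may be uninitialized before.
  void insert_at_cursor(IntrusiveNode* n, const Cursor& c) {
    assert(!c.found() && "key already present");
    n->key = c.key;
    n->left = nullptr;
    n->right = nullptr;
    *c.slot = n;
  }

 private:
  IntrusiveNode* _root = nullptr;
};

// User structure that embeds the tree node.
struct MyIntrusiveStructure {
  int data;
  IntrusiveNode node;  // deliberately not the first member

  explicit MyIntrusiveStructure(int d) : data(d) {}

  static MyIntrusiveStructure* cast_to_self(IntrusiveNode* n) {
    // Step back from the embedded member to the start of the enclosing object.
    return reinterpret_cast<MyIntrusiveStructure*>(
        reinterpret_cast<char*>(n) - offsetof(MyIntrusiveStructure, node));
  }
};
```

Usage mirrors the example in the PR description: find a cursor, placement-new the user structure into `malloc`'d memory, insert through the cursor, and later map the found node back to its owner with `cast_to_self`. Because this sketch's cursor points into the tree's links, it should be treated as stale after any other insertion.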
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: initialize node on insert + more tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23416/files - new: https://git.openjdk.org/jdk/pull/23416/files/ddf9de0e..52f6c63a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=03-04 Stats: 109 lines in 3 files changed: 79 ins; 15 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From cnorrbin at openjdk.org Thu Feb 6 15:30:10 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 6 Feb 2025 15:30:10 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v5] In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 15:27:32 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The recently integrated red-black tree can be made more flexible by adding support of intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. >> >> Two key changes enable this feature: >> 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. >> 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. >> >> >> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. 
>> >> An example of how you could use the intrusive tree is found below: >> >> ```c++ >> struct MyIntrusiveStructure { >> Node node; // The tree node is part of an external structure >> int data; >> >> MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} >> Node* get_node() { return &node; } >> static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } >> }; >> >> Tree my_intrusive_tree; >> >> Cursor insert_cursor = my_intrusive_tree.cursor_find(0); >> Node insert_node = Node(0); >> >> // Custom allocation here is just malloc >> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); >> new (place) MyIntrusiveStructure(0, insert_node); >> >> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); >> >> Cursor find_cursor = my_intrusive_tree.cursor_find(0); >> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; >> >> >> >> Please let me know any feedback or concerns! > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > initialize node on insert + more tests I changed the way `insert_at_cursor` works so that it initializes the node itself, which means the node's memory can be completely uninitialized before insertion. To do this, the key used when creating the cursor is now stored in the cursor.
The example would now look something like this: ```c++ struct MyIntrusiveStructure { Node node; // The tree node is part of an external structure int data; MyIntrusiveStructure(int data) : data(data) {} Node* get_node() { return &node; } static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } }; Tree my_intrusive_tree; Cursor insert_cursor = my_intrusive_tree.cursor_find(0); // Custom allocation here is just malloc MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); new (place) MyIntrusiveStructure(123); my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); Cursor find_cursor = my_intrusive_tree.cursor_find(0); int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; ------------- PR Comment: https://git.openjdk.org/jdk/pull/23416#issuecomment-2640143273 From azafari at openjdk.org Thu Feb 6 15:51:41 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 6 Feb 2025 15:51:41 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v21] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. > - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. > - Adding a runtime flag for selecting the old or new version can be added later. 
> - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: fixed merge problems ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/a0d133f9..873d5355 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=19-20 Stats: 284 lines in 7 files changed: 0 ins; 270 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From coleenp at openjdk.org Thu Feb 6 16:14:19 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Feb 2025 16:14:19 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v5] In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 14:33:24 GMT, Alan Bateman wrote: >> Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java >> >> Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> >> - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java >> >> Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> >> - Remove @Stable annotation for final field. > > test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java line 213: > >> 211: assertTrue(false); >> 212: } catch (NoSuchFieldException expected) { } >> 213: } > > The test is about accessibility, it's checking for IllegalAccessException and InaccessibleObjectException. 
So not the right place to test that a field is hidden from core reflection. Can you look at test/jdk/internal/reflect/Reflection/Filtering.java as that is probably the right place to list the protectionDomain field. Thank you Alan for letting me know the right place for this test. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1945012057 From jbechberger at openjdk.org Thu Feb 6 16:15:58 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Thu, 6 Feb 2025 16:15:58 GMT Subject: RFR: 8342818: Implement CPU Time Profiling for JFR [v37] In-Reply-To: References: Message-ID: > This is the code for the [JEP draft: CPU Time based profiling for JFR]. Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Improve JFR buffer allocation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20752/files - new: https://git.openjdk.org/jdk/pull/20752/files/0c6ebbc2..0b698ff7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=36 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20752&range=35-36 Stats: 16 lines in 1 file changed: 10 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20752.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20752/head:pull/20752 PR: https://git.openjdk.org/jdk/pull/20752 From coleenp at openjdk.org Thu Feb 6 16:18:53 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Feb 2025 16:18:53 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v6] In-Reply-To: References: Message-ID: <0rJBsK2HU4bErJ4GKSt_XDEHgj2sfVPWBZ4rWzuV7uc=.6ae8ce99-a316-4f66-8ecd-c20db9937527@github.com> > This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. > Tested with tier1-4. 
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Move test for protectionDomain filtering. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23396/files - new: https://git.openjdk.org/jdk/pull/23396/files/6bb7fe6e..3fd90cd1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23396&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23396&range=04-05 Stats: 13 lines in 2 files changed: 1 ins; 10 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23396.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23396/head:pull/23396 PR: https://git.openjdk.org/jdk/pull/23396 From liach at openjdk.org Thu Feb 6 16:20:21 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 6 Feb 2025 16:20:21 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v5] In-Reply-To: <4aAX8rSEcvkeYteaJUXHfVEzBbNGwGlhDLIz548dFcs=.616fa7dd-d5bf-42d5-aca0-0bea0b5591d0@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <8Wx3xbbOnPXS5n1RuNaesqHbhKV3iLwrCVF0s6uWOrA=.cb20728e-e13c-4667-822b-3ba424cbc12f@github.com> <4aAX8rSEcvkeYteaJUXHfVEzBbNGwGlhDLIz548dFcs=.616fa7dd-d5bf-42d5-aca0-0bea0b5591d0@github.com> Message-ID: On Thu, 6 Feb 2025 12:08:59 GMT, Coleen Phillimore wrote: >> If Class had other fields smaller than `int`, would be consider making this something like `char` to save space (allowing all the sub-word fields to be compacted)? > > I thought of doing this since I made modifiers u2 in the Hotspot code just previously, but all the Java code refers to this as an int. And I didn't see other fields to compact it with. Maybe if access_flags are moved we could make them both char (not short since they're unsigned). It feels weird to not have unsigned short to my C++ eyes. 
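The choice of `char` rather than `short` for an unsigned 16-bit field comes down to how the value widens to `int`: unsigned 16-bit storage (Java `char`, HotSpot `u2`) zero-extends, while signed 16-bit storage (Java `short`) sign-extends any value whose bit 15 is set. The C++ sketch below is only an analogy using `uint16_t` and `int16_t`; the function names are hypothetical, and 0x8000 is used because bit 15 is the top bit a 16-bit flags word can carry (in the class-file format that bit is `ACC_MODULE`).

```cpp
#include <cstdint>

// Bit 15, the top bit of a 16-bit flags word (ACC_MODULE in class files).
constexpr uint16_t kTopBit = 0x8000;

// Unsigned 16-bit storage, analogous to Java char or HotSpot u2:
// widening to int zero-extends, so the flag value is preserved.
constexpr int widen_unsigned(uint16_t flags) { return flags; }

// Signed 16-bit storage, analogous to Java short: widening to int
// sign-extends, so a set bit 15 makes the result negative.
constexpr int widen_signed(int16_t flags) { return flags; }

// Reinterpreting 0x8000 as int16_t yields -32768 in C++20 (modular
// conversion); earlier standards make it implementation-defined, and
// mainstream two's-complement compilers give the same result.
constexpr int16_t as_signed(uint16_t flags) { return static_cast<int16_t>(flags); }
```

For flag values below 0x8000 both representations widen to the same `int`, which is why the difference only shows up once the top bit is in use.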
From a Java perspective, using `char` for the field is completely fine; this field is only accessed via `getModifiers` and not set by Java code, so the automatic widening conversion can handle it all. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1945021458 From mdoerr at openjdk.org Thu Feb 6 17:13:14 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 6 Feb 2025 17:13:14 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v19] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Wed, 5 Feb 2025 08:38:58 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently acceleration code for GHASH is missing for PPC64. >> >> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > adapt Condition registers src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 658: > 656: __ bind(loop); > 657: __ vspltisb(vZero, 0); > 658: __ li(temp1, 0); I don't think these instructions should be inside of the loop. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1945125696 From jsjolen at openjdk.org Thu Feb 6 17:55:13 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 6 Feb 2025 17:55:13 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 08:06:04 GMT, Thomas Stuefe wrote: > For things I currently work on (compilation memory statistic), I need this functionality.
> > Changes: > > - added leftmost() and rightmost() (pretty self-explanatory) > - added print_on(outputStream*) (likewise) > - const correctness > - other minor cleanups > - gtests for all added functions > > Tests: GHA (all clean), manual tests on Linux x64 Personally, I think we're fine with Tree and Node without the Type suffix, that'll be obvious from the usage sites anyway. Functionality wise, seems fine. Will look at tests in more detail later. src/hotspot/share/utilities/rbTree.hpp line 117: > 115: }; // End: RBNode > 116: > 117: typedef RBTree::RBNode NodeType; Can't do `typedef TreeType::RBNode`? src/hotspot/share/utilities/rbTree.hpp line 281: > 279: > 280: // Returns leftmost node, nullptr if tree is empty. > 281: // If COMPARATOR::cmp(a, b) behaves canonically ("1" for a < b), this will the smallest key value. That should be `-1` and not `1`, no? Copy-paste error with rightmost :-). FWIW, you can use `-1` and it'll show up in my IDE rendered as-if markdown. Do with that info what you will. src/hotspot/share/utilities/rbTree.hpp line 304: > 302: RBNode* leftmost() { return const_cast(static_cast(this)->leftmost()); } > 303: > 304: // Returns rightmost node (smallest key). Returns nullptr if tree is empty. Inconsistent commenting re: result if tree is empty. Also, say "null" and not "nullptr" here. src/hotspot/share/utilities/rbTree.inline.hpp line 561: > 559: void print_T(outputStream* st, T x) { > 560: st->print(PTR_FORMAT, p2i(x)); > 561: } I've done something like this before but ended up not integrating it. Seems like this is something we should have in a generic place, in the future. Just a note, nothing to fix. src/hotspot/share/utilities/rbTree.inline.hpp line 568: > 566: st->sp(1 + depth * 2); > 567: st->print("@" PTR_FORMAT ": [", p2i(n)); > 568: print_T(st, n->key()); Do you need to provide the template parameter because key and val returns references? I would've assumed that C++ can infer the type and pick the right function. 
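The `leftmost()`/`rightmost()` semantics and the const/non-const overload pattern discussed in this review can be illustrated with a small stand-alone sketch. It is not the RBTree code: `SimpleTree` is a hypothetical, unbalanced binary search tree that only borrows the `COMPARATOR::cmp` convention (negative, zero, or positive), and `PtrCmp` compares addresses after converting them to `uintptr_t`, in the spirit of the review's request not to use `>` directly on raw pointers.

```cpp
#include <cassert>
#include <cstdint>

// Three-way comparator on addresses. Converting to uintptr_t turns this into
// an integer comparison instead of a relational comparison of raw pointers.
struct PtrCmp {
  static int cmp(const void* a, const void* b) {
    const uintptr_t ua = reinterpret_cast<uintptr_t>(a);
    const uintptr_t ub = reinterpret_cast<uintptr_t>(b);
    return ua == ub ? 0 : (ua > ub ? 1 : -1);
  }
};

// Hypothetical unbalanced BST ordered by CMP::cmp (not a red-black tree).
template <typename K, typename CMP>
class SimpleTree {
 public:
  struct Node {
    K key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(K k) : key(k) {}
  };

  void insert(Node* n) {
    assert(n != nullptr);
    Node** slot = &_root;
    while (*slot != nullptr) {
      slot = (CMP::cmp(n->key, (*slot)->key) < 0) ? &(*slot)->left : &(*slot)->right;
    }
    *slot = n;
  }

  // Smallest key for an ascending comparator; nullptr if the tree is empty.
  const Node* leftmost() const {
    const Node* n = _root;
    while (n != nullptr && n->left != nullptr) n = n->left;
    return n;
  }

  // Largest key for an ascending comparator; nullptr if the tree is empty.
  const Node* rightmost() const {
    const Node* n = _root;
    while (n != nullptr && n->right != nullptr) n = n->right;
    return n;
  }

  // Non-const overloads delegate to the const versions to avoid duplication.
  Node* leftmost() { return const_cast<Node*>(static_cast<const SimpleTree*>(this)->leftmost()); }
  Node* rightmost() { return const_cast<Node*>(static_cast<const SimpleTree*>(this)->rightmost()); }

 private:
  Node* _root = nullptr;
};
```

With a comparator that returns a negative value for `a < b`, the leftmost node holds the smallest key and the rightmost the largest; a reversed comparator swaps the two.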
src/hotspot/share/utilities/rbTree.inline.hpp line 572: > 570: print_T(st, n->val()); > 571: st->cr(); > 572: depth ++; Style: No space between depth and ++ test/hotspot/gtest/utilities/test_rbtree.cpp line 549: > 547: > 548: struct PtrCmp { > 549: static int cmp(const void* a, const void* b) { return a == b ? 0 : (a > b ? 1 : -1); } I'm fairly sure that it's UB to compare pointers with `>`, can you cast these to `uintptr_t` so we don't have to take the ubsan fix review later? test/hotspot/gtest/utilities/test_rbtree.cpp line 600: > 598: stringStream ss; > 599: tree.print_on(&ss); > 600: // tty->print_cr("%s", ss.base()); Delete ------------- PR Review: https://git.openjdk.org/jdk/pull/23486#pullrequestreview-2598727376 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1944755734 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1944763150 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1944765568 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1944770869 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1944772601 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1944773204 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1945175774 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1945180019 From thomas.stuefe at gmail.com Thu Feb 6 18:12:31 2025 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 6 Feb 2025 19:12:31 +0100 Subject: Deprecate -UseCompressedClassPointers? In-Reply-To: References: <9F5B6EE4-423D-4484-91F4-1BEF2AD8C198@amazon.co.uk> Message-ID: Thanks, Evgeny, these are all good points. On Thu, Feb 6, 2025 at 4:14 PM Astigeevich, Evgeny wrote: > > you mean that since the code cache is limited to 2g, it is very > unlikely to have that many classes since a significant part of those would > not be JIT compiled? > > > But the JVM would not fail, or?
It would just continuously throw out old > methods from the code cache? > > > > I mean if an application wants to keep in use more than 5-6 million > classes, the application should forget about JIT compilation and > performance. CodeCache will become a bottleneck. > > It is an interesting question, whether JVM would fail or not. IMO it would > crash. I think some data structures would not accommodate such a stress > situation. > > It is also interesting how many classes JVM can currently keep alive and > have JIT compilation working. > > > It would just continuously throw out old methods from the code cache? > > > > This depends on the compilation request rate. The process of flushing code > cache depends on GC. > > If the rate is not high, this will happen. If the rate is high, the > compilation will stop. > > > > In any case the performance of an application will be hurt a lot. > > > > > > Thanks, > > Evgeny > > > > *From: *Thomas Stüfe > *Date: *Thursday 6 February 2025 at 13:46 > *To: *"Astigeevich, Evgeny" > *Cc: *hotspot-dev , "Kennke, Roman" < > rkennke at amazon.de> > *Subject: *RE: [EXTERNAL] Deprecate -UseCompressedClassPointers? > > > > Hi Evgeny, > > > > you mean that since the code cache is limited to 2g, it is very unlikely > to have that many classes since a significant part of those would not be > JIT compiled? But the JVM would not fail, or? It would just continuously > throw out old methods from the code cache? But our current assumption is > that the only cases that are doing this are generator scenarios that create > many classes, but that those never get hot enough. > > > > But it's still a good point. > > > > Cheers, Thomas > > > > On Thu, Feb 6, 2025 at 2:34 PM Astigeevich, Evgeny > wrote: > > > Why would we still need `-UseCompressedClassPointers`?
Two reasons: > > > … > > > 2) To load more than ~5-6 million classes… > > > > CodeCache size limit is 2G for all platforms. > > If those classes' methods get JIT compiled, it is unlikely the generated code > will fit into CodeCache. > > > > -Evgeny Astigeevich > > > > *From: *hotspot-dev on behalf of Thomas > Stüfe > *Date: *Tuesday 4 February 2025 at 11:52 > *To: *hotspot-dev > *Cc: *"Kennke, Roman" > *Subject: *[EXTERNAL] Deprecate -UseCompressedClassPointers? > > > > > Hi all, > > I would like to get rid of the `-UseCompressedClassPointers` case since it > would cut down the number of configurations we need to support and test > from three to two (`-UseCompressedClassPointers`, > `+UseCompressedClassPointers`, `+UseCompactObjectHeaders`). > > This would leave us with the now-default case, > `+UseCompressedClassPointers`, as the sole supported CCP case, thereby > removing the need for the switch, which we therefore should deprecate and > eventually remove. > > Apart from significantly reducing code complexity and testing effort, > `-UseCompressedClassPointers` does not seem to be tested that well, > especially on 64-bit platforms. See e.g. > https://github.com/openjdk/jdk/pull/23053, and Roman's suspicion is that > there are many more such bugs. > > It increases memory usage by quite a bit ("Alias for -XX:WasteMemory" - > Erik Österlund), and any historical connection to UseCompressedOops has > long been removed. > > > Why would we still need `-UseCompressedClassPointers`? Two reasons: > > 1) To support 32-bit, where, atm, it is the only implemented mode. But I > am confident that I can find some low-effort low-code way to "fake" > compressed Klass* pointers, since after all the 32-bit address space could > be seen as a 4GB class space.
There is also the bigger question of the > future of 32-bit - we discussed this at the FOSDEM OpenJDK workshop, with > mixed results, but it seems likely that 32-bit will go away at some point, > the only question is when. > > 2) To load more than ~5-6 million classes. Class space, when maxed out, > allows for about 5-6 million classes, given a typical Klass size > distribution. I think that number is ridiculous, though. If you load or > generate that many classes, you are likely a very patient programmer with a > leaky or misdesigned application (just consider for a moment that to fill > 4GB class space to the brim with Klass instances, you would typically use > up about 5-10 times as much in non-class metaspace). That is for metadata > alone. I cannot see a sane application doing that. > > Is anyone using -UseCompressedClassPointers for any valid reason I am not > aware of? > > If not, barring any objections, my plan is to deprecate > UseCompressedClassPointers for JDK25, find an alternative for 32-bit > platforms in JDK26, and remove the uncompressed case in JDK 26 or later. > > > > What do people think?
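The two numbers in this thread, a class space that "could be seen as a 4GB class space" and a ceiling of about 5-6 million classes, fit together as follows: a compressed class pointer is a 32-bit offset from a base address, and dividing 4 GiB by 5-6 million implies an average of roughly 715-860 bytes per Klass. The sketch below is a simplified illustration under those assumptions. `KlassEncoding`, its fields, and `max_classes` are hypothetical names, not HotSpot's implementation, and the shift is included only to show how alignment could stretch the encodable range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit compressed class pointer.
using narrowKlass = uint32_t;

// Simplified encoding: narrow = (address - base) >> shift.
// With shift == 0 the encodable range is exactly 4 GiB above `base`.
struct KlassEncoding {
  uintptr_t base;
  unsigned shift;

  narrowKlass encode(uintptr_t klass_addr) const {
    assert(klass_addr >= base);
    const uintptr_t offset = klass_addr - base;
    assert((offset & ((uintptr_t(1) << shift) - 1)) == 0);  // aligned to 2^shift
    assert((offset >> shift) <= UINT32_MAX);                 // fits in 32 bits
    return static_cast<narrowKlass>(offset >> shift);
  }

  uintptr_t decode(narrowKlass nk) const {
    return base + (static_cast<uintptr_t>(nk) << shift);
  }
};

// Back-of-the-envelope capacity: how many Klass structures of an assumed
// average size fit into a class space of the given size.
constexpr uint64_t max_classes(uint64_t class_space_bytes, uint64_t avg_klass_bytes) {
  return class_space_bytes / avg_klass_bytes;
}
```

With shift 0, `max_classes(4ULL << 30, 700)` is about 6.1 million and `max_classes(4ULL << 30, 800)` about 5.4 million, which brackets the figure in the email; the per-Klass averages are assumptions, not measurements.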
From jsjolen at openjdk.org Thu Feb 6 18:36:21 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 6 Feb 2025 18:36:21 GMT Subject: Re: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 18:49:07 GMT, Kim Barrett wrote: >> Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: >> >> update based on feedback > > test/hotspot/jtreg/sources/TestNoNULL.java line 135: > >> 133: private static boolean checkForNull(Path path) throws IOException { >> 134: boolean found = false; >> 135: List lines = Files.readAllLines(path, StandardCharsets.UTF_8); > > I would have thought it would be better to read and check a line at a time. Though it probably doesn't really > matter all that much. We've got human-sized source files, I think this is fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1945237725 From duke at openjdk.org Thu Feb 6 18:47:54 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 6 Feb 2025 18:47:54 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: Message-ID: > By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Adding comments + some code reorganization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23300/files - new: https://git.openjdk.org/jdk/pull/23300/files/9f7c4a23..9a3a9444 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23300&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23300&range=03-04 Stats: 447 lines in 3 files changed: 140 ins; 247 del; 60 mod Patch: https://git.openjdk.org/jdk/pull/23300.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23300/head:pull/23300 PR: https://git.openjdk.org/jdk/pull/23300 From azafari at openjdk.org Thu Feb 6 18:58:24 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 6 Feb 2025 18:58:24 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v20] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Tue, 4 Feb 2025 13:56:50 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fix in shendoahCardTable > > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2013, 2025, Oracle and/or its affiliates. All rights reserved. > > Weird, another copyright issue here Fixed. > src/hotspot/share/nmt/virtualMemoryTracker.hpp line 3: > >> 1: /* >> 2: * Copyright (c) 2013, 2024, Oracle and/or its affiliates. All rights reserved. >> 3: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. > > Incorrect copyright addition. Fixed. > src/hotspot/share/nmt/virtualMemoryTracker.hpp line 27: > >> 25: >> 26: #ifndef NMT_VIRTUALMEMORYTRACKER_HPP >> 27: #define NMT_VIRTUALMEMORYTRACKER_HPP > > This shouldn't be changed??? Done.
> src/hotspot/share/nmt/vmatree.cpp line 81: > >> 79: stA.out.set_tag(tag); >> 80: LEQ_A.state.out.set_tag(tag); >> 81: stB.in.set_tag(tag); > > Commented out assert and an addition I'm trying to wrap my head around. Does this fix a pre-existing bug? If so, this should be a separate PR for mainline before this PR is integrated. Removed the commented out line. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1945265638 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1945265403 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1945265243 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1945265068 From gziemski at openjdk.org Thu Feb 6 20:18:35 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 6 Feb 2025 20:18:35 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v13] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. 
> > For the memory logger, those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for the virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. > > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are needed if we want to calculate the memory overhead. > > To use, you first need to record the pattern of operations, ex: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/...
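The record-then-replay workflow described above can be illustrated with a small model. This is a hypothetical sketch: the `AllocationLog` class, its event layout, and the "usable size" field are assumptions made for illustration, not the recorder's file format or code from the PR. It records malloc/realloc/free events, roughly what `log_malloc`, `log_realloc` and `log_free` capture, and replays them to compute peak live requested bytes and peak allocator overhead. Overhead here means the usable size that a call such as `malloc_usable_size()` would report, minus the requested size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record-and-replay model; the event layout is an assumption,
// not the recorder's actual file format.
class AllocationLog {
    enum Kind { MALLOC, REALLOC, FREE }

    // 'usable' stands in for what malloc_usable_size(), _msize() or
    // malloc_size() would have reported for the block at record time.
    record Event(Kind kind, long oldPtr, long newPtr, long requested, long usable) {}

    private final List<Event> events = new ArrayList<>();

    void malloc(long ptr, long size, long usable) {
        events.add(new Event(Kind.MALLOC, 0, ptr, size, usable));
    }

    void realloc(long oldPtr, long newPtr, long size, long usable) {
        events.add(new Event(Kind.REALLOC, oldPtr, newPtr, size, usable));
    }

    void free(long ptr) {
        events.add(new Event(Kind.FREE, ptr, 0, 0, 0));
    }

    // Replays the log; returns {peak live requested bytes, peak live overhead bytes}.
    long[] replay() {
        Map<Long, Event> live = new HashMap<>();
        long requested = 0, overhead = 0, peakRequested = 0, peakOverhead = 0;
        for (Event e : events) {
            if (e.kind() != Kind.MALLOC) {            // REALLOC and FREE release the old block
                Event old = live.remove(e.oldPtr());
                if (old != null) {
                    requested -= old.requested();
                    overhead -= old.usable() - old.requested();
                }
            }
            if (e.kind() != Kind.FREE) {              // MALLOC and REALLOC add a new block
                live.put(e.newPtr(), e);
                requested += e.requested();
                overhead += e.usable() - e.requested();
            }
            peakRequested = Math.max(peakRequested, requested);
            peakOverhead = Math.max(peakOverhead, overhead);
        }
        return new long[] { peakRequested, peakOverhead };
    }
}
```

The two peaks are tracked independently, so they need not occur at the same point in the log.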
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: move NMT recording APIs down into NMT code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/2f591c51..fb0c2636 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=11-12 Stats: 48 lines in 6 files changed: 10 ins; 23 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Thu Feb 6 20:22:35 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 6 Feb 2025 20:22:35 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v14] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: remove runtime flag, revert formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/fb0c2636..c18bf55a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=12-13 Stats: 6 lines in 2 files changed: 2 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Thu Feb 6 20:33:49 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 6 Feb 2025 20:33:49 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v15] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix path breakage ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/c18bf55a..12a0fd2f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=13-14 Stats: 348 lines in 3 files changed: 81 ins; 91 del; 176 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Thu Feb 6 20:41:09 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 6 Feb 2025 20:41:09 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v16] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix path breakage ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/12a0fd2f..18af1c74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=14-15 Stats: 5 lines in 1 file changed: 3 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From kbarrett at openjdk.org Thu Feb 6 20:56:15 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 6 Feb 2025 20:56:15 GMT Subject: RFR: 8323158: HotSpot Style Guide should specify more include ordering [v2] In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 12:14:35 GMT, Stefan Karlsson wrote: >> The HotSpot Style Guide has a section about source files and includes. The style used for includes has mostly been introduced by scripts when includeDB was replaced, but also when various other enhancements to our includes were made. Some of the introduced styles were never written down in the style guide. >> >> I propose a couple of changes to the HotSpot Style Guide to reflect some of these implicit styles that we have. While updating the text I also took the liberty to order the items in an order that I felt was good. >> >> Note that JDK-8323158 contains a few more suggestions, but I've only addressed the items that I think can be accepted without much contention. Either I extract the items that have not been addressed into a new RFE, or I create a new RFE for this PR. >> >> There are some small whitespace tweaks that I made so that the .md and .html had a similar layout.
> > Stefan Karlsson has updated the pull request incrementally with two additional commits since the last revision: > > - Update hotspot-style.md > - Update hotspot-style.html Marked as reviewed by kbarrett (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23388#pullrequestreview-2599902569 From vlivanov at openjdk.org Thu Feb 6 21:15:12 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 6 Feb 2025 21:15:12 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v2] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> <0ZM_vg_dAmbdbeoIeZ8ylBUDj_4_jxM-aE6IKoH6ykM=.69c7554f-5e2b-40b9-8d1a-abe147548dbb@github.com> <0efX7bcHNl5p1RoF3VnqZIabdavsGosuMI14cZPDzbQ=.2bde6bbf-a59b-4f5b-9c68-7a8a258b2ee5@github.com> <7KdNVSXLx0N027uyQgtUuN82VpXTlyPpPOnBv3sqYRs=.6b549b56-36f9-4ab3-8469-4779d93dd1e7@github.com> Message-ID: On Thu, 6 Feb 2025 13:08:31 GMT, Coleen Phillimore wrote: >> Does `static final` help here? > > Yes. Yes it does. Cases when a class mirror is a compile-time constant are already well-optimized. Non constant cases are the ones where missing optimization opportunities arise. In this particular case, C2 doesn't benefit from the observation that `Clazz[]` is a leaf type at runtime (no subclasses). Hence, a value loaded from a field typed as `Clazz[]` has exactly the same type and `clazzArray.getClass()` can be constant folded to `Clazz[].class`. Rather than a common case, it feels more like a corner case. So, worth addressing as a follow-up enhancement. Another scenario is a meet of 2 primitive array types (ends up as `bottom[]` in C2 type system), but I believe it hasn't been optimized before. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22652#discussion_r1945451909 From vlivanov at openjdk.org Thu Feb 6 21:21:18 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 6 Feb 2025 21:21:18 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v5] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Thu, 6 Feb 2025 14:31:28 GMT, Coleen Phillimore wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Make compute_modifiers return u2. Looks good. (Except a left-over `???` in a comment.) I very much like this cleanup. Migrating from Klass to Class simplifies compiler logic since there's no need to care about primitives at runtime anymore. Speaking of missing optimization opportunities (demonstrated by one microbenchmark), it looks like a corner case and can be addressed later. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22652#pullrequestreview-2599983789 From vlivanov at openjdk.org Thu Feb 6 21:25:23 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 6 Feb 2025 21:25:23 GMT Subject: RFR: 8337251: C1: Improve Class.isInstance intrinsic [v4] In-Reply-To: References: Message-ID: On Mon, 27 Jan 2025 16:13:35 GMT, Andrew Haley wrote: >> This replaces a runtime call to `Runtime1::is_instance_of()` by a platform-dependent C1 intrinsic. >> >> This improves overall performance significantly, and it minimizes icache footprint. >> >> The original commit contains this comment: >> >> >> // TODO could try to substitute this node with an equivalent InstanceOf >> // if clazz is known to be a constant Class. This will pick up newly found >> // constants after HIR construction. I'll leave this to a future change. >> >> >> >> However, there's little performance to be gained by restricting this optimization to constant Class instances, and after this patch, C1 `Class.isInstance()` compares favorably with the current platform-dependent `instanceof` intrinsic. >> >> It's not strictly necessary for other platforms to implement this optimization. >> >> Performance: >> >> Xeon-E5 2430, before and after: >> >> >> Benchmark Score Error Score Error Units >> SecondarySupersLookup.testNegative00 11.783 ± 0.491 10.459 ± 0.183 ns/op >> SecondarySupersLookup.testNegative01 11.757 ± 0.127 10.475 ± 0.661 ns/op >> SecondarySupersLookup.testNegative02 11.771 ± 0.700 10.479 ± 0.357 ns/op >> SecondarySupersLookup.testNegative55 23.997 ± 1.816 16.854 ± 1.034 ns/op >> SecondarySupersLookup.testNegative60 29.598 ± 1.326 26.828 ± 0.637 ns/op >> SecondarySupersLookup.testNegative63 74.528 ± 3.157 69.431 ± 0.357 ns/op >> SecondarySupersLookup.testNegative64 75.936 ± 1.805 70.124 ± 0.397 ns/op >> >> SecondarySupersLookup.testPositive01 15.257 ± 1.179 9.722 ± 0.326 ns/op >> SecondarySupersLookup.testPositive02 15.164 ± 1.383 9.737 ±
0.708 ns/op >> SecondarySupersLookup.testPositive03 15.166 ± 0.934 9.726 ± 0.184 ns/op >> SecondarySupersLookup.testPositive40 20.384 ± 0.530 12.805 ± 0.778 ns/op >> SecondarySupersLookup.testPositive50 15.118 ± 0.140 9.735 ± 0.555 ns/op >> SecondarySupersLookup.testPositive60 20.415 ± 3.083 11.603 ± 0.106 ns/op >> SecondarySupersLookup.testPositive63 65.478 ± 8.484 58.507 ± 2.837 ns/op >> SecondarySupersLookup.testPositive64 75.880 ± 1.047 68.667 ± 1.347 ns/op >> >> >> AArch64 (Apple M1) >> >> >> Benchmark Score Error Score Error Units >> SecondarySupersLookup.testNegative00 4.139 ± 0.005 2.815 ± 0.014 ns/op >> SecondarySupersLookup.testNegative01 4.071 ± 0.153 ... > > Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: > > - Next > - Next > - Merge branch 'master' into JDK-8337251 > - More > - Next > - Windows fix, maybe. > - Update > - Update > - Test fix/ > - Test fix/ > - ... and 18 more: https://git.openjdk.org/jdk/compare/764d70b7...13a2d93e Looks good. I'll submit it for testing. ------------- Marked as reviewed by vlivanov (Reviewer).
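For reference, `Class.isInstance(obj)` is the dynamic counterpart of `instanceof`: it returns `true` exactly when `obj` is non-null and assignable to the class. Judging by their names, the `SecondarySupersLookup` benchmarks above exercise checks against supertypes, such as interfaces, that HotSpot finds through a class's secondary supers. The types in this minimal sketch are hypothetical, and it shows only the semantics the C1 intrinsic must preserve, not the intrinsic itself.

```java
// Hypothetical types: Marker is an interface (a secondary super of Derived),
// Base is an ordinary superclass.
interface Marker {}
class Base {}
class Derived extends Base implements Marker {}

class IsInstanceDemo {
    // Dynamic form: the target type is only known at run time.
    static boolean dynamicCheck(Class<?> target, Object obj) {
        return target.isInstance(obj);
    }

    // Static form: the target type is fixed at compile time.
    static boolean staticCheck(Object obj) {
        return obj instanceof Marker;
    }
}
```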
Any thread attempting to enter a critical region during this time will detect the pending GC flag in `enter()` and follow the slow path, effectively waiting until the GC completes. The storeload barrier is critical to ensure that these two variables -- `_is_gc_request_pending` and the thread-local `_jni_active_critical` -- are accessed in the proper order. > If you think you need an atomic load here, then it would be needed for in_critical() so just add it there. `in_critical()` is used only by the owning thread, which has exclusive write access. Therefore, its access does not need to be atomic. However, the reads performed by other threads must be atomic, I believe. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2641116616 From kbarrett at openjdk.org Thu Feb 6 21:45:13 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 6 Feb 2025 21:45:13 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> References: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> Message-ID: On Wed, 5 Feb 2025 20:16:57 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java` and `.jar` files in test sources. >> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. 
>> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: > > update based on feedback Changes requested by kbarrett (Reviewer). test/hotspot/jtreg/sources/TestNoNULL.java line 46: > 44: private static final Set excludedTestFiles = new HashSet<>(); > 45: private static final Set excludedTestExtensions = Set.of(".c", ".java", ".jar"); > 46: private static final Pattern NULL_PATTERN = Pattern.compile("(? References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: remove realloc_malloc stuff ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/18af1c74..6302a300 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=15-16 Stats: 10 lines in 1 file changed: 0 ins; 9 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From coleenp at openjdk.org Thu Feb 6 23:26:31 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Feb 2025 23:26:31 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v6] In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: > The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. > > There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. > > Tested with tier1-8. 
Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: - Remove ??? in the code. - Hide Class.modifiers field. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22652/files - new: https://git.openjdk.org/jdk/pull/22652/files/146e2551..304a17ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22652&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22652&range=04-05 Stats: 6 lines in 3 files changed: 1 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/22652.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22652/head:pull/22652 PR: https://git.openjdk.org/jdk/pull/22652 From coleenp at openjdk.org Thu Feb 6 23:26:31 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Feb 2025 23:26:31 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v5] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Thu, 6 Feb 2025 14:31:28 GMT, Coleen Phillimore wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the JDK code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There are a couple of JMH benchmarks added with this change. One does show that, for array classes with a non-bootstrap class loader, this results in one extra load, which is observable in a long loop doing just that. I don't think this is real-life code. The other benchmarks added show no regression. >> >> Tested with tier1-8.
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Make compute_modifiers return u2. Thank you Vladimir for encouraging me to continue this change. I removed the ??? and hid the modifiers field for reflection as suggested in this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22652#issuecomment-2641339406 From kdnilsen at openjdk.org Thu Feb 6 23:34:15 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 6 Feb 2025 23:34:15 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v3] In-Reply-To: <6_AoWQhldJttOIEOL1T7HSapPzE4Qn2j4WN7E-bI3rM=.2685d3d8-e47c-42a6-845b-b68f50cc568e@github.com> References: <6_AoWQhldJttOIEOL1T7HSapPzE4Qn2j4WN7E-bI3rM=.2685d3d8-e47c-42a6-845b-b68f50cc568e@github.com> Message-ID: On Thu, 23 Jan 2025 05:45:43 GMT, Cesar Soares Lucas wrote: >> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code. >> >> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase. >> >> The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. 
In the `reset` phase we concurrently clean the cards in the current _read_ table, so that when the cycle reaches the next `init-mark` phase we have a completely clean copy of the card table. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes, the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that has just become the new _read_ table. >> >> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address. >> >> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged. >> >> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code. >> >> Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge master > - Addressing PR comments: some refactorings, ppc fix, off-by-one fix. > - Relocation of Card Tables Thanks for pulling this together. Looks great. ------------- Marked as reviewed by kdnilsen (Author).
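The alternating read/write card tables described in the Shenandoah PR above can be modeled in a few lines of C++. This is a hypothetical sketch, not Shenandoah's implementation: the class `SwappableCardTables`, its method names, the table size and the clean/dirty byte values are all invented for illustration. Mutators dirty cards through the current write-table base, the reset phase cleans the read table, and init-mark only swaps the two base pointers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical model of alternating read/write card tables (not Shenandoah code).
class SwappableCardTables {
 public:
  static constexpr size_t kCards = 16;               // illustrative size
  enum : uint8_t { kClean = 0xff, kDirty = 0x00 };   // illustrative encodings

  SwappableCardTables() {
    _a.fill(kClean);
    _b.fill(kClean);
  }
  SwappableCardTables(const SwappableCardTables&) = delete;
  SwappableCardTables& operator=(const SwappableCardTables&) = delete;

  // Mutator write barrier: goes through the *current* write-table base,
  // so that base cannot be treated as a compile-time constant.
  void mark_dirty(size_t card) { _write[card] = kDirty; }

  // Concurrent reset phase: clean the table the GC scanned last cycle.
  void clean_read_table() {
    for (size_t i = 0; i < kCards; i++) _read[i] = kClean;
  }

  // init-mark pause: swap the two base pointers; nothing is copied or zeroed.
  void swap_at_init_mark() { std::swap(_read, _write); }

  bool read_dirty(size_t card) const { return _read[card] == kDirty; }
  bool write_dirty(size_t card) const { return _write[card] == kDirty; }

 private:
  std::array<uint8_t, kCards> _a;
  std::array<uint8_t, kCards> _b;
  uint8_t* _read = _a.data();   // table the GC scans
  uint8_t* _write = _b.data();  // table the mutators dirty
};
```

Because the write-table base changes every cycle, the barrier has to read it at run time, which matches the PR's note that JIT-compiled code now loads the card table address dynamically.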
PR Review: https://git.openjdk.org/jdk/pull/23170#pullrequestreview-2600251495 From dholmes at openjdk.org Fri Feb 7 06:46:10 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 7 Feb 2025 06:46:10 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Thu, 6 Feb 2025 21:42:50 GMT, Albert Mingkun Yang wrote: > in_critical() is used only by the owning thread, I see code using `thr->in_critical()` which is not obviously being executed by the current thread on itself. But in any case adding the atomic load to `in_critical()` is basically a no-op (loads are atomic) so no need to add a new API just to do that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2642070840 From stuefe at openjdk.org Fri Feb 7 07:00:21 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Feb 2025 07:00:21 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 17:50:28 GMT, Johan Sjölen wrote: > Personally, I think we're fine with Tree and Node without the Type suffix, that'll be obvious from the usage sites anyway. Functionality-wise, seems fine. Will look at tests in more detail later. What do you refer to? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23486#issuecomment-2642088277 From dholmes at openjdk.org Fri Feb 7 07:01:39 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 7 Feb 2025 07:01:39 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Thu, 6 Feb 2025 21:42:50 GMT, Albert Mingkun Yang wrote: > The storeload barrier is critical ...
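The ordering question in this GCLocker exchange is an instance of the store-load (Dekker-style) pattern: each side publishes its own flag and then reads the other side's flag, and without a StoreLoad barrier between the store and the load, both sides can miss each other. The C++ sketch below shows that pattern with `std::atomic` and sequentially consistent fences. `CriticalGate`, `try_enter_critical` and `try_start_gc` are hypothetical names; this is not HotSpot's GCLocker code and does not settle whether the barrier in the PR is sufficient.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical Dekker-style handshake between a thread entering a critical
// region and a GC that wants to stop new entries (not HotSpot's GCLocker).
struct CriticalGate {
  std::atomic<bool> in_critical{false};
  std::atomic<bool> gc_requested{false};

  // Thread side: publish in_critical, then look for a pending GC request.
  bool try_enter_critical() {
    in_critical.store(true, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // StoreLoad
    if (gc_requested.load(std::memory_order_relaxed)) {
      in_critical.store(false, std::memory_order_relaxed);  // back off
      return false;
    }
    return true;
  }

  // GC side: publish the request, then look for threads already inside.
  bool try_start_gc() {
    gc_requested.store(true, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // StoreLoad
    return !in_critical.load(std::memory_order_relaxed);
  }
};
// With both fences in place, the two loads cannot both miss the other side's
// store, so try_enter_critical() and try_start_gc() cannot both return true.
```

Removing either fence (or weakening it to acquire/release) would allow both sides to read the stale `false`, which is the failure a StoreLoad barrier exists to prevent.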
I'm not sure it is sufficient. I would have expected some full fences to be needed here as this is very similar to the interaction of thread state with safepoints. I will look closer on Monday (sorry). ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2642089369 From stuefe at openjdk.org Fri Feb 7 07:04:10 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Feb 2025 07:04:10 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 13:56:23 GMT, Johan Sjölen wrote: >> For things I currently work on (compilation memory statistic), I need this functionality. >> >> Changes: >> >> - added leftmost() and rightmost() (pretty self-explanatory) >> - added print_on(outputStream*) (likewise) >> - const correctness >> - other minor cleanups >> - gtests for all added functions >> >> Tests: GHA (all clean), manual tests on Linux x64 > > src/hotspot/share/utilities/rbTree.inline.hpp line 561: > >> 559: void print_T(outputStream* st, T x) { >> 560: st->print(PTR_FORMAT, p2i(x)); >> 561: } > > I've done something like this before but ended up not integrating it. Seems like this is something we should have in a generic place, in the future. Just a note, nothing to fix. I agree > src/hotspot/share/utilities/rbTree.inline.hpp line 568: > >> 566: st->sp(1 + depth * 2); >> 567: st->print("@" PTR_FORMAT ": [", p2i(n)); >> 568: print_T(st, n->key()); > > Do you need to provide the template parameter because key and val return references? I would've assumed that C++ can infer the type and pick the right function.
More for expressiveness ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1946044334 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1946044635 From stuefe at openjdk.org Fri Feb 7 07:28:55 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Feb 2025 07:28:55 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees [v2] In-Reply-To: References: Message-ID: > For things I currently work on (compilation memory statistic), I need this functionality. > > Changes: > > - added leftmost() and rightmost() (pretty self-explanatory) > - added print_on(outputStream*) (likewise) > - const correctness > - other minor cleanups > - gtests for all added functions > > Tests: GHA (all clean), manual tests on Linux x64 Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: feedback johan ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23486/files - new: https://git.openjdk.org/jdk/pull/23486/files/56fbca44..c96cfa35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23486&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23486&range=00-01 Stats: 14 lines in 3 files changed: 5 ins; 4 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23486.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23486/head:pull/23486 PR: https://git.openjdk.org/jdk/pull/23486 From stuefe at openjdk.org Fri Feb 7 07:28:55 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Feb 2025 07:28:55 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees [v2] In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 17:50:28 GMT, Johan Sjölen wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> feedback johan > > Personally, I think we're fine with Tree and Node without the Type suffix, that'll be obvious
from the usage sites anyway. Functionality-wise, seems fine. Will look at tests in more detail later. Hi @jdksjolen, thank you for the review. I think I addressed all points. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23486#issuecomment-2642125957 From shade at openjdk.org Fri Feb 7 08:19:48 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 7 Feb 2025 08:19:48 GMT Subject: RFR: 8349639: jdk/jdk/jfr/event/gc/detailed/TestShenandoahEvacuationInformationEvent.java fails to compile after JDK-8348610 Message-ID: A simple test bug crept in through https://github.com/openjdk/jdk/commit/bad39b6d8892ba9b86bc81bf01108a1df617defb. Additional testing: - [x] Affected test now passes ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/23511/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23511&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349639 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23511.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23511/head:pull/23511 PR: https://git.openjdk.org/jdk/pull/23511 From jsjolen at openjdk.org Fri Feb 7 09:12:11 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 7 Feb 2025 09:12:11 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees [v2] In-Reply-To: References: Message-ID: <3DZjBjl2ib-1ZtbJmECaXPj9-a0SF3dmTtziK7Vq3vw=.6897d423-3641-4bb6-9535-9a7768f50153@github.com> On Fri, 7 Feb 2025 07:28:55 GMT, Thomas Stuefe wrote: >> For things I currently work on (compilation memory statistic), I need this functionality.
>> >> Changes: >> >> - added leftmost() and rightmost() (pretty self-explanatory) >> - added print_on(outputStream*) (likewise) >> - const correctness >> - other minor cleanups >> - gtests for all added functions >> >> Tests: GHA (all clean), manual tests on Linux x64 > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > feedback johan src/hotspot/share/utilities/rbTree.hpp line 48: > 46: class RBTree { > 47: friend class RBTreeTest; > 48: typedef RBTree TreeType; I'm referring to this being `TreeType` and not only `Tree`, same with `Node`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1946202773 From jsjolen at openjdk.org Fri Feb 7 09:17:12 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 7 Feb 2025 09:17:12 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 07:28:55 GMT, Thomas Stuefe wrote: >> For things I currently work on (compilation memory statistic), I need this functionality. >> >> Changes: >> >> - added leftmost() and rightmost() (pretty self-explanatory) >> - added print_on(outputStream*) (likewise) >> - const correctness >> - other minor cleanups >> - gtests for all added functions >> >> Tests: GHA (all clean), manual tests on Linux x64 > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > feedback johan Thanks, this looks good. ------------- Marked as reviewed by jsjolen (Reviewer).
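For readers following the leftmost()/rightmost()/print_on() discussion in the RBTree thread, here is a minimal C++ sketch of the same three operations on a plain, unbalanced binary search tree. It is not HotSpot's `RBTree`: there is no coloring or rebalancing, and `TinyTree`, `upsert` and `print_value` are hypothetical names. The indentation mirrors the `st->sp(1 + depth * 2)` call quoted earlier in this thread, and `print_value` shows that a function template deduces its parameter from a reference-returning accessor such as `key()` without an explicit template argument.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Minimal unbalanced BST illustrating leftmost(), rightmost() and an indented
// print_on(). Hypothetical sketch: not the HotSpot RBTree (no colors, no rebalancing).
template <typename K, typename V>
class TinyTree {
 public:
  struct Node {
    K _key;
    V _val;
    std::unique_ptr<Node> _left, _right;
    Node(const K& k, const V& v) : _key(k), _val(v) {}
    const K& key() const { return _key; }
    const V& val() const { return _val; }
  };

  // Insert k, or overwrite its value if it is already present.
  void upsert(const K& k, const V& v) {
    std::unique_ptr<Node>* slot = &_root;
    while (*slot) {
      if (k == (*slot)->_key) { (*slot)->_val = v; return; }
      slot = (k < (*slot)->_key) ? &(*slot)->_left : &(*slot)->_right;
    }
    slot->reset(new Node(k, v));
  }

  // Smallest key: follow left children. nullptr if the tree is empty.
  const Node* leftmost() const {
    const Node* n = _root.get();
    while (n != nullptr && n->_left) n = n->_left.get();
    return n;
  }

  // Largest key: follow right children. nullptr if the tree is empty.
  const Node* rightmost() const {
    const Node* n = _root.get();
    while (n != nullptr && n->_right) n = n->_right.get();
    return n;
  }

  // Pre-order dump, indenting each level by two more spaces.
  void print_on(std::ostream& os) const { print_node(os, _root.get(), 0); }

 private:
  std::unique_ptr<Node> _root;

  // T is deduced from the argument (the reference is stripped), so callers
  // need no explicit template argument even though key()/val() return references.
  template <typename T>
  static void print_value(std::ostream& os, const T& x) { os << x; }

  static void print_node(std::ostream& os, const Node* n, int depth) {
    if (n == nullptr) return;
    os << std::string(1 + depth * 2, ' ') << '[';
    print_value(os, n->key());
    os << " -> ";
    print_value(os, n->val());
    os << "]\n";
    print_node(os, n->_left.get(), depth + 1);
    print_node(os, n->_right.get(), depth + 1);
  }
};
```

leftmost() and rightmost() cost one walk down the tree; balancing only changes how long that walk can get, not the logic.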
PR Review: https://git.openjdk.org/jdk/pull/23486#pullrequestreview-2601206392 From stefank at openjdk.org Fri Feb 7 10:01:36 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 7 Feb 2025 10:01:36 GMT Subject: RFR: 8349652: Rewire nmethod oop load barriers Message-ID: When loading oops from nmethods we currently use the Access API to inject load barriers for the GCs that require them. As part of the ZGC load barrier we need access to the nmethod to properly perform the load barrier. The current implementation of the Access API doesn't support passing down the nmethod through all its layers of code, so ZGC asks the code cache what nmethod the various oops belong to. There's currently an open PR for JDK-8343789 (#21276), which moves the oops out of the code cache, so the current ZGC implementation will not work after that has been integrated. The proposal is to figure out a way to explicitly pass down the nmethod to the load barriers. We could extend the Access API to pass down the nmethod through all its various layers. The drawback of that is that it adds a lot of boilerplate code and requires new overloads and/or names. Given that this isn't performance-critical code I propose that we take the much simpler route and call straight to the BarrierSetNMethod class. Given that NMethodAccess and IN_NMETHOD were only introduced to support nmethod oop loads for ZGC and are not used anymore, I've also removed them from the code. Tested with the reproducer for the ZGC issue in JDK-8343789, tier1-7 Linux with ZGC tasks, currently running tier1-3.
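The design change in the nmethod barrier proposal above can be illustrated with a small hypothetical C++ model. `NMethodSketch` and `BarrierSetNMethodSketch` are invented names standing in for HotSpot's `nmethod` and `BarrierSetNMethod`, and the "processing" step is a placeholder rather than what ZGC actually does. The point is the signature: the caller hands the owning nmethod to the barrier, so the barrier no longer needs a reverse lookup from an oop slot address to the code blob containing it, a lookup that stops working once oops live outside the code cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an nmethod: the oop slots may live outside the
// code blob, and the barrier needs some per-nmethod state.
struct NMethodSketch {
  std::vector<uintptr_t> oops;
  bool processed = false;
};

// Hypothetical stand-in for BarrierSetNMethod. The caller passes the owning
// nmethod, so no address-to-nmethod lookup in the code cache is needed.
class BarrierSetNMethodSketch {
 public:
  uintptr_t load_oop(NMethodSketch* nm, size_t index) {
    if (!nm->processed) {
      process(nm);  // placeholder for whatever GC-specific work is required
    }
    return nm->oops[index];
  }
  int processed_count() const { return _processed; }

 private:
  int _processed = 0;
  void process(NMethodSketch* nm) {
    nm->processed = true;
    _processed++;
  }
};
```

Compared with threading an extra parameter through every layer of a generic access API, a single direct entry point like this keeps the change local, which matches the trade-off the proposal describes for non-performance-critical code.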
------------- Commit messages: - NMethod barrier Changes: https://git.openjdk.org/jdk/pull/23512/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23512&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349652 Stats: 76 lines in 10 files changed: 39 ins; 14 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/23512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23512/head:pull/23512 PR: https://git.openjdk.org/jdk/pull/23512 From jsjolen at openjdk.org Fri Feb 7 10:40:22 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 7 Feb 2025 10:40:22 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v21] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: <1AtAN_70cbiU2-KRyPK90QnwMedZxIsZ22KgBwioyOc=.e415931f-34cd-4157-9cc6-08b95d89efa2@github.com> On Thu, 6 Feb 2025 15:51:41 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. >> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. >> - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` >> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. >> - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. >> - A runtime flag for selecting the old or new version can be added later. >> - Some performance tests are added for the new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > fixed merge problems More things to be removed.
src/hotspot/share/nmt/vmatree.hpp line 215: > 213: tty->print_cr("Flag %s R: " INT64_FORMAT " C: " INT64_FORMAT, NMTUtil::tag_to_enum_name((MemTag)i), tag[i].reserve, tag[i].commit); > 214: } > 215: } This can be removed src/hotspot/share/nmt/vmatree.hpp line 267: > 265: }); > 266: tty->cr(); > 267: } This can be removed, I'm rather sure(?) src/hotspot/share/opto/stringopts.cpp line 173: > 171: } > 172: void add_control(Node* ctrl) { > 173: assert(!_control.contains(ctrl), "only push once"); Remove the changes in this file. src/hotspot/share/opto/stringopts.hpp line 1: > 1: /* Remove the changes in this file. test/hotspot/gtest/nmt/test_nmt_memoryfiletracker.cpp line 46: > 44: EXPECT_EQ(file->_summary.by_tag(mtTest)->committed(), sz(100)); > 45: tracker.free_memory(file, 50, 10); > 46: EXPECT_EQ(file->_summary.by_tag(mtTest)->committed(), sz(90)); This change should be done in mainline, not in this PR. test/hotspot/gtest/nmt/test_nmt_treap.cpp line 238: > 236: EXPECT_LE(unexpected_count, REPEATS / 2) << "SSL Avg: " << sll_sum / REPEATS << " Treap Avg: " << treap_sum / REPEATS; > 237: } > 238: These can be removed. We shouldn't have performance benchmarks running on tier1, as they'll use unnecessary CPU and time. We're also removing the treap in favour of the RB-tree soon :-). 
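The core idea of the VirtualMemoryTracker port above, tracking virtual memory by the addresses where the state changes rather than by a sorted list of regions, can be sketched with `std::map`. This is a hypothetical illustration, not `VMATree` (which is built on a treap and also carries per-region metadata such as memory tags): each key is a boundary address and its state holds until the next key. `VState` and `RegionMapSketch` are invented names, and merging of adjacent equal states is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

enum class VState { Released, Reserved, Committed };

// Hypothetical boundary map: each key is an address where the state changes,
// and the mapped state holds from that key up to the next key.
class RegionMapSketch {
 public:
  // Set [from, to) to state s, keeping whatever state resumes at 'to'.
  void set(uintptr_t from, uintptr_t to, VState s) {
    if (from >= to) return;
    VState after = state_at(to);
    _m.erase(_m.lower_bound(from), _m.lower_bound(to));  // boundaries inside [from, to)
    _m[from] = s;
    _m[to] = after;
    // A real implementation would also merge neighbours with equal states.
  }

  // Sum the sizes of all intervals currently in state s.
  uintptr_t bytes_in(VState s) const {
    uintptr_t total = 0;
    for (auto it = _m.begin(); it != _m.end(); ++it) {
      auto next = std::next(it);
      if (next != _m.end() && it->second == s) total += next->first - it->first;
    }
    return total;
  }

 private:
  std::map<uintptr_t, VState> _m;

  VState state_at(uintptr_t addr) const {
    auto it = _m.upper_bound(addr);
    if (it == _m.begin()) return VState::Released;  // before any boundary
    return std::prev(it)->second;
  }
};
```

Each operation touches only the boundaries inside the affected range plus two lookups, instead of scanning and splitting list entries, which is the kind of structural advantage a tree-based tracker has over a sorted linked list.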
------------- PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2601371143 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1946317189 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1946316910 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1946316278 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1946315914 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1946309823 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1946313782 From cnorrbin at openjdk.org Fri Feb 7 11:19:13 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Fri, 7 Feb 2025 11:19:13 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees [v2] In-Reply-To: References: Message-ID: <7x7c0D-rmJZPC1DZf_3SGHfbQazkYXvBuMjcgefRfQY=.4fa2c7a2-a4f6-4cde-9997-fae2b175f2db@github.com> On Fri, 7 Feb 2025 07:28:55 GMT, Thomas Stuefe wrote: >> For things I currently work on (compilation memory statistic), I need this functionality. >> >> Changes: >> >> - added leftmost() and rightmost() (pretty self-explanatory) >> - added print_on(outputStream*) (likewise) >> - const correctness >> - other minor cleanups >> - gtests for all added functions >> >> Tests: GHA (all clean), manual tests on Linux x64 > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > feedback johan Hi, I think this looks good overall! Just have a couple of small comments. src/hotspot/share/utilities/rbTree.hpp line 282: > 280: // Returns leftmost node, nullptr if tree is empty. > 281: // If COMPARATOR::cmp(a, b) behaves canonically (positive value for a > b), this will be the smallest key value.
> 282: const RBNode* leftmost() const { Just a thought, no change needed: The intrusive tree PR has the member `RBNode* _first` to get the leftmost node in constant time instead of having to traverse down the tree, but at the cost of an extra check when inserting/removing. Which solution do you prefer? I don't really have a preference, so I can adapt that PR either way. test/hotspot/gtest/utilities/test_rbtree.cpp line 402: > 400: for (int j = 0; j < 10; j++) { > 401: if (j == 0) { > 402: ASSERT_EQ(rbtree_const.leftmost(), (const Node*)nullptr); Style: All the previous tests use `EXPECT`s instead of `ASSERT`s. Goes for the other new tests as well. test/hotspot/gtest/utilities/test_rbtree.cpp line 417: > 415: max = MAX2(max, r); > 416: } > 417: // rbtree_const.print_on(tty); Delete test/hotspot/gtest/utilities/test_rbtree.cpp line 424: > 422: n = rbtree.leftmost(); > 423: ASSERT_EQ(n->key(), min); > 424: n->set_val(1); Why are the node's values set to 1? test/hotspot/gtest/utilities/test_rbtree.cpp line 557: > 555: > 556: TEST_VM(RBTreeTestNonFixture, TestPrintPointerTree) { > 557: typedef RBTree > TreeType; Can use `RBTreeCHeap` here instead.
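The `_first` trade-off Casper describes (O(1) access to the leftmost node in exchange for extra bookkeeping on insert and removal) can be seen in a small C++ sketch. It uses a plain binary search tree with parent pointers, not the intrusive RB tree, and `CachedFirstTree`, `CNode` and `remove_first` are hypothetical names. Only removal of the cached node itself is shown, since removing any other node leaves the leftmost node unchanged.

```cpp
#include <cassert>

// Hypothetical BST node with a parent pointer (not the intrusive RBNode).
struct CNode {
  int key;
  CNode* left = nullptr;
  CNode* right = nullptr;
  CNode* parent = nullptr;
  explicit CNode(int k) : key(k) {}
};

// Caches the leftmost node ("_first") so first() is O(1); insert pays one
// extra comparison, and removing the cached node must recompute it.
class CachedFirstTree {
 public:
  CachedFirstTree() = default;
  CachedFirstTree(const CachedFirstTree&) = delete;
  CachedFirstTree& operator=(const CachedFirstTree&) = delete;
  ~CachedFirstTree() { destroy(_root); }

  void insert(int k) {
    CNode* n = new CNode(k);
    if (_root == nullptr) { _root = _first = n; return; }
    CNode* cur = _root;
    while (true) {
      CNode*& next = (k < cur->key) ? cur->left : cur->right;
      if (next == nullptr) { next = n; n->parent = cur; break; }
      cur = next;
    }
    if (k < _first->key) _first = n;  // the extra check paid on every insert
  }

  const CNode* first() const { return _first; }

  // Remove the cached leftmost node. It has no left child, so its right child
  // (if any) takes its place, and the new leftmost is found from there.
  void remove_first() {
    CNode* f = _first;
    if (f == nullptr) return;
    CNode* child = f->right;
    if (child != nullptr) child->parent = f->parent;
    if (f->parent == nullptr) _root = child; else f->parent->left = child;
    CNode* nf = (child != nullptr) ? child : f->parent;
    while (nf != nullptr && nf->left != nullptr) nf = nf->left;
    _first = nf;
    delete f;
  }

 private:
  CNode* _root = nullptr;
  CNode* _first = nullptr;  // leftmost node

  static void destroy(CNode* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
  }
};
```

The traversal-based leftmost() needs no bookkeeping at all, so the choice mostly depends on how often leftmost() is called relative to inserts and removals.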
------------- PR Review: https://git.openjdk.org/jdk/pull/23486#pullrequestreview-2601442477 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1946353181 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1946357902 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1946358076 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1946370029 PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1946364698 From galder at openjdk.org Fri Feb 7 12:31:11 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 7 Feb 2025 12:31:11 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> Message-ID: On Fri, 17 Jan 2025 17:53:24 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the Java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g.
>> 
>>     SuperWord::transform_loop:
>>     Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
>>     518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>> 
>> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>> 
>>     ==============================
>>     Test summary
>>     ==============================
>>     TEST TOTAL PASS FAIL ERROR
>>     jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>     1 1 0 0
>>     ==============================
>>     TEST SUCCESS
>> 
>>     long min 1155
>>     long max 1173
>> 
>> After the patch, on darwin/aarch64 (M1):
>> 
>>     ==============================
>>     Test summary
>>     ==============================
>>     TEST TOTAL PASS FAIL ERROR
>>     jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>     1 1 0 0
>>     ==============================
>>     TEST SUCCESS
>> 
>>     long min 1042
>>     long max 1042
>> 
>> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
>> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>> 
>> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>> 
>>     ==============================
>>     Test summary
>>     ==============================
>>     TEST TOTAL PA...
> 
> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix typo

@eastig is helping with the results on aarch64, so I will verify the numbers in the same way as below for x86_64 once he provides me with the results.

Here is a summary of the benchmarking results I'm seeing on x86_64 (I will push an update that just merges the latest master shortly).

First I will go through the results of `MinMaxVector`.
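For orientation, the three loop shapes measured in the MinMaxVector results can be written as scalar C++. These are approximations, not the `MinMaxVector` JMH sources (which are Java): the multiply by 11 comes from the `imulq $0xb` in the assembly of `longReductionMax`, while the reduction's starting value and the clamp bounds are assumptions. With `Math.max`/`Math.min` turned into `MaxL`/`MinL` nodes, each iteration has no branch, which is what lets these loops vectorize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Approximate scalar shapes of the MinMaxVector kernels (assumed, not the JMH source).

// longLoopMax shape: element-wise max of two arrays into a result array.
void loop_max(const int64_t* a, const int64_t* b, int64_t* r, size_t n) {
  for (size_t i = 0; i < n; i++) r[i] = std::max(a[i], b[i]);
}

// longReductionMax shape: running max over a scaled input (multiply by 11 as in
// the assembly); the starting value is an assumption.
int64_t reduction_max(const int64_t* a, size_t n) {
  int64_t result = std::numeric_limits<int64_t>::min();
  for (size_t i = 0; i < n; i++) result = std::max(result, 11 * a[i]);
  return result;
}

// longClippingRange shape: clamp each element into [lo, hi] with a max then a min.
void clip_range(const int64_t* a, int64_t* r, size_t n, int64_t lo, int64_t hi) {
  for (size_t i = 0; i < n; i++) r[i] = std::min(std::max(a[i], lo), hi);
}
```

In the Java originals each `Math.max` call becomes a compare-and-branch unless it is intrinsified; the branch-free form above is what the `MaxL`/`MinL` nodes give the vectorizer to work with.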
This benchmark computes throughput by default, so the higher the number, the better.

# MinMaxVector AVX-512

Following are results with AVX-512 instructions:

    Benchmark                       (probability)  (range)  (seed)  (size)   Mode  Cnt  Baseline     Patch  Units
    MinMaxVector.longClippingRange            N/A       90       0    1000  thrpt    4   834.127  3688.961  ops/ms
    MinMaxVector.longClippingRange            N/A      100       0    1000  thrpt    4  1147.010  3687.721  ops/ms
    MinMaxVector.longLoopMax                   50      N/A     N/A    2048  thrpt    4  1126.718  1072.812  ops/ms
    MinMaxVector.longLoopMax                   80      N/A     N/A    2048  thrpt    4  1070.921  1070.538  ops/ms
    MinMaxVector.longLoopMax                  100      N/A     N/A    2048  thrpt    4   510.483  1073.081  ops/ms
    MinMaxVector.longLoopMin                   50      N/A     N/A    2048  thrpt    4   935.658  1016.910  ops/ms
    MinMaxVector.longLoopMin                   80      N/A     N/A    2048  thrpt    4  1007.410   933.774  ops/ms
    MinMaxVector.longLoopMin                  100      N/A     N/A    2048  thrpt    4   536.582  1017.337  ops/ms
    MinMaxVector.longReductionMax              50      N/A     N/A    2048  thrpt    4   967.288   966.945  ops/ms
    MinMaxVector.longReductionMax              80      N/A     N/A    2048  thrpt    4   967.327   967.382  ops/ms
    MinMaxVector.longReductionMax             100      N/A     N/A    2048  thrpt    4   849.689   967.327  ops/ms
    MinMaxVector.longReductionMin              50      N/A     N/A    2048  thrpt    4   966.323   967.275  ops/ms
    MinMaxVector.longReductionMin              80      N/A     N/A    2048  thrpt    4   967.340   967.228  ops/ms
    MinMaxVector.longReductionMin             100      N/A     N/A    2048  thrpt    4   880.921   967.233  ops/ms

### `longReduction[Min|Max]` performance improves slightly when probability is 100

Without the patch the code uses compare instructions:

     7.83%  0x00007f4f700fb305: imulq $0xb, 0x20(%r14, %r8, 8), %rdi
                                ;*lmul {reexecute=0 rethrow=0 return_oop=0}
                                ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMax at 24 (line 255)
                                ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub at 19 (line 124)
     5.64%  0x00007f4f700fb30b: cmpq %rdi, %rdx
            0x00007f4f700fb30e: jge 0x7f4f700fb32c ;*lreturn {reexecute=0 rethrow=0 return_oop=0}
                                ; - java.lang.Math::max at 11 (line 2037)
                                ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMax at 30 (line 256)
                                ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub at 19 (line 124)
    12.82%  0x00007f4f700fb310: imulq $0xb, 0x28(%r14, %r8, 8), %rbp
                                ;*lmul {reexecute=0 rethrow=0 return_oop=0}
                                ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMax at 24 (line 255)
                                ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub at 19 (line 124)
     7.46%  0x00007f4f700fb316: cmpq %rbp, %rdi
            0x00007f4f700fb319: jl 0x7f4f700fb2e0 ;*iflt {reexecute=0 rethrow=0 return_oop=0}
                                ; - java.lang.Math::max at 3 (line 2037)
                                ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMax at 30 (line 256)
                                ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub at 19 (line 124)

And with the patch these become vectorized:

            0x00007f56280fad10: vpmullq 0xf0(%rdx, %rsi, 8), %ymm10, %ymm4
     8.35%  0x00007f56280fad1b: vpmullq 0xd0(%rdx, %rsi, 8), %ymm10, %ymm5
     4.27%  0x00007f56280fad26: vpmullq 0x10(%rdx, %rsi, 8), %ymm10, %ymm6
                                ; {no_reloc}
     4.22%  0x00007f56280fad31: vpmullq 0x30(%rdx, %rsi, 8), %ymm10, %ymm7
     4.00%  0x00007f56280fad3c: vpmullq 0xb0(%rdx, %rsi, 8), %ymm10, %ymm8
     4.13%  0x00007f56280fad47: vpmullq 0x50(%rdx, %rsi, 8), %ymm10, %ymm11
     4.10%  0x00007f56280fad52: vpmullq 0x70(%rdx, %rsi, 8), %ymm10, %ymm12
     4.13%  0x00007f56280fad5d: vpmullq 0x90(%rdx, %rsi, 8), %ymm10, %ymm13
     4.03%  0x00007f56280fad68: vpmaxsq %ymm6, %ymm3, %ymm3
            0x00007f56280fad6e: vpmaxsq %ymm7, %ymm3, %ymm3
     4.72%  0x00007f56280fad74: vpmaxsq %ymm11, %ymm3, %ymm3
            0x00007f56280fad7a: vpmaxsq %ymm12, %ymm3, %ymm3
     8.40%  0x00007f56280fad80: vpmaxsq %ymm13, %ymm3, %ymm3
    23.11%  0x00007f56280fad86: vpmaxsq %ymm8, %ymm3, %ymm3
     2.15%  0x00007f56280fad8c: vpmaxsq %ymm5, %ymm3, %ymm3
     8.79%  0x00007f56280fad92: vpmaxsq %ymm4, %ymm3, %ymm3 ;*invokestatic max {reexecute=0 rethrow=0 return_oop=0}
                                ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMax at 30 (line 256)
                                ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub at 19 (line 124)

### `longLoop[Min|Max]` performance improves considerably when probability is 100

Without the patch the code uses compare + move instructions:

     4.53%  0x00007f96b40faf33: movq 0x18(%rax, %rsi, 8), %r13 ;*laload {reexecute=0 rethrow=0 return_oop=0}
                                ; - org.openjdk.bench.java.lang.MinMaxVector::longLoopMax at 20 (line 236)
                                ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub at 19 (line 124)
     2.69%  0x00007f96b40faf38: cmpq %r11, %r13
            0x00007f96b40faf3b: jl 0x7f96b40faf67 ;*lreturn {reexecute=0 rethrow=0 return_oop=0}
                                ; - java.lang.Math::max at 11 (line 2037)
                                ; - org.openjdk.bench.java.lang.MinMaxVector::longLoopMax at 27 (line 236)
                                ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub at 19 (line 124)
     8.75%  0x00007f96b40faf3d: movq %r13, 0x18(%rbp, %rsi, 8) ;*lastore {reexecute=0 rethrow=0 return_oop=0}
                                ; - org.openjdk.bench.java.lang.MinMaxVector::longLoopMax at 30 (line 236)
                                ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub at 19 (line 124)

And with the patch those become vectorized:

     3.55%  0x00007f13c80fa18a: vmovdqu 0xf0(%rbx, %r10, 8), %ymm5
        0x00007f13c80fa194: vmovdqu 0xf0(%rdi, %r10, 8), %ymm6
 2.35%  0x00007f13c80fa19e: vpmaxsq %ymm6, %ymm5, %ymm5
 5.03%  0x00007f13c80fa1a4: vmovdqu %ymm5, 0xf0(%rax, %r10, 8)
        ;*lastore {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.java.lang.MinMaxVector::longLoopMax at 30 (line 236)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub at 19 (line 124)

Interestingly, at probabilities of 50% and 80% the baseline performs better than at 100%. This is because at 50% and 80% the baseline already vectorizes. So why doesn't the baseline vectorize at 100% probability?

VLoop::check_preconditions
Loop: N1256/N463 limit_check counted [int,int),+4 (3161 iters) main rc has_sfpt strip_mined
1256 CountedLoop === 1256 598 463 [[ 1256 1257 1271 1272 ]] inner stride: 4 main of N1256 strip mined !orig=[1126],[599],[590],[307] !jvms: MinMaxVector::longLoopMax @ bci:10 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)
VLoop::check_preconditions: fails because of control flow.
cl_exit 594 594 CountedLoopEnd === 415 593 [[ 1275 463 ]] [lt] P=0.999684, C=707717.000000 !orig=[462] !jvms: MinMaxVector::longLoopMax @ bci:7 (line 235) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)
cl_exit->in(0) 415 415 Region === 415 411 412 [[ 415 594 416 451 ]] !orig=[423] !jvms: Math::max @ bci:11 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)
lpt->_head 1256 1256 CountedLoop === 1256 598 463 [[ 1256 1257 1271 1272 ]] inner stride: 4 main of N1256 strip mined !orig=[1126],[599],[590],[307] !jvms: MinMaxVector::longLoopMax @ bci:10 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124)
Loop: N1256/N463 limit_check counted [int,int),+4 (3161 iters) main rc has_sfpt strip_mined
VLoop::check_preconditions: failed: control flow in loop not allowed

At 100% probability the baseline fails to vectorize because it finds control flow in the loop. This is not the control flow you see in the min/max implementations; HotSpot adds it as a result of JIT profiling. The profile shows that one branch is always taken, so the compiled code is optimized for that path, and a branch is added for the uncommon case where it is not taken.

### `longClippingRange` performance improves considerably

Without the patch the code uses compare + move instructions:

 3.39%  0x00007febb40fb175: cmpq %rbp, %rcx
        0x00007febb40fb178: jge 0x7febb40fb17d ;*iflt {reexecute=0 rethrow=0 return_oop=0}
        ; - java.lang.Math::max at 3 (line 2037)
        ; - org.openjdk.bench.java.lang.MinMaxVector::longClippingRange at 25 (line 220)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longClippingRange_jmhTest::longClippingRange_thrpt_jmhStub at 19 (line 124)
 2.69%  0x00007febb40fb17a: movq %rbp, %rcx ;*lreturn {reexecute=0 rethrow=0 return_oop=0}
        ; - java.lang.Math::max at 11 (line 2037)
        ; - org.openjdk.bench.java.lang.MinMaxVector::longClippingRange at 25 (line 220)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longClippingRange_jmhTest::longClippingRange_thrpt_jmhStub at 19 (line 124)
 4.35%  0x00007febb40fb17d: nop
 2.93%  0x00007febb40fb180: cmpq %r8, %rcx
        0x00007febb40fb183: jle 0x7febb40fb188 ;*ifgt {reexecute=0 rethrow=0 return_oop=0}
        ; - java.lang.Math::min at 3 (line 2132)
        ; - org.openjdk.bench.java.lang.MinMaxVector::longClippingRange at 32 (line 220)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longClippingRange_jmhTest::longClippingRange_thrpt_jmhStub at 19 (line 124)
 3.51%  0x00007febb40fb185: movq %r8, %rcx ;*lreturn {reexecute=0 rethrow=0 return_oop=0}
        ; - java.lang.Math::min at 11 (line 2132)
        ; - org.openjdk.bench.java.lang.MinMaxVector::longClippingRange at 32 (line 220)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longClippingRange_jmhTest::longClippingRange_thrpt_jmhStub at 19 (line 124)
 4.26%  0x00007febb40fb188: movq %rcx, 0x10(%rsi, %r9, 8);*lastore {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.java.lang.MinMaxVector::longClippingRange at 35 (line 220)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longClippingRange_jmhTest::longClippingRange_thrpt_jmhStub at 19 (line 124)

With the patch these become vectorized:

 0.20%  0x00007f10180fd15c: vmovdqu 0x10(%r11, %rcx, 8), %ymm6
        0x00007f10180fd163: vpmaxsq %ymm6, %ymm7, %ymm6
        0x00007f10180fd169: vpminsq %ymm8, %ymm6, %ymm6
        0x00007f10180fd16f: vmovdqu %ymm6, 0x10(%r8, %rcx, 8);*lastore {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.java.lang.MinMaxVector::longClippingRange at 35 (line 220)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longClippingRange_jmhTest::longClippingRange_thrpt_jmhStub at 19 (line 124)

# `MinMaxVector` AVX2

The following results are from the same machine as above, but with AVX2 forced instead of AVX-512:

Benchmark                       (probability)  (range)  (seed)  (size)   Mode  Cnt  Baseline     Patch  Units
MinMaxVector.longClippingRange            N/A       90       0    1000  thrpt    4   832.132  1813.609  ops/ms
MinMaxVector.longClippingRange            N/A      100       0    1000  thrpt    4   832.546  1814.477  ops/ms
MinMaxVector.longLoopMax                   50      N/A     N/A    2048  thrpt    4   938.372   939.313  ops/ms
MinMaxVector.longLoopMax                   80      N/A     N/A    2048  thrpt    4   934.964   945.124  ops/ms
MinMaxVector.longLoopMax                  100      N/A     N/A    2048  thrpt    4   512.076   937.287  ops/ms
MinMaxVector.longLoopMin                   50      N/A     N/A    2048  thrpt    4   999.455   689.750  ops/ms
MinMaxVector.longLoopMin                   80      N/A     N/A    2048  thrpt    4  1000.352   876.326  ops/ms
MinMaxVector.longLoopMin                  100      N/A     N/A    2048  thrpt    4   536.359   999.475  ops/ms
MinMaxVector.longReductionMax              50      N/A     N/A    2048  thrpt    4   409.413   409.363  ops/ms
MinMaxVector.longReductionMax              80      N/A     N/A    2048  thrpt    4   409.374   409.141  ops/ms
MinMaxVector.longReductionMax             100      N/A     N/A    2048  thrpt    4   883.614   409.318  ops/ms
MinMaxVector.longReductionMin              50      N/A     N/A    2048  thrpt    4   404.723   404.705  ops/ms
MinMaxVector.longReductionMin              80      N/A     N/A    2048  thrpt    4   404.755   404.748  ops/ms
MinMaxVector.longReductionMin             100      N/A     N/A    2048  thrpt    4   848.784   404.669  ops/ms

### `longClippingRange` performance improves considerably

The baseline uses compare + move instructions, as shown above. The patched version improves even though it cannot use AVX-512 instructions such as `vpmaxsq`: the gain comes from a vectorized compare (`vpcmpgtq`) followed by a vectorized blend (`vblendvpd`):

        0x00007f9aa40f94ac: vpcmpgtq %ymm6, %ymm7, %ymm12
 3.79%  0x00007f9aa40f94b1: vblendvpd %ymm12, %ymm7, %ymm6, %ymm12
 3.72%  0x00007f9aa40f94b7: vpcmpgtq %ymm8, %ymm12, %ymm10
        0x00007f9aa40f94bc: vblendvpd %ymm10, %ymm8, %ymm12, %ymm10
 3.78%  0x00007f9aa40f94c2: vmovdqu %ymm10, 0xf0(%r8, %rcx, 8)
        ;*lastore {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.java.lang.MinMaxVector::longClippingRange at 35 (line 220)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longClippingRange_jmhTest::longClippingRange_thrpt_jmhStub at 19 (line 124)

### `longReduction[Min|Max]` performance drops considerably when probability is 100

The baseline uses compare + move instructions to implement this:

        ;*lmul {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMax at 24 (line 255)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub at 19 (line 124)
 6.30%  0x00007fd5580f678b: cmpq %rdi, %rdx
        0x00007fd5580f678e: jge 0x7fd5580f67ac ;*lreturn {reexecute=0 rethrow=0 return_oop=0}
        ; - java.lang.Math::max at 11 (line 2037)
        ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMax at 30 (line 256)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub at 19 (line 124)
12.88%  0x00007fd5580f6790: imulq $0xb, 0x28(%r14, %r8, 8), %rbp
        ;*lmul {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMax at 24 (line 255)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub at 19 (line 124)
 7.55%  0x00007fd5580f6796: cmpq %rbp, %rdi
        0x00007fd5580f6799: jl 0x7fd5580f6760 ;*iflt {reexecute=0 rethrow=0 return_oop=0}
        ; - java.lang.Math::max at 3 (line 2037)
        ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMax at 30 (line 256)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub at 19 (line 124)

With the patch the code uses conditional moves instead:

 0.05%  0x00007fc4700f5253: imulq $0xb, 0x28(%r14, %r11, 8), %rdx
10.62%  0x00007fc4700f5259: imulq $0xb, 0x20(%r14, %r11, 8), %rax
 0.63%  0x00007fc4700f525f: imulq $0xb, 0x10(%r14, %r11, 8), %r8
        ;*lmul {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMax at 24 (line 255)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub at 19 (line 124)
10.34%  0x00007fc4700f5265: cmpq %r8, %r13
 2.37%  0x00007fc4700f5268: cmovlq %r8, %r13 ;*invokestatic max {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMax at 30 (line 256)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub at 19 (line 124)
 1.15%  0x00007fc4700f526c: imulq $0xb, 0x18(%r14, %r11, 8), %r8
        ;*lmul {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMax at 24 (line 255)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub at 19 (line 124)
 9.28%  0x00007fc4700f5272: cmpq %r8, %r13
 3.82%  0x00007fc4700f5275: cmovlq %r8, %r13
21.61%  0x00007fc4700f5279: cmpq %rax, %r13
11.55%  0x00007fc4700f527c: cmovlq %rax, %r13
 4.48%  0x00007fc4700f5280: cmpq %rdx, %r13
11.76%  0x00007fc4700f5283: cmovlq %rdx, %r13 ;*invokestatic max {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMax at 30 (line 256)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub at 19 (line 124)

When one of the branches is taken always or almost always, the baseline's branchy code benefits from branch prediction: the CPU simply speculates past the well-predicted branch. A conditional move, however, has to wait for the comparison and both of its inputs, so every step of the reduction depends on the previous one and the patched code performs worse in this scenario.

Why are vectorized instructions not used in this scenario? Vector min/max instructions for 64-bit integers are not available with AVX2, and the SuperWord trace shows this:

PackSet::print: 3 packs
Pack: 0
 0: 1119 LoadL === 1105 343 1120 [[ 1117 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #long (does not depend only on test, unknown control) !orig=997,663,[457] !jvms: MinMaxVector::longReductionMax @ bci:23 (line 255) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124)
 1: 1112 LoadL === 1105 343 1113 [[ 1111 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #long (does not depend only on test, unknown control) !orig=663,[457] !jvms: MinMaxVector::longReductionMax @ bci:23 (line 255) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124)
 2: 997 LoadL === 1105 343 998 [[ 996 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #long (does not depend only on test, unknown control) !orig=663,[457] !jvms: MinMaxVector::longReductionMax @ bci:23 (line 255) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124)
 3: 663 LoadL === 1105 343 455 [[ 458 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #long (does not depend only on test, unknown control) !orig=[457] !jvms: MinMaxVector::longReductionMax @ bci:23 (line 255) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124)
Pack: 1
 0: 1117 MulL === _ 1119 162 [[ 1116 ]] !orig=996,458
!jvms: MinMaxVector::longReductionMax @ bci:24 (line 255) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124) 1: 1111 MulL === _ 1112 162 [[ 1110 ]] !orig=458 !jvms: MinMaxVector::longReductionMax @ bci:24 (line 255) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124) 2: 996 MulL === _ 997 162 [[ 995 ]] !orig=458 !jvms: MinMaxVector::longReductionMax @ bci:24 (line 255) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124) 3: 458 MulL === _ 663 162 [[ 459 ]] !jvms: MinMaxVector::longReductionMax @ bci:24 (line 255) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124) Pack: 2 0: 1116 MaxL === _ 1128 1117 [[ 1110 ]] !orig=995,459,1012 !jvms: MinMaxVector::longReductionMax @ bci:30 (line 256) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124) 1: 1110 MaxL === _ 1116 1111 [[ 995 ]] !orig=459,1012 !jvms: MinMaxVector::longReductionMax @ bci:30 (line 256) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124) 2: 995 MaxL === _ 1110 996 [[ 459 ]] !orig=459,1012 !jvms: MinMaxVector::longReductionMax @ bci:30 (line 256) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124) 3: 459 MaxL === _ 995 458 [[ 1128 923 570 ]] !orig=1012 !jvms: MinMaxVector::longReductionMax @ bci:30 (line 256) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124) WARNING: Removed pack: not implemented at any smaller size: 0: 1116 MaxL === _ 1128 1117 [[ 1110 ]] !orig=995,459,1012 !jvms: MinMaxVector::longReductionMax @ bci:30 (line 256) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124) 1: 1110 MaxL === _ 1116 1111 [[ 995 ]] !orig=459,1012 !jvms: MinMaxVector::longReductionMax @ bci:30 (line 256) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ 
bci:19 (line 124)
 2: 995 MaxL === _ 1110 996 [[ 459 ]] !orig=459,1012 !jvms: MinMaxVector::longReductionMax @ bci:30 (line 256) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124)
 3: 459 MaxL === _ 995 458 [[ 1128 923 570 ]] !orig=1012 !jvms: MinMaxVector::longReductionMax @ bci:30 (line 256) MinMaxVector_longReductionMax_jmhTest::longReductionMax_thrpt_jmhStub @ bci:19 (line 124)
After SuperWord::split_packs_only_implemented_with_smaller_size

One interesting option to explore here would be whether MaxL/MinL could be implemented in terms of vectorized compare (and blend) instructions, as shown above in the `longClippingRange` scenario. Thoughts @rwestrel @eme64?

# `VectorReduction2.WithSuperword` on AVX-512 machine

As requested by Emanuel, I've also run this benchmark. Note that the results here are time per op, so lower is better:

Benchmark                                         (SIZE)  (seed)  Mode  Cnt  Baseline     Patch  Units
VectorReduction2.WithSuperword.longMaxBig           2048       0  avgt    3  3970.527  1918.821  ns/op
VectorReduction2.WithSuperword.longMaxDotProduct    2048       0  avgt    3  1369.634  1055.762  ns/op
VectorReduction2.WithSuperword.longMaxSimple        2048       0  avgt    3   722.314  2172.064  ns/op
VectorReduction2.WithSuperword.longMinBig           2048       0  avgt    3  3996.694  1918.398  ns/op
VectorReduction2.WithSuperword.longMinDotProduct    2048       0  avgt    3  1363.687  1056.375  ns/op
VectorReduction2.WithSuperword.longMinSimple        2048       0  avgt    3   718.150  2179.478  ns/op

The `long[Min|Max]Big` and `long[Min|Max]DotProduct` benchmarks show considerable improvements, but something odd is happening in `long[Min|Max]Simple`.

### `long[Min|Max]Simple` performance drops considerably

The baseline uses compare + move instructions:

 8.05%  0x00007f9d580f569b: movq 0x18(%r13, %r11, 8), %r8;*laload {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.vm.compiler.VectorReduction2::longMaxSimple at 22 (line 1054)
        ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub at 17 (line 190)
 0.23%  0x00007f9d580f56a0: cmpq %r8, %rsi
        0x00007f9d580f56a3: jl 0x7f9d580f5713 ;*lreturn {reexecute=0 rethrow=0 return_oop=0}
        ; - java.lang.Math::max at 11 (line 2037)
        ; - org.openjdk.bench.vm.compiler.VectorReduction2::longMaxSimple at 28 (line 1055)
        ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub at 17 (line 190)

The patched version uses conditional moves instead of vectorized instructions:

 2.76%  0x00007fcd180f695c: movq 0x18(%r14, %r11, 8), %rdi;*laload {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.vm.compiler.VectorReduction2::longMaxSimple at 22 (line 1054)
        ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub at 17 (line 190)
        0x00007fcd180f6961: cmpq %rdi, %r13
 3.11%  0x00007fcd180f6964: cmovlq %rdi, %r13 ;*invokestatic max {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.vm.compiler.VectorReduction2::longMaxSimple at 28 (line 1055)
        ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub at 17 (line 190)

Why doesn't vectorization kick in with the patch?
Because superword doesn't think it's profitable to vectorize this: PackSet::print: 2 packs Pack: 0 0: 733 LoadL === 721 184 734 [[ 732 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #long (does not depend only on test, unknown control) !orig=669,500,[319] !jvms: VectorReduction2::longMaxSimple @ bci:22 (line 1054) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) 1: 728 LoadL === 721 184 729 [[ 727 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #long (does not depend only on test, unknown control) !orig=500,[319] !jvms: VectorReduction2::longMaxSimple @ bci:22 (line 1054) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) 2: 669 LoadL === 721 184 670 [[ 668 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #long (does not depend only on test, unknown control) !orig=500,[319] !jvms: VectorReduction2::longMaxSimple @ bci:22 (line 1054) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) 3: 500 LoadL === 721 184 317 [[ 320 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #long (does not depend only on test, unknown control) !orig=[319] !jvms: VectorReduction2::longMaxSimple @ bci:22 (line 1054) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) Pack: 1 0: 732 MaxL === _ 743 733 [[ 727 ]] !orig=668,320,685 !jvms: VectorReduction2::longMaxSimple @ bci:28 (line 1055) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) 1: 727 MaxL === _ 732 728 [[ 668 ]] !orig=320,685 !jvms: VectorReduction2::longMaxSimple @ bci:28 (line 1055) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) 2: 668 MaxL === _ 727 669 [[ 320 ]] !orig=320,685 !jvms: 
VectorReduction2::longMaxSimple @ bci:28 (line 1055) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) 3: 320 MaxL === _ 668 500 [[ 743 593 456 ]] !orig=685 !jvms: VectorReduction2::longMaxSimple @ bci:28 (line 1055) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) WARNING: Removed pack: not profitable: 0: 732 MaxL === _ 743 733 [[ 727 ]] !orig=668,320,685 !jvms: VectorReduction2::longMaxSimple @ bci:28 (line 1055) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) 1: 727 MaxL === _ 732 728 [[ 668 ]] !orig=320,685 !jvms: VectorReduction2::longMaxSimple @ bci:28 (line 1055) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) 2: 668 MaxL === _ 727 669 [[ 320 ]] !orig=320,685 !jvms: VectorReduction2::longMaxSimple @ bci:28 (line 1055) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) 3: 320 MaxL === _ 668 500 [[ 743 593 456 ]] !orig=685 !jvms: VectorReduction2::longMaxSimple @ bci:28 (line 1055) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) WARNING: Removed pack: not profitable: 0: 733 LoadL === 721 184 734 [[ 732 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #long (does not depend only on test, unknown control) !orig=669,500,[319] !jvms: VectorReduction2::longMaxSimple @ bci:22 (line 1054) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) 1: 728 LoadL === 721 184 729 [[ 727 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #long (does not depend only on test, unknown control) !orig=500,[319] !jvms: VectorReduction2::longMaxSimple @ bci:22 (line 1054) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190) 
 2: 669 LoadL === 721 184 670 [[ 668 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #long (does not depend only on test, unknown control) !orig=500,[319] !jvms: VectorReduction2::longMaxSimple @ bci:22 (line 1054) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190)
 3: 500 LoadL === 721 184 317 [[ 320 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=8; #long (does not depend only on test, unknown control) !orig=[319] !jvms: VectorReduction2::longMaxSimple @ bci:22 (line 1054) VectorReduction2_WithSuperword_longMaxSimple_jmhTest::longMaxSimple_avgt_jmhStub @ bci:17 (line 190)
After Superword::filter_packs_for_profitable
PackSet::print: 0 packs
SuperWord::transform_loop failed: SuperWord::SLP_extract did not vectorize

How can you make it vectorize? By doing some work on each array value (for example, a multiplication) before passing it to min/max. That is what the `MinMaxVector.longReduction[Min|Max]` and `VectorReduction2.long[Min|Max]DotProduct` methods do.

# `VectorReduction2.NoSuperword` on AVX-512 machine

Benchmark                                       (SIZE)  (seed)  Mode  Cnt  Baseline     Patch  Units
VectorReduction2.NoSuperword.longMaxBig           2048       0  avgt    3  3964.403  2966.258  ns/op
VectorReduction2.NoSuperword.longMaxDotProduct    2048       0  avgt    3  1686.373  2462.876  ns/op
VectorReduction2.NoSuperword.longMaxSimple        2048       0  avgt    3   722.219  2171.859  ns/op
VectorReduction2.NoSuperword.longMinBig           2048       0  avgt    3  3994.685  2971.143  ns/op
VectorReduction2.NoSuperword.longMinDotProduct    2048       0  avgt    3  1366.291  2428.173  ns/op
VectorReduction2.NoSuperword.longMinSimple        2048       0  avgt    3   719.218  2179.546  ns/op

Performance improves for `long[Min|Max]Big`. `long[Min|Max]Simple` suffers from the same issue shown in the previous section: when not vectorized, these benchmarks fall back to conditional moves. The drop in performance in `long[Min|Max]DotProduct` needs some explanation.
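To make the Simple vs. DotProduct distinction concrete, here is a hypothetical Java sketch of the two loop shapes. It is not the benchmark source (the class and method names are invented for illustration), and whether C2 actually vectorizes either loop depends on the CPU features and on SuperWord's profitability checks, as the traces above show.

```java
/**
 * Hypothetical sketch of the two reduction shapes discussed above.
 * NOT the actual VectorReduction2 / MinMaxVector sources; names are illustrative.
 */
public class MaxReductionShapes {

    // "Simple" shape: each loaded element feeds Math.max directly.
    // For the Simple benchmark above, SuperWord dropped the packs as "not profitable".
    static long maxSimple(long[] a) {
        long acc = Long.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            acc = Math.max(acc, a[i]);
        }
        return acc;
    }

    // "DotProduct" shape: an element-wise multiply happens before Math.max,
    // so the loop carries vectorizable work in addition to the reduction itself.
    static long maxDotProduct(long[] a, long[] b) {
        long acc = Long.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            acc = Math.max(acc, a[i] * b[i]);
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] a = new long[2048];
        long[] b = new long[2048];
        java.util.Random random = new java.util.Random(0);
        for (int i = 0; i < a.length; i++) {
            a[i] = random.nextInt();
            b[i] = random.nextInt();
        }
        System.out.println(maxSimple(a) + " " + maxDotProduct(a, b));
    }
}
```

Both methods compute the same kind of reduction; the only difference is the per-element work before `Math.max`, which is what changes SuperWord's cost estimate.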
### `long[Min|Max]DotProduct` performance drops considerably

The baseline uses compare + move instructions here:

 5.67%  0x00007f3fcc0fa71d: movq 0x20(%r14, %r8, 8), %r9
 5.19%  0x00007f3fcc0fa722: imulq 0x20(%rax, %r8, 8), %r9;*lmul {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.vm.compiler.VectorReduction2::longMaxDotProduct at 30 (line 1125)
        ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorReduction2_NoSuperword_longMaxDotProduct_jmhTest::longMaxDotProduct_avgt_jmhStub at 17 (line 190)
 8.46%  0x00007f3fcc0fa728: cmpq %r9, %rsi
        0x00007f3fcc0fa72b: jl 0x7f3fcc0fa751 ;*lreturn {reexecute=0 rethrow=0 return_oop=0}
        ; - java.lang.Math::max at 11 (line 2037)
        ; - org.openjdk.bench.vm.compiler.VectorReduction2::longMaxDotProduct at 36 (line 1126)
        ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorReduction2_NoSuperword_longMaxDotProduct_jmhTest::longMaxDotProduct_avgt_jmhStub at 17 (line 190)

The patch transforms this into conditional moves:

11.00%  0x00007f66f40f70b2: movq 0x18(%r13, %rcx, 8), %rax
        0x00007f66f40f70b7: imulq 0x18(%r9, %rcx, 8), %rax;*lmul {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.vm.compiler.VectorReduction2::longMaxDotProduct at 30 (line 1125)
        ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorReduction2_NoSuperword_longMaxDotProduct_jmhTest::longMaxDotProduct_avgt_jmhStub at 17 (line 190)
        0x00007f66f40f70bd: cmpq %rdx, %rax
13.07%  0x00007f66f40f70c0: cmovlq %rdx, %rax ;*invokestatic max {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.vm.compiler.VectorReduction2::longMaxDotProduct at 36 (line 1126)
        ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorReduction2_NoSuperword_longMaxDotProduct_jmhTest::longMaxDotProduct_avgt_jmhStub at 17 (line 190)

This is similar to what we have seen above: without SuperWord, MaxL/MinL fall back to conditional moves.
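The two branch-free lowerings seen in this comment, the scalar `cmovlq` and the AVX2 `vpcmpgtq` + `vblendvpd` pair used for `longClippingRange`, both select a result from a comparison instead of jumping. As a rough, hypothetical illustration (not what C2 emits; class and method names are invented), the per-lane idea can be modelled in plain Java with an all-ones/all-zeros compare mask followed by a bitwise blend:

```java
/**
 * Hypothetical one-lane model of "compare into a mask, then blend".
 * Illustrates the idea behind vpcmpgtq + vblendvpd; it is not C2 output.
 */
public class BlendMax {

    // Model of vpcmpgtq for one 64-bit lane: -1L (all ones) if x > y, else 0L.
    // Uses an overflow-safe bit trick for the signed comparison y < x,
    // so no branch or conditional expression is involved.
    static long gtMask(long x, long y) {
        long d = y - x;
        return (d ^ ((y ^ x) & (d ^ y))) >> 63;
    }

    // Model of vblendvpd for one lane: take ifSet where the mask is all ones,
    // ifClear where it is all zeros. (The real instruction only looks at each
    // lane's sign bit; with an all-ones/all-zeros mask the result is the same.)
    static long blend(long mask, long ifSet, long ifClear) {
        return (ifSet & mask) | (ifClear & ~mask);
    }

    static long max(long x, long y) {
        return blend(gtMask(x, y), x, y);
    }

    static long min(long x, long y) {
        return blend(gtMask(x, y), y, x);
    }

    // Clamp in the same max-then-min order as longClippingRange.
    static void clip(long[] src, long[] dst, long lo, long hi) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = min(max(src[i], lo), hi);
        }
    }

    public static void main(String[] args) {
        long[] src = {-10, 0, 50, 200};
        long[] dst = new long[src.length];
        clip(src, dst, 0, 100);
        System.out.println(java.util.Arrays.toString(dst)); // prints [0, 0, 50, 100]
    }
}
```

A select like this never mispredicts, but it always pays for the comparison and both inputs; that trade-off is why the well-predicted branches of the baseline can win when one side is taken almost all the time.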
Although branch probabilities are not controlled here, we can observe that one of the branches is likely taken ~100% of the time.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2642788364

From coleenp at openjdk.org Fri Feb 7 12:34:40 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 7 Feb 2025 12:34:40 GMT
Subject: RFR: 8346567: Make Class.getModifiers() non-native [v7]
In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com>
References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com>
Message-ID:

> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though.
>
> There are a couple of JMH benchmarks added with this change. One does show that for array classes with a non-bootstrap class loader, this results in one extra load which, in a long loop of just that, is observable. I don't think this is real-life code. The other benchmarks added show no regression.
>
> Tested with tier1-8.

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Fix jvmci test.
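The quoted change is described as moving where the modifiers value lives, so callers should keep seeing the documented `Class.getModifiers()` results. A small usage sketch of those results, including the array-class case the new JMH benchmarks exercise (class and method names are invented for illustration):

```java
import java.lang.reflect.Modifier;

/**
 * Usage sketch of documented Class.getModifiers() results, including array classes.
 * Names are hypothetical; this is not code from the pull request.
 */
public class ModifiersSketch {

    // Render a class's modifier bits in source-like form, e.g. "public final".
    static String describe(Class<?> c) {
        return Modifier.toString(c.getModifiers());
    }

    // Per the Class.getModifiers() Javadoc, an array class is always abstract and
    // final, never an interface, and takes public/protected/private from its component type.
    static boolean looksLikeArrayModifiers(Class<?> arrayClass) {
        int m = arrayClass.getModifiers();
        return Modifier.isAbstract(m) && Modifier.isFinal(m) && !Modifier.isInterface(m);
    }

    public static void main(String[] args) {
        System.out.println("String:     " + describe(String.class));
        System.out.println("int[]:      " + describe(int[].class));
        System.out.println("Runnable[]: " + describe(Runnable[].class));
    }
}
```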
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22652/files - new: https://git.openjdk.org/jdk/pull/22652/files/304a17ee..37a8cf81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22652&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22652&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22652.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22652/head:pull/22652 PR: https://git.openjdk.org/jdk/pull/22652 From galder at openjdk.org Fri Feb 7 12:39:24 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 7 Feb 2025 12:39:24 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: > This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. > > Currently vectorization does not kick in for loops containing either of these calls because of the following error: > > > VLoop::check_preconditions: failed: control flow in loop not allowed > > > The control flow is due to the java implementation for these methods, e.g. > > > public static long max(long a, long b) { > return (a >= b) ? a : b; > } > > > This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. > By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. > E.g. 
>
>
> SuperWord::transform_loop:
> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>
>
> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>
>
> ==============================
> Test summary
> ==============================
> TEST TOTAL PASS FAIL ERROR
> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
> 1 1 0 0
> ==============================
> TEST SUCCESS
>
> long min 1155
> long max 1173
>
>
> After the patch, on darwin/aarch64 (M1):
>
>
> ==============================
> Test summary
> ==============================
> TEST TOTAL PASS FAIL ERROR
> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
> 1 1 0 0
> ==============================
> TEST SUCCESS
>
> long min 1042
> long max 1042
>
>
> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>
> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>
>
> ==============================
> Test summary
> ==============================
> TEST TOTAL PASS FAIL ERROR
> jtreg:test/hotspot/jtreg:tier1 2500 2500 0 0
>>> jtreg:test/jdk:tier1 ...

Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 44 additional commits since the last revision: - Merge branch 'master' into topic.intrinsify-max-min-long - Fix typo - Renaming methods and variables and add docu on algorithms - Fix copyright years - Make sure it runs with cpus with either avx512 or asimd - Test can only run with 256 bit registers or bigger * Remove platform dependant check and use platform independent configuration instead. - Fix license header - Tests should also run on aarch64 asimd=true envs - Added comment around the assertions - Adjust min/max identity IR test expectations after changes - ... and 34 more: https://git.openjdk.org/jdk/compare/f56622ff...a190ae68 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20098/files - new: https://git.openjdk.org/jdk/pull/20098/files/724a346a..a190ae68 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=10-11 Stats: 206462 lines in 5108 files changed: 101636 ins; 84099 del; 20727 mod Patch: https://git.openjdk.org/jdk/pull/20098.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20098/head:pull/20098 PR: https://git.openjdk.org/jdk/pull/20098 From nbenalla at openjdk.org Fri Feb 7 13:11:27 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Fri, 7 Feb 2025 13:11:27 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v3] In-Reply-To: References: Message-ID: > Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. > It scans files in those directories, filtering out certain files as well as all `.c`, `.java` and `.jar` files in test sources. 
> > Before adding line 86 and excluding `os_windows.cpp`, the test failed with: > > > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: > HMODULE hModule = NULL; > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: > GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); > java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. > at TestNoNULL.main(TestNoNULL.java:73) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1447) Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: revert to the original regex and remove the exclusion of os_windows.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23466/files - new: https://git.openjdk.org/jdk/pull/23466/files/ae3c9eba..3b8b05d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23466&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23466&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23466.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23466/head:pull/23466 PR: https://git.openjdk.org/jdk/pull/23466 From nbenalla at openjdk.org Fri Feb 7 13:11:27 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Fri, 7 Feb 2025 13:11:27 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: References: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> Message-ID: On Thu, 6 Feb 2025 21:41:14 GMT, Kim Barrett wrote: >> Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: >> >> 
update based on feedback > > test/hotspot/jtreg/sources/TestNoNULL.java line 46: > >> 44: private static final Set excludedTestFiles = new HashSet<>(); >> 45: private static final Set excludedTestExtensions = Set.of(".c", ".java", ".jar"); >> 46: private static final Pattern NULL_PATTERN = Pattern.compile("(? > I'm not a regex expert, but I think the earlier version using `\b` should be okay. And it's a > _lot_ clearer than this version, now that I actually know what `\b` means. Sorry for leading > you astray. No problem, I spent some time running additional tests to see if `\b` is enough for our use case. Updated in [3b8b05d](https://github.com/openjdk/jdk/pull/23466/commits/3b8b05d15c747370a51e35265292a16115e88b5e). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1946501429 From sroy at openjdk.org Fri Feb 7 13:50:27 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 7 Feb 2025 13:50:27 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v20] In-Reply-To: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: > JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) > > Currently acceleration code for GHASH is missing for PPC64. > > The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. Suchismith Roy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 42 commits: - Merge branch 'openjdk:master' into ghash_processblocks - adapt Condition registers - Merge branch 'openjdk:master' into ghash_processblocks - restore chnges - restore chnges - permute vHigh,vLow - indentation - comments - vsx logic change - spaces - ...
and 32 more: https://git.openjdk.org/jdk/compare/86cec4ea...d22fcf25 ------------- Changes: https://git.openjdk.org/jdk/pull/20235/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=19 Stats: 167 lines in 2 files changed: 163 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235 PR: https://git.openjdk.org/jdk/pull/20235 From stuefe at openjdk.org Fri Feb 7 14:05:13 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Feb 2025 14:05:13 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees [v2] In-Reply-To: <7x7c0D-rmJZPC1DZf_3SGHfbQazkYXvBuMjcgefRfQY=.4fa2c7a2-a4f6-4cde-9997-fae2b175f2db@github.com> References: <7x7c0D-rmJZPC1DZf_3SGHfbQazkYXvBuMjcgefRfQY=.4fa2c7a2-a4f6-4cde-9997-fae2b175f2db@github.com> Message-ID: On Fri, 7 Feb 2025 11:00:09 GMT, Casper Norrbin wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> feedback johan > > src/hotspot/share/utilities/rbTree.hpp line 282: > >> 280: // Returns leftmost node, nullptr if tree is empty. >> 281: // If COMPARATOR::cmp(a, b) behaves canonically (positive value for a > b), this will the smallest key value. >> 282: const RBNode* leftmost() const { > > Just a thought, no change needed: > > The intrusive tree PR has the member `RBNode* _first` to get the leftmost node in constant time instead of having to traverse down the tree, but at the cost of an extra check when inserting/removing. Which solution do you prefer? I don't really have a preference so can adapt that PR either which way. I think iteration is way rarer than insert, so I would save the cycles and the added complexity. 
Different way to look at it:
- for small trees, depth is small, so there are few hops to the first node and not much is gained by keeping _first
- for large trees, the majority of cycles is spent hopping between nodes, not hopping to the first node. Even then, as @jdksjolen remarked, the tree is balanced so the depth is predictable.

> test/hotspot/gtest/utilities/test_rbtree.cpp line 402: > >> 400: for (int j = 0; j < 10; j++) { >> 401: if (j == 0) { >> 402: ASSERT_EQ(rbtree_const.leftmost(), (const Node*)nullptr); > > Style: All the previous tests use `EXPECT`s instead of `ASSERT`s. Goes for the other new tests as well.

These are different things: a failing ASSERT aborts the current test function immediately, while a failing EXPECT records the failure and continues. I prefer ASSERT in general, to cut down on the flood of error messages when asserting in a loop.
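As a rough illustration of the trade-off discussed above, here is `leftmost()`/`rightmost()` on a simple binary search tree in Java. `BstEnds` is a hypothetical class, not the HotSpot RBTree, and it does no red-black balancing: finding the leftmost node is just a walk down left children, so its cost is the tree depth, which a balanced tree keeps logarithmic. Caching a `_first` pointer would make the lookup O(1) at the price of extra bookkeeping on every insert and remove.

```java
final class BstEnds {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    Node root;

    // Plain, unbalanced BST insert: only used to build small test trees
    void insert(int key) {
        if (root == null) { root = new Node(key); return; }
        Node n = root;
        while (true) {
            if (key < n.key) {
                if (n.left == null) { n.left = new Node(key); return; }
                n = n.left;
            } else {
                if (n.right == null) { n.right = new Node(key); return; }
                n = n.right;
            }
        }
    }

    // Follows left children from the root: O(depth) hops, i.e. O(log n) in a balanced tree
    Node leftmost() {
        Node n = root;
        if (n == null) return null;          // empty tree
        while (n.left != null) n = n.left;
        return n;
    }

    // Mirror image: follows right children to the largest key
    Node rightmost() {
        Node n = root;
        if (n == null) return null;
        while (n.right != null) n = n.right;
        return n;
    }
}
```

With a comparator that orders keys ascending, `leftmost()` yields the smallest key and `rightmost()` the largest, matching the semantics described for the new RBTree accessors.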
>> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: > > revert to the original regex and remove the exclusion of os_windows.cpp I've made one last (hopefully trivial) change to filter out `.class` files to be consistent ------------- PR Comment: https://git.openjdk.org/jdk/pull/23466#issuecomment-2643116375 From nbenalla at openjdk.org Fri Feb 7 14:39:52 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Fri, 7 Feb 2025 14:39:52 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v4] In-Reply-To: References: Message-ID: > Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. > It scans files in those directories, filtering out certain files as well as all `.c`, `.java` and `.jar` files in test sources. > > Before adding line 86 and excluding `os_windows.cpp`, the test failed with: > > > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: > HMODULE hModule = NULL; > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: > GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); > java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. 
> at TestNoNULL.main(TestNoNULL.java:73) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1447) Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: trivial change, if .java files are filtered out then so should .class files ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23466/files - new: https://git.openjdk.org/jdk/pull/23466/files/3b8b05d1..71b90d45 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23466&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23466&range=02-03 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23466.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23466/head:pull/23466 PR: https://git.openjdk.org/jdk/pull/23466 From stuefe at openjdk.org Fri Feb 7 15:40:14 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Feb 2025 15:40:14 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees [v2] In-Reply-To: <3DZjBjl2ib-1ZtbJmECaXPj9-a0SF3dmTtziK7Vq3vw=.6897d423-3641-4bb6-9535-9a7768f50153@github.com> References: <3DZjBjl2ib-1ZtbJmECaXPj9-a0SF3dmTtziK7Vq3vw=.6897d423-3641-4bb6-9535-9a7768f50153@github.com> Message-ID: <8DzQmDURsOks59XWsZOtjGcUBUz8u12YMyNXAb9y_oc=.0bca5eaa-0e3c-441a-b383-03cea72fce01@github.com> On Fri, 7 Feb 2025 09:09:04 GMT, Johan Sjölen wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> feedback johan > > src/hotspot/share/utilities/rbTree.hpp line 48: > >> 46: class RBTree { >> 47: friend class RBTreeTest; >> 48: typedef RBTree TreeType; > > I'm referring to this being `TreeType` and not only `Tree`,
same with `Node`. Oh I see. In this case, I would prefer the more descriptive "Type", even though the upper case T in Tree indicates that this would probably be a type, not a variable name. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1946721956 From stuefe at openjdk.org Fri Feb 7 15:40:14 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Feb 2025 15:40:14 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees [v2] In-Reply-To: <7x7c0D-rmJZPC1DZf_3SGHfbQazkYXvBuMjcgefRfQY=.4fa2c7a2-a4f6-4cde-9997-fae2b175f2db@github.com> References: <7x7c0D-rmJZPC1DZf_3SGHfbQazkYXvBuMjcgefRfQY=.4fa2c7a2-a4f6-4cde-9997-fae2b175f2db@github.com> Message-ID: On Fri, 7 Feb 2025 11:14:00 GMT, Casper Norrbin wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> feedback johan > > test/hotspot/gtest/utilities/test_rbtree.cpp line 424: > >> 422: n = rbtree.leftmost(); >> 423: ASSERT_EQ(n->key(), min); >> 424: n->set_val(1); > > Why are the node's values set to 1? I wanted to have a modifying call on Node*, just to make sure it's non-const (paranoid, I know). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23486#discussion_r1946726213 From stuefe at openjdk.org Fri Feb 7 15:47:49 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Feb 2025 15:47:49 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees [v3] In-Reply-To: References: Message-ID: > For things I currently work on (compilation memory statistic), I need this functionality.
> > Changes: > > - added leftmost() and rightmost() (pretty self-explanatory) > - added print_on(outputStream*) (likewise) > - const correctness > - other minor cleanups > - gtests for all added functions > > Tests: GHA (all clean), manual tests on Linux x64 Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: feedback caspar ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23486/files - new: https://git.openjdk.org/jdk/pull/23486/files/c96cfa35..4f9eddb4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23486&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23486&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23486.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23486/head:pull/23486 PR: https://git.openjdk.org/jdk/pull/23486 From cnorrbin at openjdk.org Fri Feb 7 15:47:49 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Fri, 7 Feb 2025 15:47:49 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees [v3] In-Reply-To: References: Message-ID: <3euV443tQMPQMZBgEQzX36CM6NR7UI4YZPJDU_bYTho=.1cd29c87-fee8-4615-834b-d9462206b336@github.com> On Fri, 7 Feb 2025 15:44:24 GMT, Thomas Stuefe wrote: >> For things I currently work on (compilation memory statistic), I need this functionality. >> >> Changes: >> >> - added leftmost() and rightmost() (pretty self-explanatory) >> - added print_on(outputStream*) (likewise) >> - const correctness >> - other minor cleanups >> - gtests for all added functions >> >> Tests: GHA (all clean), manual tests on Linux x64 > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > feedback caspar Looks good! Thanks for the explanations. ------------- Marked as reviewed by cnorrbin (Author). 
PR Review: https://git.openjdk.org/jdk/pull/23486#pullrequestreview-2602092052 From stuefe at openjdk.org Fri Feb 7 15:47:49 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Feb 2025 15:47:49 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees [v2] In-Reply-To: References: Message-ID: <-9mjBV1M7UzXdzUoqqrXciibDoAm_rtGaKGYODCkYPw=.fa082507-1865-4734-ac3e-f4106fa86500@github.com> On Fri, 7 Feb 2025 09:14:49 GMT, Johan Sjölen wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> feedback johan > > Thanks, this looks good. Thanks @jdksjolen and @caspernorrbin! I fed in feedback from Casper; could you re-confirm, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23486#issuecomment-2643281434 From vlivanov at openjdk.org Fri Feb 7 16:57:27 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Feb 2025 16:57:27 GMT Subject: RFR: 8337251: C1: Improve Class.isInstance intrinsic [v4] In-Reply-To: References: Message-ID: On Mon, 27 Jan 2025 16:13:35 GMT, Andrew Haley wrote: >> This replaces a runtime call to `Runtime1::is_instance_of()` by a platform-dependent C1 intrinsic. >> >> This improves overall performance significantly and minimizes icache footprint. >> >> The original commit contains this comment: >> >> >> // TODO could try to substitute this node with an equivalent InstanceOf >> // if clazz is known to be a constant Class. This will pick up newly found >> // constants after HIR construction. I'll leave this to a future change. >> >> >> >> However, there's little performance to be gained by restricting this optimization to constant Class instances, and after this patch, C1 `Class.isInstance()` compares favorably with the current platform-dependent `instanceof` intrinsic. >> >> It's not strictly necessary for other platforms to implement this optimization.
>> Performance:
>>
>> Xeon-E5 2430, before and after:
>>
>>     Benchmark                              Score    Error    Score    Error  Units
>>     SecondarySupersLookup.testNegative00  11.783 ±  0.491   10.459 ±  0.183  ns/op
>>     SecondarySupersLookup.testNegative01  11.757 ±  0.127   10.475 ±  0.661  ns/op
>>     SecondarySupersLookup.testNegative02  11.771 ±  0.700   10.479 ±  0.357  ns/op
>>     SecondarySupersLookup.testNegative55  23.997 ±  1.816   16.854 ±  1.034  ns/op
>>     SecondarySupersLookup.testNegative60  29.598 ±  1.326   26.828 ±  0.637  ns/op
>>     SecondarySupersLookup.testNegative63  74.528 ±  3.157   69.431 ±  0.357  ns/op
>>     SecondarySupersLookup.testNegative64  75.936 ±  1.805   70.124 ±  0.397  ns/op
>>
>>     SecondarySupersLookup.testPositive01  15.257 ±  1.179    9.722 ±  0.326  ns/op
>>     SecondarySupersLookup.testPositive02  15.164 ±  1.383    9.737 ±  0.708  ns/op
>>     SecondarySupersLookup.testPositive03  15.166 ±  0.934    9.726 ±  0.184  ns/op
>>     SecondarySupersLookup.testPositive40  20.384 ±  0.530   12.805 ±  0.778  ns/op
>>     SecondarySupersLookup.testPositive50  15.118 ±  0.140    9.735 ±  0.555  ns/op
>>     SecondarySupersLookup.testPositive60  20.415 ±  3.083   11.603 ±  0.106  ns/op
>>     SecondarySupersLookup.testPositive63  65.478 ±  8.484   58.507 ±  2.837  ns/op
>>     SecondarySupersLookup.testPositive64  75.880 ±  1.047   68.667 ±  1.347  ns/op
>>
>> AArch64 (Apple M1)
>>
>>     Benchmark                              Score    Error    Score    Error  Units
>>     SecondarySupersLookup.testNegative00   4.139 ±  0.005    2.815 ±  0.014  ns/op
>>     SecondarySupersLookup.testNegative01   4.071 ±  0.153 ...
>
> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits:
>
> - Next
> - Next
> - Merge branch 'master' into JDK-8337251
> - More
> - Next
> - Windows fix, maybe.
> - Update
> - Update
> - Test fix/
> - Test fix/
> - ... and 18 more: https://git.openjdk.org/jdk/compare/764d70b7...13a2d93e

Testing results (hs-tier1 - hs-tier4) look good.
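To make the distinction in the quoted TODO concrete: `instanceof` names its type in the source, while `Class.isInstance` receives the type as a run-time value, which is the case this C1 intrinsic targets. The Java sketch below (hypothetical `IsInstanceDemo` class) only illustrates the two source-level forms and that they agree when given the same type; it says nothing about the generated code.

```java
import java.util.List;

final class IsInstanceDemo {
    // Type known only at run time: a Class.isInstance call at the bytecode level
    static long countInstances(List<?> items, Class<?> clazz) {
        long n = 0;
        for (Object o : items) {
            if (clazz.isInstance(o)) n++;
        }
        return n;
    }

    // Type fixed in the source: a plain instanceof check
    static long countCharSequences(List<?> items) {
        long n = 0;
        for (Object o : items) {
            if (o instanceof CharSequence) n++;
        }
        return n;
    }
}
```

Like `instanceof`, `clazz.isInstance(null)` returns `false`.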
------------- PR Comment: https://git.openjdk.org/jdk/pull/22491#issuecomment-2643475099 From sroy at openjdk.org Fri Feb 7 16:58:15 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 7 Feb 2025 16:58:15 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v19] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: <-Nrzcr1rY6Os3DyzZdAMltyDGWmdBYqPhneFzFIhkDM=.a294ffb8-bcda-4c04-804b-3313b179d6b6@github.com> On Wed, 5 Feb 2025 14:42:17 GMT, Martin Doerr wrote: >> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: >> >> adapt Condition registers > > src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 655: > >> 653: // https://web.archive.org/web/20110609115824/https://software.intel.com/file/24918 >> 654: // >> 655: Label loop; > > Please try if aligning the loop entry improves performance. I'd insert `__ align(32);` here. This is not improving performance @TheRealMDoerr > src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 658: > >> 656: __ bind(loop); >> 657: __ vspltisb(vZero, 0); >> 658: __ li(temp1, 0); > > I don't think these instructions should be inside of the loop. vspltisb(vZero,0) is needed. __ vsldoi(vTmp8, vTmp5, vZero, 8); // mL : Extract the lower 64 bits of M __ vsldoi(vTmp9, vZero, vTmp5, 8); // mH : Extract the higher 64 bits of M We need to extract appropriate bits and for that vZero needs to be initialised to 0 always. 
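The PR description mentions Karatsuba multiplication, and the `mL`/`mH` comments in the review above refer to splitting the middle product into 64-bit halves. As a hedged illustration only (plain Java, not the PPC64 stub, with hypothetical names), here is one 128×128-bit carry-less Karatsuba step: three 64×64 products instead of four, with the middle term recovered as `M ^ H ^ L`. It omits GHASH's bit reflection and the reduction modulo x^128 + x^7 + x^2 + x + 1.

```java
final class ClmulKaratsuba {
    /** 64x64 -> 128-bit carry-less product, returned as {low64, high64}. */
    static long[] clmul64(long a, long b) {
        long lo = 0, hi = 0;
        for (int i = 0; i < 64; i++) {
            if (((b >>> i) & 1L) != 0) {
                lo ^= a << i;
                if (i != 0) {
                    hi ^= a >>> (64 - i); // bits shifted out of the low word
                }
            }
        }
        return new long[] {lo, hi};
    }

    /** 128x128 -> 256-bit carry-less product with three 64x64 multiplies (Karatsuba). */
    static long[] mulKaratsuba(long aHi, long aLo, long bHi, long bLo) {
        long[] l = clmul64(aLo, bLo);             // L = aLo * bLo
        long[] h = clmul64(aHi, bHi);             // H = aHi * bHi
        long[] m = clmul64(aHi ^ aLo, bHi ^ bLo); // (aHi + aLo)(bHi + bLo) over GF(2)
        long mLo = m[0] ^ h[0] ^ l[0];            // middle term aHi*bLo + aLo*bHi, low half
        long mHi = m[1] ^ h[1] ^ l[1];            // ... and high half
        // result = H << 128 ^ M << 64 ^ L, as four 64-bit words, least significant first
        return new long[] {l[0], l[1] ^ mLo, h[0] ^ mHi, h[1]};
    }

    /** Reference: schoolbook version with four 64x64 multiplies. */
    static long[] mulSchoolbook(long aHi, long aLo, long bHi, long bLo) {
        long[] ll = clmul64(aLo, bLo), lh = clmul64(aLo, bHi);
        long[] hl = clmul64(aHi, bLo), hh = clmul64(aHi, bHi);
        long c0 = lh[0] ^ hl[0], c1 = lh[1] ^ hl[1];
        return new long[] {ll[0], ll[1] ^ c0, hh[0] ^ c1, hh[1]};
    }
}
```

The saving is one 64×64 carry-less multiply per block at the cost of a few extra XORs, which is why the middle term needs the zero-extended shifts (`vsldoi` with `vZero`) discussed in the review.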
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1946858131 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1946859665 From nbenalla at openjdk.org Fri Feb 7 17:10:59 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Fri, 7 Feb 2025 17:10:59 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v5] In-Reply-To: References: Message-ID: > Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. > It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources. > > Before adding line 86 and excluding `os_windows.cpp`, the test failed with: > > > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: > HMODULE hModule = NULL; > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: > GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); > java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. > at TestNoNULL.main(TestNoNULL.java:73) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1447) Nizar Benalla has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - filter out `.zip` files - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot - trivial change, if .java files are filtered out then so should .class files - revert to the original regex and remove the exclusion of os_windows.cpp - update based on feedback - Add a test to prevent NULL backsliding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23466/files - new: https://git.openjdk.org/jdk/pull/23466/files/71b90d45..99577488 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23466&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23466&range=03-04 Stats: 8237 lines in 406 files changed: 2204 ins; 4192 del; 1841 mod Patch: https://git.openjdk.org/jdk/pull/23466.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23466/head:pull/23466 PR: https://git.openjdk.org/jdk/pull/23466 From aph at openjdk.org Fri Feb 7 17:42:23 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 7 Feb 2025 17:42:23 GMT Subject: Integrated: 8337251: C1: Improve Class.isInstance intrinsic In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 17:16:22 GMT, Andrew Haley wrote: > This replaces a runtime call to `Runtime1::is_instance_of()` by a platform-dependent C1 intrinsic. > > This improves overall performance significantly and minimizes icache footprint. > > The original commit contains this comment: > > > // TODO could try to substitute this node with an equivalent InstanceOf > // if clazz is known to be a constant Class. This will pick up newly found > // constants after HIR construction. I'll leave this to a future change. > > > > However, there's little performance to be gained by restricting this optimization to constant Class instances, and after this patch, C1 `Class.isInstance()` compares favorably with the current platform-dependent `instanceof` intrinsic. > > It's not strictly necessary for other platforms to implement this optimization.
> > It's not strictly necessary for other platforms to implement this optimization. > > Performance: > > Xeon-E5 2430, before and after:: > > > Benchmark Score Error Score Error Units > SecondarySupersLookup.testNegative00 11.783 ? 0.491 10.459 ? 0.183 ns/op > SecondarySupersLookup.testNegative01 11.757 ? 0.127 10.475 ? 0.661 ns/op > SecondarySupersLookup.testNegative02 11.771 ? 0.700 10.479 ? 0.357 ns/op > SecondarySupersLookup.testNegative55 23.997 ? 1.816 16.854 ? 1.034 ns/op > SecondarySupersLookup.testNegative60 29.598 ? 1.326 26.828 ? 0.637 ns/op > SecondarySupersLookup.testNegative63 74.528 ? 3.157 69.431 ? 0.357 ns/op > SecondarySupersLookup.testNegative64 75.936 ? 1.805 70.124 ? 0.397 ns/op > > SecondarySupersLookup.testPositive01 15.257 ? 1.179 9.722 ? 0.326 ns/op > SecondarySupersLookup.testPositive02 15.164 ? 1.383 9.737 ? 0.708 ns/op > SecondarySupersLookup.testPositive03 15.166 ? 0.934 9.726 ? 0.184 ns/op > SecondarySupersLookup.testPositive40 20.384 ? 0.530 12.805 ? 0.778 ns/op > SecondarySupersLookup.testPositive50 15.118 ? 0.140 9.735 ? 0.555 ns/op > SecondarySupersLookup.testPositive60 20.415 ? 3.083 11.603 ? 0.106 ns/op > SecondarySupersLookup.testPositive63 65.478 ? 8.484 58.507 ? 2.837 ns/op > SecondarySupersLookup.testPositive64 75.880 ? 1.047 68.667 ? 1.347 ns/op > > > AArch64 (Apple M1) > > > Benchmark Score Error Score Error Units > SecondarySupersLookup.testNegative00 4.139 ? 0.005 2.815 ? 0.014 ns/op > SecondarySupersLookup.testNegative01 4.071 ? 0.153 2.826 ? 0.291 ns/op > SecondarySupersLookup.testNegative02 4.089 ? 0.752 2.817 ? 0.028 ns/... This pull request has now been integrated. 
Changeset: b40f8eef Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/b40f8eef98dac066816d4d548b2304276a76d5e0 Stats: 148 lines in 13 files changed: 139 ins; 7 del; 2 mod 8337251: C1: Improve Class.isInstance intrinsic Reviewed-by: vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/22491 From sviswanathan at openjdk.org Fri Feb 7 17:53:16 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 7 Feb 2025 17:53:16 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v5] In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 21:43:56 GMT, Volodymyr Paprotski wrote: >> (Also see `8319429: Resetting MXCSR flags degrades ecore`) >> >> This PR fixes two issues: >> - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only >> - this issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warnings on HelloWorld (on both Linux _and_ Windows): >> >> OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall >> >> >> First, the crash. It happens when FXRSTOR attempts to write reserved bits into MXCSR: if those bits happen to be set, the VM crashes (hence the crash isn't deterministic, but it is frequent enough if `__ warn` is used). The root cause is that the binding does not reserve stack space for register parameters: >> ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) >> The prolog of the warn function then stores the four argument registers onto the stack, overwriting the FXSAVE area. (See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) >> >> The fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer.
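To make the Windows x64 failure mode concrete: under the Microsoft x64 calling convention the caller reserves 32 bytes of home (shadow) space at [rsp, rsp+32) in which the callee may spill RCX, RDX, R8 and R9, and in the standard FXSAVE image the saved MXCSR sits at byte offset 24. The Java sketch below (hypothetical names, address arithmetic only, not HotSpot code) shows that a save area placed at rsp puts MXCSR in R9's spill slot. That would be consistent with FXRSTOR later loading a garbage MXCSR. Reserving the home space first moves the area out of reach; the assumption here is that `frame::arg_reg_save_area_bytes` is 32 on Windows x64.

```java
final class Win64HomeSpace {
    // Microsoft x64 ABI: caller-reserved home (shadow) space for RCX, RDX, R8, R9
    static final int HOME_SPACE_BYTES = 32;
    // Standard FXSAVE image: the saved MXCSR occupies bytes 24..27
    static final int FXSAVE_MXCSR_OFFSET = 24;
    static final int MXCSR_BYTES = 4;

    /**
     * Returns true if the saved MXCSR lies inside [rsp, rsp + 32), i.e. in bytes the
     * callee may overwrite. fxsaveOffset is the save area's offset from rsp at the call.
     */
    static boolean mxcsrInHomeSpace(int fxsaveOffset) {
        int start = fxsaveOffset + FXSAVE_MXCSR_OFFSET;
        int end = start + MXCSR_BYTES;
        return start < HOME_SPACE_BYTES && end > 0;
    }

    public static void main(String[] args) {
        System.out.println(mxcsrInHomeSpace(0));                // true: overlaps R9's slot at [24, 32)
        System.out.println(mxcsrInHomeSpace(HOME_SPACE_BYTES)); // false: area starts after the home space
    }
}
```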
>> >> --- >> >> I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` are set (65k+ lines of warnings) > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > typo src/hotspot/cpu/x86/macroAssembler_x86.cpp line 2393: > 2391: stmxcsr(mxcsr_save); > 2392: movl(tmp, mxcsr_save); > 2393: // Mask out any pending exceptions (only check control and mask bits) This comment should go on the else path and could be changed to "Mask out status bits (only check control and mask bits)" src/hotspot/cpu/x86/macroAssembler_x86.cpp line 2396: > 2394: if (EnableX86ECoreOpts) { > 2395: // On Ecore, status bits are set by default (for performance) > 2396: orl(tmp, 0x003f); // On Ecore, exception bits are set by default Duplication in comment. Comment could be modified to reflect something like "The mxcsr_std has status bits set for performance on ECore" src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp line 183: > 181: jint MxCsr = INITIAL_MXCSR; // set to 0x1f80` in winnt.h > 182: if (EnableX86ECoreOpts) { > 183: // On ECore, restore with signaling flags enabled Change comment to // On ECore restore with status bits enabled. 
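The review comments above turn on which MXCSR bits `verify_mxcsr` compares. As a rough model in plain Java (not the HotSpot assembly), assume bits 0-5 are the sticky exception status flags, 0x1F80 is the usual default (all six exceptions masked, round-to-nearest), and that with E-core opts the expected value also has the status bits set. The class and method names, and that expected value, are assumptions for illustration.

```java
final class MxcsrCheck {
    static final int STATUS_BITS = 0x003F;   // bits 0-5: sticky exception flags IE, DE, ZE, OE, UE, PE
    static final int DEFAULT_MXCSR = 0x1F80; // all six exceptions masked, round-to-nearest

    // Assumed expected value: with E-core opts the standard value also has the status bits set
    static int expected(boolean ecoreOpts) {
        return ecoreOpts ? (DEFAULT_MXCSR | STATUS_BITS) : DEFAULT_MXCSR;
    }

    static boolean matches(int mxcsr, boolean ecoreOpts) {
        int normalized = ecoreOpts
                ? (mxcsr | STATUS_BITS)   // E-core path: force status bits on before comparing
                : (mxcsr & ~STATUS_BITS); // otherwise: clear status bits, compare control/mask bits only
        return normalized == expected(ecoreOpts);
    }
}
```

In both paths a set status flag alone never triggers the warning; only changes to the mask, rounding, DAZ or FZ bits do.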
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22673#discussion_r1946911130 PR Review Comment: https://git.openjdk.org/jdk/pull/22673#discussion_r1946915061 PR Review Comment: https://git.openjdk.org/jdk/pull/22673#discussion_r1946934759 From fmatte at openjdk.org Fri Feb 7 18:12:53 2025 From: fmatte at openjdk.org (Fairoz Matte) Date: Fri, 7 Feb 2025 18:12:53 GMT Subject: RFR: 8347833: CrashOnOutOfMemory should stop GC threads before HeapDumpOnOutOfMemoryError Message-ID: When CrashOnOutOfMemory and HeapDumpOnOutOfMemoryError are used together, we should make sure both are performed in a single safepoint; this prevents other threads from running and throwing OOM errors after the initial one is already being logged. ------------- Commit messages: - 8347833: CrashOnOutOfMemory should stop GC threads before HeapDumpOnOutOfMemoryError Changes: https://git.openjdk.org/jdk/pull/23519/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23519&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347833 Stats: 85 lines in 4 files changed: 72 ins; 8 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23519/head:pull/23519 PR: https://git.openjdk.org/jdk/pull/23519 From gziemski at openjdk.org Fri Feb 7 18:42:44 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 7 Feb 2025 18:42:44 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v18] In-Reply-To: References: Message-ID: > Here is another iteration of the NMT benchmarking mechanism, hopefully closer to the final one. > > We create 2 static instances: one NMT_MemoryLogRecorder and the other NMT_VirtualMemoryLogRecorder.
>
> VM interacts with these through the following APIs, to control their liveness:
>
> ```
> NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations);
> NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID);
> NMT_LogRecorder::logThreadName(name);
> NMT_LogRecorder::finish();
> ```
>
> and through their "log" APIs for the actual logging. For the memory logger those are:
>
>     NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack);
>     NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack);
>     NMT_MemoryLogRecorder::log_free(old_outer_ptr);
>
> and for the virtual memory logger, those are:
>
>     NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag);
>     NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size);
>     NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size);
>     NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag);
>     NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack);
>     NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag);
>     NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag);
>
> That's the entirety of the surface area of the new code.
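To illustrate the record-then-replay idea in isolation, here is a deliberately simplified Java model. It is not the NMT log format or API: `MallocLogReplay`, its enum, record, and methods are hypothetical, pointers are plain `long` keys, and replay only tracks requested bytes. The real recorder also captures memory tags and call stacks, and uses allocator queries such as `malloc_usable_size` to measure overhead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MallocLogReplay {
    enum Op { MALLOC, REALLOC, FREE }

    // One recorded event (hypothetical layout, not the NMT file format)
    record Event(Op op, long oldPtr, long newPtr, long size) {}

    private final List<Event> log = new ArrayList<>();

    void logMalloc(long ptr, long size) { log.add(new Event(Op.MALLOC, 0, ptr, size)); }
    void logRealloc(long oldPtr, long newPtr, long size) { log.add(new Event(Op.REALLOC, oldPtr, newPtr, size)); }
    void logFree(long ptr) { log.add(new Event(Op.FREE, ptr, 0, 0)); }

    /** Replays the recorded events in order and returns the peak number of live requested bytes. */
    long replayPeakLiveBytes() {
        Map<Long, Long> live = new HashMap<>(); // pointer -> requested size
        long current = 0, peak = 0;
        for (Event e : log) {
            switch (e.op()) {
                case MALLOC -> {
                    live.put(e.newPtr(), e.size());
                    current += e.size();
                }
                case REALLOC -> {
                    Long old = live.remove(e.oldPtr());
                    if (old != null) current -= old;
                    live.put(e.newPtr(), e.size());
                    current += e.size();
                }
                case FREE -> {
                    Long old = live.remove(e.oldPtr());
                    if (old != null) current -= old;
                }
            }
            peak = Math.max(peak, current);
        }
        return peak;
    }
}
```

Separating recording from replay lets the same allocation pattern be replayed repeatedly, for example with NMT on and off, which is the comparison the benchmark is after.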
> > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid)` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are needed if we want to calculate the memory overhead. > > To use, you first need to record the pattern of operations, e.g.: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/... Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: do not pollute Hotspot thread code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/6302a300..fe36a9ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=16-17 Stats: 56 lines in 4 files changed: 22 ins; 7 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From coleenp at openjdk.org Fri Feb 7 19:16:13 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 7 Feb 2025 19:16:13 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v7] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Fri, 7 Feb 2025 12:34:40 GMT, Coleen Phillimore wrote: >> The
Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix jvmci test. I added some code to hide the Class.modifiers field and fixed the JVMCI test. Please re-review. Also @iwanowww I think the intrinsic for isInterface can be removed and just be Java code like: public boolean isInterface() { return getModifiers().isInterface(); } ------------- PR Comment: https://git.openjdk.org/jdk/pull/22652#issuecomment-2643799984 From liach at openjdk.org Fri Feb 7 19:31:10 2025 From: liach at openjdk.org (Chen Liang) Date: Fri, 7 Feb 2025 19:31:10 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: References: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> Message-ID: On Thu, 6 Feb 2025 12:04:02 GMT, Nizar Benalla wrote: >> test/hotspot/jtreg/sources/TestNoNULL.java line 56: >> >>> 54: } >>> 55: >>> 56: if (dir == null) { >> >> @sormuras Do you know if the source directory (or directories) of the JDK is passed to jtreg at all? The current approach seems a bit hacky. 
> > Copy-pasting this comment from the JBS issue > >> We have the full source available when jtreg tests are run in our internal systems. The same should be true in GHA, as well as when developers run tests locally. That doesn't imply the current approach is the best approach to access the source; it only states that the access is *possible*. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1947074333 From stuefe at openjdk.org Fri Feb 7 19:37:10 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Feb 2025 19:37:10 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 09:14:49 GMT, Johan Sjölen wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> feedback johan > > Thanks, this looks good. @jdksjolen can you re-approve? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23486#issuecomment-2643879381 From vlivanov at openjdk.org Fri Feb 7 19:47:12 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Feb 2025 19:47:12 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v7] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Fri, 7 Feb 2025 12:34:40 GMT, Coleen Phillimore wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though.
>> >> There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix jvmci test. Marked as reviewed by vlivanov (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22652#pullrequestreview-2602686659 From vlivanov at openjdk.org Fri Feb 7 20:01:27 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 7 Feb 2025 20:01:27 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v7] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Fri, 7 Feb 2025 19:13:07 GMT, Coleen Phillimore wrote: > I think the intrinsic for isInterface can be removed Good point. Moreover, it seems most of intrinsics on Class queries can be replaced with a flag bit check on the mirror. (Do we have 16 unused bits in Class::modifiers after this change?) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22652#issuecomment-2643997479 From gziemski at openjdk.org Fri Feb 7 20:02:58 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 7 Feb 2025 20:02:58 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v19] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: clean up report ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/fe36a9ca..5039be78 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=17-18 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From ccheung at openjdk.org Fri Feb 7 20:27:49 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 7 Feb 2025 20:27:49 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks Message-ID: This changeset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`.
Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. Passed tiers 1 - 5 testing. ------------- Commit messages: - trailing whitespace - cleanup and add comments - 8280682: Refactor AOT code source validation checks Changes: https://git.openjdk.org/jdk/pull/23476/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23476&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8280682 Stats: 3172 lines in 40 files changed: 1346 ins; 1647 del; 179 mod Patch: https://git.openjdk.org/jdk/pull/23476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23476/head:pull/23476 PR: https://git.openjdk.org/jdk/pull/23476 From coleenp at openjdk.org Fri Feb 7 21:14:13 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 7 Feb 2025 21:14:13 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v7] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Fri, 7 Feb 2025 19:58:12 GMT, Vladimir Ivanov wrote: > Good point. Moreover, it seems most of intrinsics on Class queries can be replaced with a flag bit check on the mirror. (Do we have 16 unused bits in Class::modifiers after this change?) Yes, I think so. isArray and isPrimitive definitely. We could first change the modifiers field to "char" because that's its size and then have two booleans for each of these. 
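A minimal sketch of the direction Vladimir and Coleen describe: once the modifiers live in a Java field, class queries can become plain field reads. `MirrorSketch` is a hypothetical stand-in and not the real `java.lang.Class` layout. The `isArray`/`isPrimitive` booleans are assumed to be set by the VM at mirror creation. Because `getModifiers()` returns an `int`, the interface check goes through the existing static helper `java.lang.reflect.Modifier.isInterface(int)`.

```java
import java.lang.reflect.Modifier;

// Hypothetical stand-in for a class mirror; NOT the real java.lang.Class.
public final class MirrorSketch {
    private final char modifiers;      // access flags, narrowed to 16 bits as suggested
    private final boolean isArray;     // hypothetical flag filled in by the VM
    private final boolean isPrimitive; // hypothetical flag filled in by the VM

    MirrorSketch(int modifiers, boolean isArray, boolean isPrimitive) {
        this.modifiers = (char) modifiers;
        this.isArray = isArray;
        this.isPrimitive = isPrimitive;
    }

    // Widened back to int, matching the signature of Class.getModifiers().
    public int getModifiers() { return modifiers; }

    // getModifiers() is an int, so use the static Modifier helper.
    public boolean isInterface() { return Modifier.isInterface(getModifiers()); }
    public boolean isArray() { return isArray; }
    public boolean isPrimitive() { return isPrimitive; }

    public static void main(String[] args) {
        MirrorSketch runnable = new MirrorSketch(Runnable.class.getModifiers(), false, false);
        System.out.println(runnable.isInterface() == Runnable.class.isInterface()); // true
        MirrorSketch intArray = new MirrorSketch(int[].class.getModifiers(), true, false);
        System.out.println(intArray.isArray() + " " + intArray.isInterface()); // true false
    }
}
```

Every query is a field load with no native transition. That is the property Chen Liang points to as useful for interpreter performance during early bootstrap.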
------------- PR Comment: https://git.openjdk.org/jdk/pull/22652#issuecomment-2644136904 From liach at openjdk.org Fri Feb 7 21:37:12 2025 From: liach at openjdk.org (Chen Liang) Date: Fri, 7 Feb 2025 21:37:12 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v7] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Fri, 7 Feb 2025 12:34:40 GMT, Coleen Phillimore wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix jvmci test. Making `isArray` and `isPrimitive` Java-based is going to be helpful for the interpreter performance of these methods in early bootstrap.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22652#issuecomment-2644171713 From jsjolen at openjdk.org Fri Feb 7 22:04:11 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 7 Feb 2025 22:04:11 GMT Subject: RFR: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees [v3] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 15:47:49 GMT, Thomas Stuefe wrote: >> For things I currently work on (compilation memory statistic), I need this functionality. >> >> Changes: >> >> - added leftmost() and rightmost() (pretty self-explanatory) >> - added print_on(outputStream*) (likewise) >> - const correctness >> - other minor cleanups >> - gtests for all added functions >> >> Tests: GHA (all clean), manual tests on Linux x64 > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > feedback caspar LGTM ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23486#pullrequestreview-2602941153 From gziemski at openjdk.org Fri Feb 7 22:09:49 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 7 Feb 2025 22:09:49 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v20] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging.
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: use env variables instead of runtime flags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/5039be78..53646dbe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=18-19 Stats: 45 lines in 4 files changed: 18 ins; 16 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Fri Feb 7 22:13:06 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 7 Feb 2025 22:13:06 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v21] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: update comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/53646dbe..f3715a35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=19-20 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Fri Feb 7 22:16:58 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 7 Feb 2025 22:16:58 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v22] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: remove unused header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/f3715a35..fb3f757f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=20-21 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From jiangli at openjdk.org Fri Feb 7 23:55:43 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 7 Feb 2025 23:55:43 GMT Subject: RFR: 8349620: Add VMProps for static JDK Message-ID: Please review this change that adds the `jdk.static` VMProps. It can be used to skip tests not for running on static JDK. This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which is used by VMProps to determine if it's static at runtime. `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on static JDK. This test uses `bin/jlink`, which is not provided on static JDK. There are other tests that require tools in `bin/`. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. ------------- Commit messages: - - Add 'jdk.static' in VMProps. It can be used to skip tests not for running on static JDK. 
Changes: https://git.openjdk.org/jdk/pull/23528/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23528&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349620 Stats: 19 lines in 6 files changed: 16 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23528.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23528/head:pull/23528 PR: https://git.openjdk.org/jdk/pull/23528 From kbarrett at openjdk.org Sat Feb 8 00:21:11 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 8 Feb 2025 00:21:11 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v5] In-Reply-To: References: Message-ID: <7xRWmcrmVSHY1gAjsLf4mR6f13h_hQDNDNRB91lm_M8=.f6180778-ac23-46b4-a943-6c1092e7eb49@github.com> On Fri, 7 Feb 2025 17:10:59 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources. >> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. >> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - filter out `.zip` files > - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot > - trivial change, if .java files are filtered out then so should .class files > - revert to the original regex and remove the exclusion of os_windows.cpp > - update based on feedback > - Add a test to prevent NULL backsliding Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23466#pullrequestreview-2603110520 From kbarrett at openjdk.org Sat Feb 8 00:21:11 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 8 Feb 2025 00:21:11 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: References: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> Message-ID: On Fri, 7 Feb 2025 19:28:18 GMT, Chen Liang wrote: >> Copy-pasting this comment from the JBS issue >> >>> We have the full source available when jtreg tests are run in our internal systems. The same should be true in GHA, as well as when developers run tests locally. > > That doesn't imply the current approach is the best approach to access the source; it only states that the access is *possible*. I don't see any better way. https://openjdk.org/jtreg/vmoptions.html doesn't list anything other than the "test.src" property as a way to find the source hierarchy. If you know of an alternative, please suggest. 
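A minimal sketch of the `test.src`-based approach discussed above, under stated assumptions. jtreg sets the `test.src` system property to the directory holding the test source. From there the sketch walks up the parent directories until it finds one containing `src/hotspot`, then scans that tree. The word-boundary regex and the helper names are assumptions for illustration. This is not the code in `TestNoNULL.java`. The skipped extensions mirror the ones listed in the PR.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical sketch, not the actual TestNoNULL.java.
public class NullScanSketch {
    // Word boundaries keep identifiers such as NULL_WORD from matching (assumed regex).
    private static final Pattern NULL_PATTERN = Pattern.compile("\\bNULL\\b");
    private static final List<String> SKIPPED = List.of(".c", ".java", ".class", ".jar", ".zip");

    // Walks up from 'start' until a directory containing src/hotspot is found.
    static Optional<Path> findRepoRoot(Path start) {
        for (Path p = start.toAbsolutePath(); p != null; p = p.getParent()) {
            if (Files.isDirectory(p.resolve("src").resolve("hotspot"))) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }

    // Returns "file:line" for every line under 'dir' that mentions NULL.
    static List<String> scan(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).toList();
        }
        List<String> hits = new ArrayList<>();
        for (Path f : files) {
            String name = f.getFileName().toString();
            if (SKIPPED.stream().anyMatch(name::endsWith)) continue;
            // ISO-8859-1 decodes any byte sequence, so odd files cannot abort the scan.
            List<String> lines = Files.readAllLines(f, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                if (NULL_PATTERN.matcher(lines.get(i)).find()) hits.add(f + ":" + (i + 1));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String testSrc = System.getProperty("test.src", ".");
        Path root = findRepoRoot(Paths.get(testSrc))
                .orElseThrow(() -> new IllegalStateException("no source root above " + testSrc));
        List<String> hits = scan(root.resolve("src").resolve("hotspot"));
        hits.forEach(h -> System.err.println("Error: 'NULL' found in " + h));
        if (!hits.isEmpty()) throw new RuntimeException("Found usage of 'NULL' in source files.");
    }
}
```

Walking upward from `test.src` only depends on the test living somewhere inside the repository. It does not depend on a fixed relative path, which may make it somewhat less fragile than hard-coding `../../..` segments.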
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1947346451 From stuefe at openjdk.org Sat Feb 8 06:38:15 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 8 Feb 2025 06:38:15 GMT Subject: Integrated: 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 08:06:04 GMT, Thomas Stuefe wrote: > For things I currently work on (compilation memory statistic), I need this functionality. > > Changes: > > - added leftmost() and rightmost() (pretty self-explanatory) > - added print_on(outputStream*) (likewise) > - const correctness > - other minor cleanups > - gtests for all added functions > > Tests: GHA (all clean), manual tests on Linux x64 This pull request has now been integrated. Changeset: 7d52f1e6 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/7d52f1e64d17d4a77dacc6074ead11e975eed9eb Stats: 262 lines in 3 files changed: 187 ins; 9 del; 66 mod 8349525: RBTree: provide leftmost, rightmost, and a simple way to print trees Reviewed-by: jsjolen, cnorrbin ------------- PR: https://git.openjdk.org/jdk/pull/23486 From mdoerr at openjdk.org Sat Feb 8 12:03:14 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 8 Feb 2025 12:03:14 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v19] In-Reply-To: <-Nrzcr1rY6Os3DyzZdAMltyDGWmdBYqPhneFzFIhkDM=.a294ffb8-bcda-4c04-804b-3313b179d6b6@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> <-Nrzcr1rY6Os3DyzZdAMltyDGWmdBYqPhneFzFIhkDM=.a294ffb8-bcda-4c04-804b-3313b179d6b6@github.com> Message-ID: <3h9GAveTJ2kDTw97K5tLV_Sg6i0_2aIE-dmxYF6ZoO0=.3894b29c-3616-4aca-9d9a-6c4e947e7658@github.com> On Fri, 7 Feb 2025 16:54:14 GMT, Suchismith Roy wrote: >> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 655: >> >>> 653: // https://web.archive.org/web/20110609115824/https://software.intel.com/file/24918 >>> 654: // 
>>> 655: Label loop; >> >> Please try if aligning the loop entry improves performance. I'd insert `__ align(32);` here. > > This is not improving performance @TheRealMDoerr It seems to be faster on my Power9 machine. But we should check again after everything else is done. >> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 658: >> >>> 656: __ bind(loop); >>> 657: __ vspltisb(vZero, 0); >>> 658: __ li(temp1, 0); >> >> I don't think these instructions should be inside of the loop. > > vspltisb(vZero,0) is needed. > __ vsldoi(vTmp8, vTmp5, vZero, 8); // mL : Extract the lower 64 bits of M > __ vsldoi(vTmp9, vZero, vTmp5, 8); // mH : Extract the higher 64 bits of M > We need to extract appropriate bits and for that vZero needs to be initialised to 0 always. The problem is that you're overwriting it below which should not be done: __ vxor(vZero, vTmp4, vTmp10); __ vmr(vState, vZero); Why not `__ vxor(vState, vTmp4, vTmp10);`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1947690718 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1947693146 From mdoerr at openjdk.org Sat Feb 8 12:27:14 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 8 Feb 2025 12:27:14 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v20] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Fri, 7 Feb 2025 13:50:27 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently acceleration code for GHASH is missing for PPC64. >> >> The current implementation utlilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 42 commits: > > - Merge branch 'openjdk:master' into ghash_processblocks > - adapt Condition registers > - Merge branch 'openjdk:master' into ghash_processblocks > - restore chnges > - restore chnges > - permute vHigh,vLow > - indentation > - comments > - vsx logic change > - spaces > - ... and 32 more: https://git.openjdk.org/jdk/compare/86cec4ea...d22fcf25 src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 661: > 659: __ andi(temp1, data, 15); > 660: __ cmpwi(CR0, temp1, 0); > 661: __ beq(CR0, L_aligned); // Check if address is aligned (mask lower 4 bits) The alignment check should not be in the loop. It would be better to check once before the loop and use two loops. It would be interesting to know how often the data is aligned. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1947715475 From duke at openjdk.org Sat Feb 8 16:03:19 2025 From: duke at openjdk.org (duke) Date: Sat, 8 Feb 2025 16:03:19 GMT Subject: Withdrawn: 8344880: AArch64: Add compile time check for class offsets In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 23:57:41 GMT, Chad Rakoczy wrote: > [JDK-8344880](https://bugs.openjdk.org/browse/JDK-8344880) > > Adds compile time checks for str/ldr instructions to verify that the immediate offset will fit. This adds static_assert for constant offsets that are checked at compile time. The macro offset_of is not constexpr, so instead the class size is checked. If the size of a class fits into a memory instruction's immediate offset, then any offset in it will fit. This pull request has been closed without being integrated.
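The size-based check described in the withdrawn 8344880 PR can be sketched in standalone C++. This is an illustration under stated assumptions, not the PR's code: all helper and struct names are hypothetical, and only two AArch64 addressing forms are modelled, the unsigned 12-bit immediate of LDR/STR (scaled by the access size) and the signed 9-bit unscaled byte offset of LDUR/STUR. The class-level template bounds `sizeof(T)` rather than individual field offsets, which mirrors the PR's workaround for HotSpot's `offset_of` not being constexpr.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: can `offset` be encoded in the unsigned 12-bit
// immediate of an AArch64 LDR/STR with an access size of `size` bytes?
// That form scales the immediate by the access size, so the offset must be
// a multiple of `size` and at most 4095 * size.
constexpr bool fits_scaled_uimm12(std::size_t offset, std::size_t size) {
  return size != 0 && offset % size == 0 && offset / size <= 4095;
}

// Hypothetical helper: can `offset` be encoded in the signed 9-bit,
// unscaled byte offset of LDUR/STUR (range -256..255)?
constexpr bool fits_unscaled_simm9(long offset) {
  return offset >= -256 && offset <= 255;
}

// Class-level check in the spirit of the PR: if the whole object is at most
// 4095 * AccessSize bytes, every naturally aligned AccessSize-byte field
// starts at an offset <= sizeof(T) - AccessSize, which is within the scaled
// immediate range.
template <typename T, std::size_t AccessSize>
constexpr bool all_aligned_offsets_fit() {
  return sizeof(T) <= 4095 * AccessSize;
}

// Hypothetical example class; the 8-byte fields come first so that `c`
// sits at offset 8 regardless of the platform's alignment of int64_t.
struct ExampleFields {
  std::int64_t a;
  std::int64_t c;
  std::int32_t b;
};

static_assert(all_aligned_offsets_fit<ExampleFields, 8>(),
              "ExampleFields is too large for scaled LDR/STR offsets");
// Standard offsetof is a constant expression for standard-layout types.
static_assert(fits_scaled_uimm12(offsetof(ExampleFields, c), sizeof(std::int64_t)),
              "offset of c must be encodable");
```

The size bound is deliberately conservative: it rejects some classes whose individual offsets would still fit, in exchange for not needing a constexpr offset for every field.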
------------- PR: https://git.openjdk.org/jdk/pull/22623 From alanb at openjdk.org Sat Feb 8 19:44:12 2025 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 8 Feb 2025 19:44:12 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v7] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Fri, 7 Feb 2025 12:34:40 GMT, Coleen Phillimore wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There are a couple of JMH benchmarks added with this change. One does show that for array classes with a non-bootstrap class loader, this results in one extra load, which is observable in a long loop that does nothing else. I don't think this is real-life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix jvmci test. No more comments from me. ------------- Marked as reviewed by alanb (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22652#pullrequestreview-2604014387 From duke at openjdk.org Sun Feb 9 01:34:27 2025 From: duke at openjdk.org (duke) Date: Sun, 9 Feb 2025 01:34:27 GMT Subject: Withdrawn: 8342035: jlink plugins for setting java.vendor, java.vm.vendor and java.vendor.url In-Reply-To: <0Ic5OQ8ME3gwmLWiavfeT2RIqaZ9Nl4QOpUAEfoEIos=.c696040e-223d-4295-b735-fba6b4d6401e@github.com> References: <0Ic5OQ8ME3gwmLWiavfeT2RIqaZ9Nl4QOpUAEfoEIos=.c696040e-223d-4295-b735-fba6b4d6401e@github.com> Message-ID: <5eu5ddMQC1dKl9tRJH11Y-uOMrUutke0t4xWR7rwVxc=.53126589-5b82-4aa7-aaf2-a25df13dc2f3@github.com> On Thu, 7 Nov 2024 21:38:28 GMT, Henry Jen wrote: > Add jlink plugins to allow branding changes for java.vendor, java.vm.vendor and java.vendor.url. > > The jlink plugin will change the value in java.lang.VersionProps, which will set those property values. The `java.vm.vendor` property is initialized by the VM with a value set at build time, and is later replaced with the value from VersionProps. > > To keep the current behavior, we treat an 'N/A' value as a no-op, mimicking the current build behavior. Perhaps we don't really need this, as the proper value should be set with `branding.conf` in official builds. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21964 From aturbanov at openjdk.org Sun Feb 9 15:13:16 2025 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Sun, 9 Feb 2025 15:13:16 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v5] In-Reply-To: References: Message-ID: <4DnIP36qYMSM-9Q3gkuyxjwbCg2HXeYCp41q-VEiUUI=.57e9d448-171d-45d5-9484-3ab46b209028@github.com> On Fri, 7 Feb 2025 17:10:59 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources.
>> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. >> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - filter out `.zip` files > - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot > - trivial change, if .java files are filtered out then so should .class files > - revert to the original regex and remove the exclusion of os_windows.cpp > - update based on feedback > - Add a test to prevent NULL backsliding test/hotspot/jtreg/sources/TestNoNULL.java line 67: > 65: processFiles(srcPath, excludedSourceFiles, false); > 66: } > 67: processFiles(testPath, excludedTestFiles, true); I think it would be clearer to pass the Set `excludedTestExtensions` itself rather than the boolean flag, and to pass an empty set for `srcPath`.
    if (Files.exists(srcPath)) {
        processFiles(srcPath, excludedSourceFiles, Set.of());
    }
    processFiles(testPath, excludedTestFiles, excludedTestExtensions);

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1948129178 From kvn at openjdk.org Sun Feb 9 17:49:40 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 9 Feb 2025 17:49:40 GMT Subject: RFR: 8349088: De-virtualize Codeblob and nmethod Message-ID: <0PvzE8go0Q4VGhH_OF3OyPPgoD4qhxxfBgSHe41chBU=.e471b2a1-96ad-490d-b3d0-a050bd00d7d8@github.com> Remove virtual methods from CodeBlob and nmethod to simplify saving/restoring them in the Leyden AOT cache. This avoids the need to patch the hidden VPTR pointer to the class's virtual table. Added C++ static asserts to make sure no virtual methods are added in the future. Fixed/cleaned up the SA code which processes CodeBlob and its subclasses; it now uses the `CodeBlob::_kind` field value to determine the type of blob. Tested tier1-5, hs-tier6-rt (for JFR testing), stress, xcomp ------------- Commit messages: - 8349088: De-virtualize Codeblob and nmethod Changes: https://git.openjdk.org/jdk/pull/23533/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23533&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349088 Stats: 518 lines in 23 files changed: 235 ins; 215 del; 68 mod Patch: https://git.openjdk.org/jdk/pull/23533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23533/head:pull/23533 PR: https://git.openjdk.org/jdk/pull/23533 From amitkumar at openjdk.org Mon Feb 10 02:33:28 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 10 Feb 2025 02:33:28 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic Message-ID: s390x implementation of the Class.isInstance intrinsic. Tier1 tests on release & fastdebug VMs are clean with the flags: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. Benchmark results will be updated soon.
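The compile-time guard mentioned in the 8349088 description can be illustrated with `std::is_polymorphic`, which is true for any class that declares or inherits a virtual function. The sketch below is hypothetical: the class and enum names are made up, and the real patch may phrase its asserts differently. It shows a base class that stores a kind tag (standing in for `CodeBlob::_kind`) in place of virtual dispatch. In common ABIs, objects of such a class (with no virtual functions and no virtual bases) carry no hidden vtable pointer that would need patching after being restored from a cache.

```cpp
#include <type_traits>

// Hypothetical kind tag standing in for CodeBlob::_kind.
enum class BlobKind : unsigned char { Nmethod, RuntimeStub, Buffer };

class BlobLike {
 protected:
  BlobKind _kind;  // the stored tag replaces virtual dispatch
 public:
  constexpr explicit BlobLike(BlobKind kind) : _kind(kind) {}
  constexpr BlobKind kind() const { return _kind; }
  constexpr bool is_nmethod() const { return _kind == BlobKind::Nmethod; }
};

class NmethodLike : public BlobLike {
 public:
  constexpr NmethodLike() : BlobLike(BlobKind::Nmethod) {}
};

// Non-virtual "dispatch": switch on the stored kind instead of calling
// an overridden method.
constexpr const char* kind_name(const BlobLike& b) {
  switch (b.kind()) {
    case BlobKind::Nmethod:     return "nmethod";
    case BlobKind::RuntimeStub: return "runtime stub";
    case BlobKind::Buffer:      return "buffer";
  }
  return "unknown";
}

// These fail to compile if anyone adds a virtual method to either class.
static_assert(!std::is_polymorphic<BlobLike>::value,
              "BlobLike must not have virtual methods");
static_assert(!std::is_polymorphic<NmethodLike>::value,
              "NmethodLike must not have virtual methods");
```

The trade-off is that each new kind has to be handled explicitly wherever the tag is switched on, instead of being picked up automatically through an override.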
------------- Commit messages: - is_instance_of_id intrinsification for s390 Changes: https://git.openjdk.org/jdk/pull/23535/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349686 Stats: 87 lines in 3 files changed: 79 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23535/head:pull/23535 PR: https://git.openjdk.org/jdk/pull/23535 From duke at openjdk.org Mon Feb 10 05:29:25 2025 From: duke at openjdk.org (duke) Date: Mon, 10 Feb 2025 05:29:25 GMT Subject: Withdrawn: 8344169: RISC-V: Use more meaningful frame::metadata_words where possible In-Reply-To: <9XNv_0zgwAVjUaUjVSxFFkwvzUFp3f2Lu8MvPyFv8l8=.d6d9457a-e737-4169-a198-7ea88cfde0db@github.com> References: <9XNv_0zgwAVjUaUjVSxFFkwvzUFp3f2Lu8MvPyFv8l8=.d6d9457a-e737-4169-a198-7ea88cfde0db@github.com> Message-ID: On Thu, 14 Nov 2024 07:00:55 GMT, Fei Yang wrote: > Hello, please review this RISC-V specific change, which improves code readability. > > Some background to help with understanding: we have the following frame enumerations in file frame_riscv.hpp: > > enum { > link_offset = -2, > return_addr_offset = -1, > sender_sp_offset = 0 > }; > > The values are compatible with the platform ABI and differ from those of other platforms like x64 and aarch64. In particular, `sender_sp_offset` is 0 for RISC-V, compared to 2 for x64 and aarch64. As a result, there are some differences in places where code calculates fp by offsetting the sp pointer by `sender_sp_offset`. For RISC-V, we need to use the constant 2 instead of `sender_sp_offset` as the pointer offset, but the code will be more readable if we use `frame::metadata_words`, which has the same value. In theory, this change does not affect correctness or functionality.
> > Testing on linux-riscv64: > - [x] hotspot:tier1 (release) > - [x] hotspot_loom & jdk_loom (release & fastdebug) This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/22096 From jbhateja at openjdk.org Mon Feb 10 05:33:25 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 10 Feb 2025 05:33:25 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v17] In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 10:05:09 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> The following is a summary of the changes included in this patch: >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operations are inferred through inline expansion, and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577) for more details. >> 7. Since Float16 uses short as its storage type, raw FP16 values are always loaded into a general-purpose register, but the FP16 ISA generally operates on floating-point registers; thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating-point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains: HF2S + S2HF = HF >> 9.
X86 backend implementation for all supported intrinsics. >> 10. Functional and performance validation tests. >> >> Kindly review the patch and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Fixing typos Hi @PaulSandoz, kindly let us know if this is good for integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2646957788 From alanb at openjdk.org Mon Feb 10 08:24:10 2025 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 10 Feb 2025 08:24:10 GMT Subject: RFR: 8349620: Add VMProps for static JDK In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 23:51:41 GMT, Jiangli Zhou wrote: > Please review this change that adds the `jdk.static` VMProps property. It can be used to skip tests that are not meant to run on a static JDK. > > This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which VMProps uses to determine at runtime whether the JDK is static. > > `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on a static JDK. This test uses `bin/jlink`, which is not provided on a static JDK. There are other tests that require tools in `bin/`; the current PR does not modify those to skip running on a static JDK. That can be done after the current change is fully discussed and reviewed/approved. I think this looks okay; I'm just wondering whether one property is enough to cover all the configurations.
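Item 7 of the 8342103 description notes that Float16 values travel as a `short` holding IEEE 754 binary16 bits. As a reminder of what those 16 bits encode (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits), here is a minimal, hypothetical C++ decoder. It is not C2's implementation: it computes the value arithmetically (every intermediate product is exact in `float`) so that it can run at compile time, and NaN payloads are not preserved.

```cpp
#include <cstdint>
#include <limits>

// Exact power of two for the small exponents used below (C++14 constexpr).
constexpr float pow2(int e) {
  float r = 1.0f;
  for (; e > 0; --e) r *= 2.0f;
  for (; e < 0; ++e) r *= 0.5f;
  return r;
}

// Decode IEEE 754 binary16 bits (for example, the payload of a Java short)
// into a float.
constexpr float half_bits_to_float(std::uint16_t h) {
  const bool negative = (h & 0x8000u) != 0;
  const int exponent = (h >> 10) & 0x1F;  // biased by 15
  const int fraction = h & 0x3FF;         // 10 stored fraction bits
  float magnitude = 0.0f;
  if (exponent == 0x1F) {                 // infinity or NaN
    magnitude = (fraction == 0) ? std::numeric_limits<float>::infinity()
                                : std::numeric_limits<float>::quiet_NaN();
  } else if (exponent == 0) {             // zero or subnormal: f * 2^-24
    magnitude = static_cast<float>(fraction) * pow2(-24);
  } else {                                // normal: (1024 + f) * 2^(e - 25)
    magnitude = static_cast<float>(1024 + fraction) * pow2(exponent - 25);
  }
  return negative ? -magnitude : magnitude;
}
```

For example, `0x3C00` decodes to 1.0, `0x7BFF` to 65504 (the largest finite binary16 value), and `0x0001` to 2^-24, the smallest subnormal.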
------------- PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2647241299 From galder at openjdk.org Mon Feb 10 09:29:20 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 10 Feb 2025 09:29:20 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> Message-ID: On Fri, 7 Feb 2025 12:27:42 GMT, Galder Zamarre?o wrote: > At 100% probability baseline fails to vectorize because it observes a control flow. This control flow is not the one you see in min/max implementations, but this is one added by HotSpot as a result of the JIT profiling. It observes that one branch is always taken so it optimizes for that, and adds a branch for the uncommon case where the branch is not taken. I've dug further into this to try to understand how the baseline hotspot code works, and the explanation above is not entirely correct. Let's look at the IR differences between say 100% vs 80% branch situations. 
At branch 80% you see: 1115 CountedLoop === 1115 598 463 [[ 1101 1115 1116 1118 451 594 ]] inner stride: 2 main of N1115 strip mined !orig=[599],[590],[307] !jvms: MinMaxVector::longLoopMax @ bci:10 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) 692 LoadL === 1083 1101 393 [[ 747 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; #long (does not depend only on test, unknown control) !orig=[395] !jvms: MinMaxVector::longLoopMax @ bci:26 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) 651 LoadL === 1095 1101 355 [[ 747 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; #long (does not depend only on test, unknown control) !orig=[357] !jvms: MinMaxVector::longLoopMax @ bci:20 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) 747 MaxL === _ 651 692 [[ 451 ]] !orig=[608],[416] !jvms: Math::max @ bci:11 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) 451 StoreL === 1115 1101 449 747 [[ 1116 454 911 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=9; !orig=1124 !jvms: MinMaxVector::longLoopMax @ bci:30 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) 594 CountedLoopEnd === 1115 593 [[ 1123 463 ]] [lt] P=0.999731, C=780799.000000 !orig=[462] !jvms: MinMaxVector::longLoopMax @ bci:7 (line 235) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) You see the counted loop with the LoadL for array loads and MaxL consuming those. The StoreL is for array assignment (I think). 
At branch 100% you see: 650 LoadL === 1105 1119 355 [[ 416 408 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; #long (does not depend only on test, unknown control) !orig=[357] !jvms: MinMaxVector::longLoopMax @ bci:20 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) 691 LoadL === 1093 1119 393 [[ 416 408 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; #long (does not depend only on test, unknown control) !orig=[395] !jvms: MinMaxVector::longLoopMax @ bci:26 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) 408 CmpL === _ 650 691 [[ 409 ]] !jvms: Math::max @ bci:3 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) 409 Bool === _ 408 [[ 410 ]] [lt] !jvms: Math::max @ bci:3 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) 410 If === 1132 409 [[ 411 412 ]] P=0.019892, C=79127.000000 !jvms: Math::max @ bci:3 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) 411 IfTrue === 410 [[ 415 ]] #1 !jvms: Math::max @ bci:3 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) 412 IfFalse === 410 [[ 415 ]] #0 !jvms: Math::max @ bci:3 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) 415 Region === 415 411 412 [[ 415 594 416 451 ]] !orig=[423] !jvms: Math::max @ bci:11 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) 594 CountedLoopEnd === 415 593 [[ 1139 463 ]] [lt] P=0.999683, C=706030.000000 !orig=[462] !jvms: 
MinMaxVector::longLoopMax @ bci:7 (line 235) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) You see a region within the counted loop with the if/else which belongs to the actual `Math.max` implementation, with the corresponding CmpL and the LoadL nodes for retrieving the longs from the arrays. What causes the difference? It's this section in `PhaseIdealLoop::conditional_move`: ```c++ // Check for highly predictable branch. No point in CMOV'ing if // we are going to predict accurately all the time. if (C->use_cmove() && (cmp_op == Op_CmpF || cmp_op == Op_CmpD)) { //keep going } else if (iff->_prob < infrequent_prob || iff->_prob > (1.0f - infrequent_prob)) return nullptr; At branch 100 `iff->_prob > (1.0f - infrequent_prob)` becomes true and no CMoveL is created so hotspot seems to stick to the original bytecode implementation of `Math.max`. At branch 80 that comparison is below and CMoveL is created, which eventually gets converted into a MaxL node and vectorization kicks in. The numbers are interesting. `infrequent_prob` appears to be a fixed number `0.181818187` and `1.0f` minus that is `0.818181812`. So, at branch 100 `iff->_prob` is `0.906792104` therefore higher than `0.818181812`, and at branch 80 `0.718619287`. I would have expected those `iff->_prob` to be closer to the branch % targets I set, but ignoring that, seems like ~90% would be the cut off. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2647410266 From dholmes at openjdk.org Mon Feb 10 10:12:11 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 10 Feb 2025 10:12:11 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v4] In-Reply-To: References: <6TuFE5mx8jXm2donAE_cM3I5UXCaB1eKrpCyp7qk0wM=.1c585567-cf27-4d3e-bca9-4aa7a556942c@github.com> Message-ID: On Thu, 6 Feb 2025 12:12:59 GMT, Coleen Phillimore wrote: >> I am still missing what can actually set a PD here, sorry. ?? 
> > Because the field is final, it has to be initialized in the constructor in Java code. My initial patch for modifiers chose to initialize to zero but that's not quite correct. The constructor cannot be called nor can it be made accessible with setAccessible(). So the constructor for java.lang.Class is essentially the Hotspot code JavaClasses::create_mirror(). This is where the PD is assigned. Okay so this pattern of assigning the final fields in the constructor that is never called is just for appeasing the javac compiler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1948769808 From yzheng at openjdk.org Mon Feb 10 10:17:15 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 10 Feb 2025 10:17:15 GMT Subject: RFR: 8346567: Make Class.getModifiers() non-native [v7] In-Reply-To: References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com> Message-ID: On Fri, 7 Feb 2025 12:34:40 GMT, Coleen Phillimore wrote: >> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the jdk code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though. >> >> There's a couple of JMH benchmarks added with this change. One does show that for array classes for non-bootstrap class loader, this results in one extra load which in a long loop of just that, is observable. I don't think this is real life code. The other benchmarks added show no regression. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix jvmci test. 
JVMCI change looks good to me. ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/22652#pullrequestreview-2605295926 From cstein at openjdk.org Mon Feb 10 10:40:10 2025 From: cstein at openjdk.org (Christian Stein) Date: Mon, 10 Feb 2025 10:40:10 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: References: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> Message-ID: <46sqVAK7loCob1A9LcFs63SoiqaBwIwq1v9D_uqYtdY=.ac6528d5-9628-40f7-9a6c-471abee78757@github.com> On Sat, 8 Feb 2025 00:17:15 GMT, Kim Barrett wrote: >> That doesn't imply the current approach is the best approach to access the source; it only states that the access is *possible*. > > I don't see any better way. https://openjdk.org/jtreg/vmoptions.html doesn't list anything other than the "test.src" property as a way to find the source hierarchy. If you know of an alternative, please suggest. I agree with @kimbarrett - there's no canonical way built into `jtreg` to obtain the `src` directory of the JDK under test. This test reads more like an automated tool to which you'd normally pass a start directory as an argument. No? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1948811021 From cnorrbin at openjdk.org Mon Feb 10 10:40:28 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Mon, 10 Feb 2025 10:40:28 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v6] In-Reply-To: References: Message-ID: <7AB_9UMR-wlrQICDtZ8KKpRO6fjMo5iOYO5uj1gPex4=.b3160019-a7cf-4b25-afb7-b029d0a4e2d7@github.com> > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1.
Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. > > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > ``` > > Please let me know any feedback or concerns! Casper Norrbin has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: - Merge branch 'master' into rb-tree-intrusive-v2 - initialize node on insert + more tests - windows build - build fix - reduced diff - 0-sized value - intrusive red-black tree ------------- Changes: https://git.openjdk.org/jdk/pull/23416/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=05 Stats: 725 lines in 3 files changed: 539 ins; 113 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From cnorrbin at openjdk.org Mon Feb 10 10:44:35 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Mon, 10 Feb 2025 10:44:35 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v7] In-Reply-To: References: Message-ID: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
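The example quoted in this thread recovers the enclosing structure with a plain cast (`cast_to_self`). That only works because `node` is the first member of a standard-layout struct. If the node lives elsewhere in the struct, the classic container_of idiom based on `offsetof` is the usual alternative. The sketch below uses hypothetical `Node` and `Holder` types rather than the real RBTree API, and is meant only to illustrate the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for an intrusive tree node.
struct Node {
  Node* left = nullptr;
  Node* right = nullptr;
  int key;
  explicit Node(int k) : key(k) {}
};

// Hypothetical user structure that embeds a Node, deliberately NOT as its
// first member, so a plain cast from Node* would yield the wrong address.
struct Holder {
  int data;
  Node node;
  Holder(int d, int k) : data(d), node(k) {}
};

// offsetof is only guaranteed to work for standard-layout types.
static_assert(std::is_standard_layout<Holder>::value,
              "Holder must be standard-layout for offsetof");

// container_of-style recovery: step back from the embedded member by its
// offset within the enclosing struct.
inline Holder* holder_from_node(Node* n) {
  assert(n != nullptr);
  char* base = reinterpret_cast<char*>(n) - offsetof(Holder, node);
  return reinterpret_cast<Holder*>(base);
}
```

With this approach, the node's position inside the user structure no longer matters, at the cost of requiring a standard-layout enclosing type.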
> > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns! Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: empty base optimization reference ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23416/files - new: https://git.openjdk.org/jdk/pull/23416/files/6d4023ed..174d169f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From shade at openjdk.org Mon Feb 10 10:50:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Feb 2025 10:50:10 GMT Subject: RFR: 8349639: jfr/event/gc/detailed/TestShenandoahEvacuationInformationEvent.java fails to compile after JDK-8348610 In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 08:14:14 GMT, Aleksey Shipilev wrote: > A 
simple test bug crept in through https://github.com/openjdk/jdk/commit/bad39b6d8892ba9b86bc81bf01108a1df617defb. > > Additional testing: > - [x] Affected test now passes Attn @satyenme @earthling-amzn ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23511#issuecomment-2647612086 From duke at openjdk.org Mon Feb 10 11:06:37 2025 From: duke at openjdk.org (duke) Date: Mon, 10 Feb 2025 11:06:37 GMT Subject: Withdrawn: 8342818: Implement CPU Time Profiling for JFR In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 16:47:21 GMT, Johannes Bechberger wrote: > This is the code for the [JEP draft: CPU Time based profiling for JFR]. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/20752 From amitkumar at openjdk.org Mon Feb 10 11:41:09 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 10 Feb 2025 11:41:09 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic In-Reply-To: References: Message-ID: <51rttZ_2kkIMAO2BSwWL4z5qvUhdN0mx_zB4OsmQyf4=.a1b55544-ab9d-47d4-90ee-44e6f8e37297@github.com> On Mon, 10 Feb 2025 02:29:03 GMT, Amit Kumar wrote: > s390x implementation of the Class.isInstance intrinsic. > > Tier1 tests on release & fastdebug VMs are clean with the flags: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. > > Benchmark results will be updated soon.

without patch:
Benchmark                            Mode  Cnt   Score   Error  Units
SecondarySupersLookup.testNegative00 avgt   15   1.430 ± 0.106  ns/op
SecondarySupersLookup.testNegative01 avgt   15   1.390 ± 0.085  ns/op
SecondarySupersLookup.testNegative02 avgt   15   1.392 ± 0.086  ns/op
SecondarySupersLookup.testNegative03 avgt   15   1.354 ± 0.011  ns/op
SecondarySupersLookup.testNegative04 avgt   15   1.353 ± 0.009  ns/op
SecondarySupersLookup.testNegative05 avgt   15   1.430 ± 0.104  ns/op
SecondarySupersLookup.testNegative06 avgt   15   1.352 ± 0.008  ns/op
SecondarySupersLookup.testNegative07 avgt   15   1.353 ± 0.008  ns/op
SecondarySupersLookup.testNegative08 avgt   15   1.352 ± 0.009  ns/op
SecondarySupersLookup.testNegative09 avgt   15   1.355 ± 0.012  ns/op
SecondarySupersLookup.testNegative10 avgt   15   1.391 ± 0.086  ns/op
SecondarySupersLookup.testNegative16 avgt   15   1.352 ± 0.008  ns/op
SecondarySupersLookup.testNegative20 avgt   15   1.391 ± 0.086  ns/op
SecondarySupersLookup.testNegative30 avgt   15   1.391 ± 0.086  ns/op
SecondarySupersLookup.testNegative32 avgt   15   1.430 ± 0.104  ns/op
SecondarySupersLookup.testNegative40 avgt   15   1.353 ± 0.010  ns/op
SecondarySupersLookup.testNegative50 avgt   15   1.390 ± 0.087  ns/op
SecondarySupersLookup.testNegative55 avgt   15  25.403 ± 1.269  ns/op
SecondarySupersLookup.testNegative56 avgt   15  26.406 ± 2.156  ns/op
SecondarySupersLookup.testNegative57 avgt   15  26.495 ± 1.960  ns/op
SecondarySupersLookup.testNegative58 avgt   15  28.065 ± 3.160  ns/op
SecondarySupersLookup.testNegative59 avgt   15  28.189 ± 3.006  ns/op
SecondarySupersLookup.testNegative60 avgt   15  29.089 ± 3.349  ns/op
SecondarySupersLookup.testNegative61 avgt   15  27.718 ± 1.070  ns/op
SecondarySupersLookup.testNegative62 avgt   15  28.047 ± 1.146  ns/op
SecondarySupersLookup.testNegative63 avgt   15  28.695 ± 1.611  ns/op
SecondarySupersLookup.testNegative64 avgt   15  29.326 ± 2.023  ns/op
SecondarySupersLookup.testPositive01 avgt   15   1.719 ± 0.049  ns/op
SecondarySupersLookup.testPositive02 avgt   15   1.744 ± 0.080  ns/op
SecondarySupersLookup.testPositive03 avgt   15   1.743 ± 0.076  ns/op
SecondarySupersLookup.testPositive04 avgt   15   1.764 ± 0.087  ns/op
SecondarySupersLookup.testPositive05 avgt   15   1.763 ± 0.085  ns/op
SecondarySupersLookup.testPositive06 avgt   15   1.741 ± 0.075  ns/op
SecondarySupersLookup.testPositive07 avgt   15   1.719 ± 0.050  ns/op
SecondarySupersLookup.testPositive08 avgt   15   1.719 ± 0.049  ns/op
SecondarySupersLookup.testPositive09 avgt   15   1.720 ± 0.050  ns/op
SecondarySupersLookup.testPositive10 avgt   15   1.765 ± 0.087  ns/op
SecondarySupersLookup.testPositive16 avgt   15   1.744 ± 0.076  ns/op
SecondarySupersLookup.testPositive20 avgt   15   1.721 ± 0.053  ns/op
SecondarySupersLookup.testPositive30 avgt   15   1.719 ± 0.051  ns/op
SecondarySupersLookup.testPositive32 avgt   15   1.721 ± 0.052  ns/op
SecondarySupersLookup.testPositive40 avgt   15  12.798 ± 0.150  ns/op
SecondarySupersLookup.testPositive50 avgt   15   1.744 ± 0.076  ns/op
SecondarySupersLookup.testPositive60 avgt   15  24.580 ± 0.567  ns/op
SecondarySupersLookup.testPositive63 avgt   15  23.523 ± 1.635  ns/op
SecondarySupersLookup.testPositive64 avgt   15  33.512 ± 3.343  ns/op

with patch:
SecondarySupersLookup.testNegative00 avgt   15   1.354 ± 0.009  ns/op
SecondarySupersLookup.testNegative01 avgt   15   1.399 ± 0.086  ns/op
SecondarySupersLookup.testNegative02 avgt   15   1.365 ± 0.054  ns/op
SecondarySupersLookup.testNegative03 avgt   15   1.351 ± 0.008  ns/op
SecondarySupersLookup.testNegative04 avgt   15   1.353 ± 0.010  ns/op
SecondarySupersLookup.testNegative05 avgt   15   1.353 ± 0.009  ns/op
SecondarySupersLookup.testNegative06 avgt   15   1.470 ± 0.106  ns/op
SecondarySupersLookup.testNegative07 avgt   15   1.365 ± 0.055  ns/op
SecondarySupersLookup.testNegative08 avgt   15   1.352 ± 0.008  ns/op
SecondarySupersLookup.testNegative09 avgt   15   1.431 ± 0.106  ns/op
SecondarySupersLookup.testNegative10 avgt   15   1.355 ± 0.012  ns/op
SecondarySupersLookup.testNegative16 avgt   15   1.430 ± 0.107  ns/op
SecondarySupersLookup.testNegative20 avgt   15   1.352 ± 0.008  ns/op
SecondarySupersLookup.testNegative30 avgt   15   1.354 ± 0.009  ns/op
SecondarySupersLookup.testNegative32 avgt   15   1.391 ± 0.084  ns/op
SecondarySupersLookup.testNegative40 avgt   15   1.392 ± 0.086  ns/op
SecondarySupersLookup.testNegative50 avgt   15   1.354 ± 0.011  ns/op
SecondarySupersLookup.testNegative55 avgt   15  25.587 ± 1.993  ns/op
SecondarySupersLookup.testNegative56 avgt   15  26.048 ± 1.970  ns/op
SecondarySupersLookup.testNegative57 avgt   15  27.874 ± 3.353  ns/op
SecondarySupersLookup.testNegative58 avgt   15  26.392 ± 1.136  ns/op
SecondarySupersLookup.testNegative59 avgt   15  26.593 ±
0.393 ns/op SecondarySupersLookup.testNegative60 avgt 15 27.567 ? 1.394 ns/op SecondarySupersLookup.testNegative61 avgt 15 28.813 ? 2.429 ns/op SecondarySupersLookup.testNegative62 avgt 15 28.523 ? 1.723 ns/op SecondarySupersLookup.testNegative63 avgt 15 29.832 ? 2.802 ns/op SecondarySupersLookup.testNegative64 avgt 15 30.048 ? 2.749 ns/op SecondarySupersLookup.testPositive01 avgt 15 1.788 ? 0.092 ns/op SecondarySupersLookup.testPositive02 avgt 15 1.742 ? 0.076 ns/op SecondarySupersLookup.testPositive03 avgt 15 1.741 ? 0.074 ns/op SecondarySupersLookup.testPositive04 avgt 15 1.764 ? 0.084 ns/op SecondarySupersLookup.testPositive05 avgt 15 1.741 ? 0.076 ns/op SecondarySupersLookup.testPositive06 avgt 15 1.720 ? 0.051 ns/op SecondarySupersLookup.testPositive07 avgt 15 1.722 ? 0.054 ns/op SecondarySupersLookup.testPositive08 avgt 15 1.721 ? 0.053 ns/op SecondarySupersLookup.testPositive09 avgt 15 1.764 ? 0.086 ns/op SecondarySupersLookup.testPositive10 avgt 15 1.719 ? 0.050 ns/op SecondarySupersLookup.testPositive16 avgt 15 1.741 ? 0.074 ns/op SecondarySupersLookup.testPositive20 avgt 15 1.721 ? 0.053 ns/op SecondarySupersLookup.testPositive30 avgt 15 1.759 ? 0.111 ns/op SecondarySupersLookup.testPositive32 avgt 15 1.741 ? 0.074 ns/op SecondarySupersLookup.testPositive40 avgt 15 12.722 ? 0.085 ns/op SecondarySupersLookup.testPositive50 avgt 15 1.735 ? 0.098 ns/op SecondarySupersLookup.testPositive60 avgt 15 25.368 ? 1.966 ns/op SecondarySupersLookup.testPositive63 avgt 15 23.529 ? 1.987 ns/op SecondarySupersLookup.testPositive64 avgt 15 30.335 ? 
3.803 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/23535#issuecomment-2647734874 From mdoerr at openjdk.org Mon Feb 10 12:12:14 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 10 Feb 2025 12:12:14 GMT Subject: RFR: 8349639: jfr/event/gc/detailed/TestShenandoahEvacuationInformationEvent.java fails to compile after JDK-8348610 In-Reply-To: References: Message-ID: <0hleW4rUImIn6eM1-O-mjRYknjlIh9T5XOUx39vl93w=.d52ddb40-795d-4f13-8266-b9879460afc4@github.com> On Fri, 7 Feb 2025 08:14:14 GMT, Aleksey Shipilev wrote: > A simple test bug crept in through https://github.com/openjdk/jdk/commit/bad39b6d8892ba9b86bc81bf01108a1df617defb. > > Additional testing: > - [x] Affected test now passes LGTM. Should also be reviewed by a Shenandoah expert. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23511#pullrequestreview-2605569852 From bulasevich at openjdk.org Mon Feb 10 12:22:10 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 10 Feb 2025 12:22:10 GMT Subject: RFR: 8349652: Rewire nmethod oop load barriers In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 09:57:15 GMT, Stefan Karlsson wrote: > When loading oops from nmethods we current use the Access API to inject load barriers for the GCs that requires them. As part of the ZGC load barrier we need access to the nmethod to properly perform the load barrier. The current implementation of the Access API doesn't support passing down the nmethod through all its layers of code so ZGC asks the code cache what nmethod the various oops belongs to. There's currently an open PR for JDK-8343789 (#21276), which moves the oops out of the code cache, so the current way ZGC implementation will not work after that has been integrated. > > The proposal is to figure out a way to explicitly pass down the nmethod to the load barriers. > > We could extend the Access API to pass down the nmethod through all its various layers. 
> The drawback of that is that it adds a lot of boilerplate code and requires new overloads and/or names. Given that this isn't performance-critical code, I propose that we take the much simpler route and call straight to the BarrierSetNMethod class.
>
> Given that NMethodAccess and IN_NMETHOD were only introduced to support nmethod oop loads for ZGC and are not used anymore, I've also removed them from the code.
>
> Tested with reproducer for the ZGC issue in JDK-8343789, tier1-7 Linux with ZGC tasks, currently running tier1-3.

src/hotspot/share/gc/shared/barrierSetNMethod.hpp line 29:

> 27: 
> 28: #include "memory/allocation.hpp"
> 29: #include "oops/oopsHierarchy.hpp"

This is probably an unnecessary include.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23512#discussion_r1948956062

From bulasevich at openjdk.org Mon Feb 10 12:26:14 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Mon, 10 Feb 2025 12:26:14 GMT
Subject: RFR: 8349652: Rewire nmethod oop load barriers

On Fri, 7 Feb 2025 09:57:15 GMT, Stefan Karlsson wrote:

> When loading oops from nmethods we currently use the Access API to inject load barriers for the GCs that require them. As part of the ZGC load barrier we need access to the nmethod to properly perform the load barrier. The current implementation of the Access API doesn't support passing down the nmethod through all its layers of code, so ZGC asks the code cache which nmethod the various oops belong to. There's currently an open PR for JDK-8343789 (#21276), which moves the oops out of the code cache, so the current ZGC implementation will not work after that has been integrated.
>
> The proposal is to figure out a way to explicitly pass down the nmethod to the load barriers.
>
> We could extend the Access API to pass down the nmethod through all its various layers. The drawback of that is that it adds a lot of boilerplate code and requires new overloads and/or names.
> Given that this isn't performance-critical code, I propose that we take the much simpler route and call straight to the BarrierSetNMethod class.
>
> Given that NMethodAccess and IN_NMETHOD were only introduced to support nmethod oop loads for ZGC and are not used anymore, I've also removed them from the code.
>
> Tested with reproducer for the ZGC issue in JDK-8343789, tier1-7 Linux with ZGC tasks, currently running tier1-3.

Good. Many thanks to you. I am looking forward to this change being integrated. My PR #21276, which builds on top of this change, now passes jtreg with -XX:+UseZGC.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23512#issuecomment-2647834206

From coleenp at openjdk.org Mon Feb 10 12:47:31 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 10 Feb 2025 12:47:31 GMT
Subject: RFR: 8346567: Make Class.getModifiers() non-native [v7]
References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com>

On Fri, 7 Feb 2025 12:34:40 GMT, Coleen Phillimore wrote:

>> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the JDK code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though.
>>
>> There are a couple of JMH benchmarks added with this change. One does show that for array classes with a non-bootstrap class loader, this results in one extra load, which is observable in a long loop of just that. I don't think this is real-life code. The other benchmarks added show no regression.
>>
>> Tested with tier1-8.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix jvmci test.

Thank you for the reviews, Yudi, Alan, Chen, Vladimir and Dean, and for the help and comments on the various pieces of this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22652#issuecomment-2647880184

From coleenp at openjdk.org Mon Feb 10 12:47:32 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 10 Feb 2025 12:47:32 GMT
Subject: Integrated: 8346567: Make Class.getModifiers() non-native
In-Reply-To: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com>
References: <7X3DYiPMRGAIWCyCP64kbZvHuxjmmszGxfH1dfSu38k=.7fdb2512-1999-4c7e-835c-da96d57ca1be@github.com>
Message-ID: <-VYQTxGucpCCQZccdw6wMnDavFDAt75MDHY8mGxEMiw=.042099b8-41dc-4b0d-8bdd-a874f004a0f6@github.com>

On Mon, 9 Dec 2024 19:26:53 GMT, Coleen Phillimore wrote:

> The Class.getModifiers() method is implemented as a native method in java.lang.Class to access a field that we've calculated when creating the mirror. The field is final after that point. The VM doesn't need it anymore, so there's no real need for the JDK code to call into the VM to get it. This moves the field to Java and removes the intrinsic code. I promoted the compute_modifiers() functions to return int since that's how java.lang.Class uses the value. It should really be an unsigned short though.
>
> There are a couple of JMH benchmarks added with this change. One does show that for array classes with a non-bootstrap class loader, this results in one extra load, which is observable in a long loop of just that. I don't think this is real-life code. The other benchmarks added show no regression.
>
> Tested with tier1-8.

This pull request has now been integrated.
Changeset: c9cadbd2
Author:    Coleen Phillimore
URL:       https://git.openjdk.org/jdk/commit/c9cadbd23fb13933b8968f283d27842cd35f8d6f
Stats:     217 lines in 31 files changed: 71 ins; 127 del; 19 mod

8346567: Make Class.getModifiers() non-native

Reviewed-by: alanb, vlivanov, yzheng, dlong

-------------

PR: https://git.openjdk.org/jdk/pull/22652

From stefank at openjdk.org Mon Feb 10 12:58:12 2025
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 10 Feb 2025 12:58:12 GMT
Subject: RFR: 8349652: Rewire nmethod oop load barriers

On Mon, 10 Feb 2025 12:23:12 GMT, Boris Ulasevich wrote:

> Good. Many thanks to you. I am looking forward to this change being integrated. My PR #21276, which builds on top of this change, now passes jtreg with -XX:+UseZGC.

Thanks, and thanks for verifying that this fits with your changes!

> src/hotspot/share/gc/shared/barrierSetNMethod.hpp line 29:
>
>> 27: 
>> 28: #include "memory/allocation.hpp"
>> 29: #include "oops/oopsHierarchy.hpp"
>
> This is probably an unnecessary include.

It's needed because I use the type `oop`. I got concrete compilation errors when Shenandoah was compiled, and adding this include fixes those issues.
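The rationale in the 8349652 description (the ZGC barrier needs the owning nmethod, and finding it by asking the code cache stops working once JDK-8343789 moves oops out of the code cache) can be illustrated with a small standalone C++ model. Every name below (`Blob`, `NMethodModel`, `load_via_lookup`, `load_explicit`) is hypothetical and is not HotSpot code. The model assumes the owner is found by an address-range check; it only sketches the difference between looking the owner up and passing it explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are HotSpot types.
struct Obj { int id; };
using oop = Obj*;

// A code blob whose oop table lives inside it (the layout before oops move out).
struct Blob {
  unsigned char code[32];
  oop inline_oops[4];
};

struct NMethodModel {
  std::uintptr_t begin;  // start of the code blob
  std::uintptr_t end;    // one past the end of the code blob
  oop* oops;             // oop table: inside the blob or external to it
  int barrier_hits = 0;  // stands in for per-nmethod state a GC barrier touches

  bool contains(const void* p) const {
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    return a >= begin && a < end;
  }
};

NMethodModel make_nmethod(Blob& blob, oop* table) {
  std::uintptr_t b = reinterpret_cast<std::uintptr_t>(&blob);
  return NMethodModel{b, b + sizeof(Blob), table};
}

// Lookup style: the barrier only sees the address of the oop slot and must
// search the "code cache" for the nmethod whose blob contains that slot.
oop load_via_lookup(const std::vector<NMethodModel*>& code_cache, oop* slot) {
  for (NMethodModel* nm : code_cache) {
    if (nm->contains(slot)) {
      nm->barrier_hits++;
      return *slot;
    }
  }
  return nullptr;  // owner not found, so the barrier cannot do its work
}

// Explicit style: the caller passes the owning nmethod, so the location of
// the oop table no longer matters.
oop load_explicit(NMethodModel& nm, int index) {
  nm.barrier_hits++;
  return nm.oops[index];
}
```

With the oop table inside the blob, both styles reach the owner; with an external table, only the explicit variant still reaches the per-nmethod state. That is the situation the description gives as the reason for calling `BarrierSetNMethod` directly instead of threading the nmethod through the Access API.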
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23512#issuecomment-2647908012
PR Review Comment: https://git.openjdk.org/jdk/pull/23512#discussion_r1949006110

From sroy at openjdk.org Mon Feb 10 13:20:13 2025
From: sroy at openjdk.org (Suchismith Roy)
Date: Mon, 10 Feb 2025 13:20:13 GMT
Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v19]
In-Reply-To: <3h9GAveTJ2kDTw97K5tLV_Sg6i0_2aIE-dmxYF6ZoO0=.3894b29c-3616-4aca-9d9a-6c4e947e7658@github.com>
References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> <-Nrzcr1rY6Os3DyzZdAMltyDGWmdBYqPhneFzFIhkDM=.a294ffb8-bcda-4c04-804b-3313b179d6b6@github.com> <3h9GAveTJ2kDTw97K5tLV_Sg6i0_2aIE-dmxYF6ZoO0=.3894b29c-3616-4aca-9d9a-6c4e947e7658@github.com>

On Sat, 8 Feb 2025 12:00:35 GMT, Martin Doerr wrote:

>> vspltisb(vZero,0) is needed.
>>   __ vsldoi(vTmp8, vTmp5, vZero, 8); // mL : Extract the lower 64 bits of M
>>   __ vsldoi(vTmp9, vZero, vTmp5, 8); // mH : Extract the higher 64 bits of M
>> We need to extract appropriate bits and for that vZero needs to be initialised to 0 always.
>
> The problem is that you're overwriting it below, which should not be done:
>
>   __ vxor(vZero, vTmp4, vTmp10);
>   __ vmr(vState, vZero);
>
> Why not `__ vxor(vState, vTmp4, vTmp10);`?
We are storing the result of each operation into vState to reuse it in the next operation, using
  __ vxor(vH, vH, vState);
This is similar to https://github.com/openjdk/jdk/blob/c9cadbd23fb13933b8968f283d27842cd35f8d6f/src/java.base/share/classes/com/sun/crypto/provider/GHASH.java#L118

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1949034655

From mdoerr at openjdk.org Mon Feb 10 13:20:14 2025
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 10 Feb 2025 13:20:14 GMT
Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v19]
References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> <-Nrzcr1rY6Os3DyzZdAMltyDGWmdBYqPhneFzFIhkDM=.a294ffb8-bcda-4c04-804b-3313b179d6b6@github.com> <3h9GAveTJ2kDTw97K5tLV_Sg6i0_2aIE-dmxYF6ZoO0=.3894b29c-3616-4aca-9d9a-6c4e947e7658@github.com>

On Mon, 10 Feb 2025 13:15:15 GMT, Suchismith Roy wrote:

>> The problem is that you're overwriting it below, which should not be done:
>>
>>   __ vxor(vZero, vTmp4, vTmp10);
>>   __ vmr(vState, vZero);
>>
>> Why not `__ vxor(vState, vTmp4, vTmp10);`?
>
> We are storing the result of each operation into vState to reuse it in the next operation, using
>   __ vxor(vH, vH, vState);
> This is similar to https://github.com/openjdk/jdk/blob/c9cadbd23fb13933b8968f283d27842cd35f8d6f/src/java.base/share/classes/com/sun/crypto/provider/GHASH.java#L118

That doesn't answer "Why not __ vxor(vState, vTmp4, vTmp10);?"
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1949038200

From coleenp at openjdk.org Mon Feb 10 13:23:49 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 10 Feb 2025 13:23:49 GMT
Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v7]

> This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java.
> Tested with tier1-4.

Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:

 - Merge branch 'master' into protection-domain
 - Move test for protectionDomain filtering.
 - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java
   
   Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com>
 - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java
   
   Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com>
 - Remove @Stable annotation for final field.
 - Fix test that knows which fields are hidden from reflection in jvmci.
 - Hide Class.protectionDomain for reflection and add a test case.
 - Merge branch 'master' into protection-domain
 - Fix two tests.
 - Fix the test.
 - ... and 1 more: https://git.openjdk.org/jdk/compare/c9cadbd2...2208302c

-------------

Changes: https://git.openjdk.org/jdk/pull/23396/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23396&range=06
  Stats: 65 lines in 13 files changed: 15 ins; 34 del; 16 mod
  Patch: https://git.openjdk.org/jdk/pull/23396.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23396/head:pull/23396

PR: https://git.openjdk.org/jdk/pull/23396

From sroy at openjdk.org Mon Feb 10 13:29:13 2025
From: sroy at openjdk.org (Suchismith Roy)
Date: Mon, 10 Feb 2025 13:29:13 GMT
Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v19]
References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> <-Nrzcr1rY6Os3DyzZdAMltyDGWmdBYqPhneFzFIhkDM=.a294ffb8-bcda-4c04-804b-3313b179d6b6@github.com> <3h9GAveTJ2kDTw97K5tLV_Sg6i0_2aIE-dmxYF6ZoO0=.3894b29c-3616-4aca-9d9a-6c4e947e7658@github.com>

On Mon, 10 Feb 2025 13:17:40 GMT, Martin Doerr wrote:

>> We are storing the result of each operation into vState to reuse it in the next operation, using
>>   __ vxor(vH, vH, vState);
>> This is similar to https://github.com/openjdk/jdk/blob/c9cadbd23fb13933b8968f283d27842cd35f8d6f/src/java.base/share/classes/com/sun/crypto/provider/GHASH.java#L118
>
> That doesn't answer "Why not __ vxor(vState, vTmp4, vTmp10);?"

Ok. Yeah, that should work. I will try it out to see if any tests fail.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1949052318

From coleenp at openjdk.org Mon Feb 10 13:31:12 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 10 Feb 2025 13:31:12 GMT
Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v4]
References: <6TuFE5mx8jXm2donAE_cM3I5UXCaB1eKrpCyp7qk0wM=.1c585567-cf27-4d3e-bca9-4aa7a556942c@github.com>
Message-ID: <-UEqYFgj9UYs8PyS9GLD2b1kU11bvMKi5vFmEbXcb-I=.14ab7e47-61dc-4c21-b2e6-8829e938b8c4@github.com>

On Mon, 10 Feb 2025 10:09:59 GMT, David Holmes wrote:

>> Because the field is final, it has to be initialized in the constructor in Java code. My initial patch for modifiers chose to initialize it to zero, but that's not quite correct. The constructor cannot be called, nor can it be made accessible with setAccessible(). So the constructor for java.lang.Class is essentially the HotSpot code JavaClasses::create_mirror(). This is where the PD is assigned.
>
> Okay, so this pattern of assigning the final fields in the constructor that is never called is just for appeasing the javac compiler.

Yes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1948979011

From coleenp at openjdk.org Mon Feb 10 13:31:13 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 10 Feb 2025 13:31:13 GMT
Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v7]

On Tue, 4 Feb 2025 16:56:53 GMT, Coleen Phillimore wrote:

>> Aside from JVMTI (CFLH for example), is there anything left in the VM that needs this? The last param to JVM_DefineClassWithSource has the location from the code source if available.
>
> The VM doesn't need this, but it carries it around because it's a parameter to JVM_DefineClass and DefineClassWithSource (second-to-last parameter). CFLH and CDS, from what I can tell, have it for the same purpose - ultimately to assign it into the mirror.
> There is some remaining code in the compilers (ci). Not sure if it is needed without studying it more.

The remaining code in ci isn't necessary. I filed and fixed another issue for that.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23396#discussion_r1945033858

From amitkumar at openjdk.org Mon Feb 10 13:36:10 2025
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 10 Feb 2025 13:36:10 GMT
Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic

On Mon, 10 Feb 2025 02:29:03 GMT, Amit Kumar wrote:

> s390x implementation for Class.isInstance intrinsic.
>
> Tier1 tests on release and fastdebug VMs are clean with the flags: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`.
>
> Benchmark results will be updated soon.

command: `make test TEST="micro:vm.lang.SecondarySupersLookup" MICRO=" JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseSecondarySupersCache -XX:TieredStopAtLevel=1"`

without patch:

Benchmark                             Mode  Cnt   Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   6.554 ± 0.023  ns/op
SecondarySupersLookup.testNegative01  avgt   15   6.690 ± 0.428  ns/op
SecondarySupersLookup.testNegative02  avgt   15   6.561 ± 0.019  ns/op
SecondarySupersLookup.testNegative03  avgt   15   6.545 ± 0.004  ns/op
SecondarySupersLookup.testNegative04  avgt   15   6.549 ± 0.011  ns/op
SecondarySupersLookup.testNegative05  avgt   15   6.554 ± 0.027  ns/op
SecondarySupersLookup.testNegative06  avgt   15   6.551 ± 0.019  ns/op
SecondarySupersLookup.testNegative07  avgt   15   6.548 ± 0.009  ns/op
SecondarySupersLookup.testNegative08  avgt   15   6.549 ± 0.015  ns/op
SecondarySupersLookup.testNegative09  avgt   15   6.550 ± 0.014  ns/op
SecondarySupersLookup.testNegative10  avgt   15   6.546 ± 0.004  ns/op
SecondarySupersLookup.testNegative16  avgt   15   6.552 ± 0.014  ns/op
SecondarySupersLookup.testNegative20  avgt   15   6.551 ± 0.017  ns/op
SecondarySupersLookup.testNegative30  avgt   15   6.546 ± 0.006  ns/op
SecondarySupersLookup.testNegative32  avgt   15   6.545 ± 0.002  ns/op
SecondarySupersLookup.testNegative40  avgt   15   6.553 ± 0.016  ns/op
SecondarySupersLookup.testNegative50  avgt   15   6.549 ± 0.012  ns/op
SecondarySupersLookup.testNegative55  avgt   15  16.530 ± 0.043  ns/op
SecondarySupersLookup.testNegative56  avgt   15  16.520 ± 0.030  ns/op
SecondarySupersLookup.testNegative57  avgt   15  16.522 ± 0.036  ns/op
SecondarySupersLookup.testNegative58  avgt   15  16.517 ± 0.028  ns/op
SecondarySupersLookup.testNegative59  avgt   15  19.802 ± 0.298  ns/op
SecondarySupersLookup.testNegative60  avgt   15  21.237 ± 0.044  ns/op
SecondarySupersLookup.testNegative61  avgt   15  21.241 ± 0.050  ns/op
SecondarySupersLookup.testNegative62  avgt   15  21.243 ± 0.042  ns/op
SecondarySupersLookup.testNegative63  avgt   15  25.421 ± 0.033  ns/op
SecondarySupersLookup.testNegative64  avgt   15  25.064 ± 0.089  ns/op
SecondarySupersLookup.testPositive01  avgt   15   9.818 ± 0.026  ns/op
SecondarySupersLookup.testPositive02  avgt   15   9.819 ± 0.017  ns/op
SecondarySupersLookup.testPositive03  avgt   15   9.826 ± 0.025  ns/op
SecondarySupersLookup.testPositive04  avgt   15   9.817 ± 0.019  ns/op
SecondarySupersLookup.testPositive05  avgt   15   9.815 ± 0.022  ns/op
SecondarySupersLookup.testPositive06  avgt   15   9.821 ± 0.018  ns/op
SecondarySupersLookup.testPositive07  avgt   15   9.824 ± 0.035  ns/op
SecondarySupersLookup.testPositive08  avgt   15   9.837 ± 0.041  ns/op
SecondarySupersLookup.testPositive09  avgt   15   9.820 ± 0.030  ns/op
SecondarySupersLookup.testPositive10  avgt   15   9.817 ± 0.008  ns/op
SecondarySupersLookup.testPositive16  avgt   15   9.819 ± 0.016  ns/op
SecondarySupersLookup.testPositive20  avgt   15   9.818 ± 0.012  ns/op
SecondarySupersLookup.testPositive30  avgt   15   9.820 ± 0.013  ns/op
SecondarySupersLookup.testPositive32  avgt   15   9.820 ± 0.024  ns/op
SecondarySupersLookup.testPositive40  avgt   15  12.722 ± 0.029  ns/op
SecondarySupersLookup.testPositive50  avgt   15   9.820 ± 0.020  ns/op
SecondarySupersLookup.testPositive60  avgt   15  12.717 ± 0.015  ns/op
SecondarySupersLookup.testPositive63  avgt   15  22.316 ± 0.024  ns/op
SecondarySupersLookup.testPositive64  avgt   15  24.904 ± 0.057  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

with the patch:

Benchmark                             Mode  Cnt   Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   4.780 ± 0.177  ns/op
SecondarySupersLookup.testNegative01  avgt   15   4.719 ± 0.010  ns/op
SecondarySupersLookup.testNegative02  avgt   15   4.766 ± 0.179  ns/op
SecondarySupersLookup.testNegative03  avgt   15   4.723 ± 0.013  ns/op
SecondarySupersLookup.testNegative04  avgt   15   4.761 ± 0.169  ns/op
SecondarySupersLookup.testNegative05  avgt   15   4.760 ± 0.171  ns/op
SecondarySupersLookup.testNegative06  avgt   15   4.719 ± 0.008  ns/op
SecondarySupersLookup.testNegative07  avgt   15   4.719 ± 0.009  ns/op
SecondarySupersLookup.testNegative08  avgt   15   4.718 ± 0.007  ns/op
SecondarySupersLookup.testNegative09  avgt   15   4.761 ± 0.168  ns/op
SecondarySupersLookup.testNegative10  avgt   15   4.762 ± 0.091  ns/op
SecondarySupersLookup.testNegative16  avgt   15   4.719 ± 0.009  ns/op
SecondarySupersLookup.testNegative20  avgt   15   4.721 ± 0.013  ns/op
SecondarySupersLookup.testNegative30  avgt   15   4.762 ± 0.184  ns/op
SecondarySupersLookup.testNegative32  avgt   15   4.884 ± 0.301  ns/op
SecondarySupersLookup.testNegative40  avgt   15   4.721 ± 0.013  ns/op
SecondarySupersLookup.testNegative50  avgt   15   4.719 ± 0.009  ns/op
SecondarySupersLookup.testNegative55  avgt   15  29.569 ± 3.057  ns/op
SecondarySupersLookup.testNegative56  avgt   15  29.835 ± 2.460  ns/op
SecondarySupersLookup.testNegative57  avgt   15  33.406 ± 3.634  ns/op
SecondarySupersLookup.testNegative58  avgt   15  31.665 ± 3.438  ns/op
SecondarySupersLookup.testNegative59  avgt   15  35.713 ± 3.282  ns/op
SecondarySupersLookup.testNegative60  avgt   15  31.220 ± 2.361  ns/op
SecondarySupersLookup.testNegative61  avgt   15  34.202 ± 3.560  ns/op
SecondarySupersLookup.testNegative62  avgt   15  32.143 ± 2.823  ns/op
SecondarySupersLookup.testNegative63  avgt   15  32.445 ± 2.387  ns/op
SecondarySupersLookup.testNegative64  avgt   15  35.546 ± 3.793  ns/op
SecondarySupersLookup.testPositive01  avgt   15   5.211 ± 0.011  ns/op
SecondarySupersLookup.testPositive02  avgt   15   5.225 ± 0.073  ns/op
SecondarySupersLookup.testPositive03  avgt   15   5.211 ± 0.009  ns/op
SecondarySupersLookup.testPositive04  avgt   15   5.211 ± 0.009  ns/op
SecondarySupersLookup.testPositive05  avgt   15   5.228 ± 0.072  ns/op
SecondarySupersLookup.testPositive06  avgt   15   5.852 ± 1.266  ns/op
SecondarySupersLookup.testPositive07  avgt   15   5.213 ± 0.012  ns/op
SecondarySupersLookup.testPositive08  avgt   15   5.234 ± 0.101  ns/op
SecondarySupersLookup.testPositive09  avgt   15   5.227 ± 0.067  ns/op
SecondarySupersLookup.testPositive10  avgt   15   5.214 ± 0.015  ns/op
SecondarySupersLookup.testPositive16  avgt   15   5.213 ± 0.018  ns/op
SecondarySupersLookup.testPositive20  avgt   15   5.209 ± 0.009  ns/op
SecondarySupersLookup.testPositive30  avgt   15   5.208 ± 0.004  ns/op
SecondarySupersLookup.testPositive32  avgt   15   5.266 ± 0.121  ns/op
SecondarySupersLookup.testPositive40  avgt   15  16.094 ± 0.621  ns/op
SecondarySupersLookup.testPositive50  avgt   15   5.215 ± 0.016  ns/op
SecondarySupersLookup.testPositive60  avgt   15  29.342 ± 3.571  ns/op
SecondarySupersLookup.testPositive63  avgt   15  27.752 ± 3.543  ns/op
SecondarySupersLookup.testPositive64  avgt   15  36.571 ± 4.141  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23535#issuecomment-2648003297

From jsjolen at openjdk.org Mon Feb 10 14:17:39 2025
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Mon, 10 Feb 2025 14:17:39 GMT
Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v21]
References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com>
Message-ID: <6uq32Tm6oCiyWlXYvmquDd3wcCdruX1TGH6XWMrvgVM=.5add9458-0746-42c8-8b2b-4a0aaf8f5ee6@github.com>

On Thu, 6 Feb 2025 15:51:41 GMT, Afshin Zafari wrote:

>> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`.
>> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier.
>> - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()`
>> - `find_reserved_region()` is used in 4 cases; it will be removed in further PRs.
>> - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167.
>> - Adding a runtime flag for selecting the old or new version can be added later.
>> - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time.
>
> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision:
>
>   fixed merge problems

Hi!

I've looked through some more of the PR and I've found multiple renamings from `type` to `tag`. It's great that you've done this, but can these changes be moved out into a separate PR for mainline? Keeping the diff minimal allows for much easier reviewing, and will let your hard work renaming type/flag to tag be integrated sooner.
I've commented on all the cases I found with "Move out into mainline PR".

For `vmtCommon.hpp`: if I remember correctly, this refactoring was made because we had 2 separate implementations, but now we only have one. Restoring the code so that the content of `vmtCommon.hpp` is put back into `virtualMemoryTracker.hpp` will make the diff smaller still and make it easier to find the actual changes to the code.

We're currently at +1528/-1846; that's a fairly large PR. If we can make it smaller by separating parts out into multiple PRs and avoiding unnecessary refactoring of the directory structure, then it can become quite a lot smaller and easier to review.

I hope you'll consider making these changes, as I think they make it easier on the reviewers.

Thanks and all the best,
Johan

src/hotspot/share/nmt/mallocTracker.hpp line 163:

> 161:   }
> 162: 
> 163:   inline const MallocMemory* by_tag(MemTag mem_tag) const {

Move out into mainline PR

src/hotspot/share/nmt/mallocTracker.hpp line 241:

> 239: 
> 240:   static inline void record_arena_size_change(ssize_t size, MemTag mem_tag) {
> 241:     as_snapshot()->by_tag(mem_tag)->record_arena_size_change(size);

Move out into mainline PR

src/hotspot/share/nmt/mallocTracker.inline.hpp line 55:

> 53:     l = MallocLimitHandler::category_limit(mem_tag);
> 54:     if (l->sz > 0) {
> 55:       const MallocMemory* mm = as_snapshot()->by_tag(mem_tag);

Move out into mainline PR

src/hotspot/share/nmt/memBaseline.cpp line 64:

> 62: 
> 63: // Sort into allocation site addresses and memory tag order for baseline comparison
> 64: int compare_malloc_site_and_tag(const MallocSite& s1, const MallocSite& s2) {

Move out into mainline PR

src/hotspot/share/nmt/memBaseline.cpp line 235:

> 233:       break;
> 234:     case by_site_and_tag:
> 235:       malloc_sites_to_allocation_site_and_tag_order();

Move out into mainline PR

src/hotspot/share/nmt/memBaseline.cpp line 275:

> 273: 
> 274: void MemBaseline::malloc_sites_to_allocation_site_order() {
> 275:   if (_malloc_sites_order != by_site && _malloc_sites_order != by_site_and_tag) {
_malloc_sites_order != by_site_and_tag) { Move out into mainline PR src/hotspot/share/nmt/memBaseline.cpp line 292: > 290: _malloc_sites.set_head(tmp.head()); > 291: tmp.set_head(nullptr); > 292: _malloc_sites_order = by_site_and_tag; Move out into mainline PR src/hotspot/share/nmt/memBaseline.hpp line 56: > 54: by_size, // by memory size > 55: by_site, // by call site where the memory is allocated from > 56: by_site_and_tag // by call site and memory tag Move out into mainline PR (and indentation of comment seems wrong) src/hotspot/share/nmt/memBaseline.hpp line 154: > 152: VirtualMemory* virtual_memory(MemTag mem_tag) { > 153: assert(baseline_type() != Not_baselined, "Not yet baselined"); > 154: return _virtual_memory_snapshot.by_tag(mem_tag); Move out into mainline PR src/hotspot/share/nmt/memBaseline.hpp line 207: > 205: void malloc_sites_to_allocation_site_order(); > 206: // Sort allocation sites in call site address and memory tag order > 207: void malloc_sites_to_allocation_site_and_tag_order(); Move out into mainline PR src/hotspot/share/nmt/memReporter.cpp line 192: > 190: } > 191: > 192: void MemSummaryReporter::report_summary_of_tag(MemTag mem_tag, Move out into mainline PR src/hotspot/share/nmt/memReporter.cpp line 201: > 199: if (mem_tag == mtThread) { > 200: const VirtualMemory* thread_stack_usage = > 201: (const VirtualMemory*)_vm_snapshot->by_tag(mtThreadStack); Move out into mainline PR src/hotspot/share/nmt/memReporter.cpp line 243: > 241: } else if (mem_tag == mtThread) { > 242: const VirtualMemory* thread_stack_usage = > 243: _vm_snapshot->by_tag(mtThreadStack); Move out into mainline PR src/hotspot/share/nmt/memReporter.cpp line 538: > 536: // thread stack is reported as part of thread category > 537: if (mem_tag == mtThreadStack) continue; > 538: diff_summary_of_tag(mem_tag, Move out into mainline PR src/hotspot/share/nmt/memReporter.cpp line 608: > 606: > 607: > 608: void MemSummaryDiffReporter::diff_summary_of_tag(MemTag mem_tag, Move out 
into mainline PR src/hotspot/share/nmt/memReporter.cpp line 810: > 808: void MemDetailDiffReporter::diff_malloc_sites() const { > 809: MallocSiteIterator early_itr = _early_baseline.malloc_sites(MemBaseline::by_site_and_tag); > 810: MallocSiteIterator current_itr = _current_baseline.malloc_sites(MemBaseline::by_site_and_tag); Move out into mainline PR src/hotspot/share/nmt/memoryFileTracker.cpp line 47: > 45: VMATree::SummaryDiff diff = file->_tree.commit_mapping(offset, size, regiondata); > 46: for (int i = 0; i < mt_number_of_tags; i++) { > 47: VirtualMemory* summary = file->_summary.by_tag(NMTUtil::index_to_tag(i)); Move out into mainline PR src/hotspot/share/nmt/memoryFileTracker.cpp line 56: > 54: VMATree::SummaryDiff diff = file->_tree.release_mapping(offset, size); > 55: for (int i = 0; i < mt_number_of_tags; i++) { > 56: VirtualMemory* summary = file->_summary.by_tag(NMTUtil::index_to_tag(i)); Move out into mainline PR src/hotspot/share/nmt/memoryFileTracker.cpp line 187: > 185: snap->commit_memory(current->committed()); > 186: } > 187: } Revert this change src/hotspot/share/nmt/memoryFileTracker.hpp line 81: > 79: const MemoryFile* file = _files.at(d); > 80: for (int i = 0; i < mt_number_of_tags; i++) { > 81: f(NMTUtil::index_to_tag(i), file->_summary.by_tag(NMTUtil::index_to_tag(i))); Move out into mainline PR src/hotspot/share/nmt/nmtCommon.cpp line 33: > 31: > 32: #define MEMORY_TAG_DECLARE_NAME(tag, human_readable) \ > 33: { #tag, human_readable }, Move out into mainline PR src/hotspot/share/nmt/nmtCommon.hpp line 91: > 89: // Map memory tag to index > 90: static inline int tag_to_index(MemTag mem_tag) { > 91: assert(tag_is_valid(mem_tag), "Invalid tag (%u)", (unsigned)mem_tag); Move out into mainline PR ------------- PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2605844859 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949121262 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20425#discussion_r1949121527 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949120937 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949120631 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949120394 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949120180 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949119941 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949118303 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949118619 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949118841 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949116400 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949115914 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949115644 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949110432 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949110306 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949110184 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949109029 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949108903 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949107753 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949105112 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949104485 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1949104091 From sroy at openjdk.org Mon Feb 10 14:30:29 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Mon, 10 Feb 2025 14:30:29 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v20] In-Reply-To: References: 
<2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Sat, 8 Feb 2025 12:22:23 GMT, Martin Doerr wrote: >> Suchismith Roy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 42 commits: >> >> - Merge branch 'openjdk:master' into ghash_processblocks >> - adapt Condition registers >> - Merge branch 'openjdk:master' into ghash_processblocks >> - restore chnges >> - restore chnges >> - permute vHigh,vLow >> - indentation >> - comments >> - vsx logic change >> - spaces >> - ... and 32 more: https://git.openjdk.org/jdk/compare/86cec4ea...d22fcf25 > > src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 661: > >> 659: __ andi(temp1, data, 15); >> 660: __ cmpwi(CR0, temp1, 0); >> 661: __ beq(CR0, L_aligned); // Check if address is aligned (mask lower 4 bits) > > The alignment check should not be in the loop. Better check before and use 2 loops. > It would be interesting to know how often the data is aligned. You mean one loop for aligned addresses and one for unaligned? Is there a way to write the common code of both in a function? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1949163007 From nbenalla at openjdk.org Mon Feb 10 15:24:13 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Mon, 10 Feb 2025 15:24:13 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v6] In-Reply-To: References: Message-ID: <8gan8nfbwoDaaOqqnqpMwcG-XkvvVJFTPwKbTlY5VZ8=.26eeffbb-0702-44ef-bc3b-75abe58d476d@github.com> > Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. > It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources.
> > Before adding line 86 and excluding `os_windows.cpp`, the test failed with: > > > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: > HMODULE hModule = NULL; > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: > GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); > java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. > at TestNoNULL.main(TestNoNULL.java:73) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1447) Nizar Benalla has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - pass an empty set the method rather than a boolean better exception handling/message when encountering a binary file - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot - filter out `.zip` files - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot - trivial change, if .java files are filtered out then so should .class files - revert to the original regex and remove the exclusion of os_windows.cpp - update based on feedback - Add a test to prevent NULL backsliding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23466/files - new: https://git.openjdk.org/jdk/pull/23466/files/99577488..2112c863 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23466&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23466&range=04-05 Stats: 1899 lines in 99 files changed: 872 ins; 628 del; 399 mod Patch: https://git.openjdk.org/jdk/pull/23466.diff 
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23466/head:pull/23466 PR: https://git.openjdk.org/jdk/pull/23466 From nbenalla at openjdk.org Mon Feb 10 15:24:15 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Mon, 10 Feb 2025 15:24:15 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v5] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 17:10:59 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources. >> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. >> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: > > - filter out `.zip` files > - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot > - trivial change, if .java files are filtered out then so should .class files > - revert to the original regex and remove the exclusion of os_windows.cpp > - update based on feedback > - Add a test to prevent NULL backsliding Andrey's suggestion seemed like a positive change, so I've updated the patch one more time. I also added some exception handling to avoid potential false positives. Requesting a re-review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23466#issuecomment-2648357061 From nbenalla at openjdk.org Mon Feb 10 15:27:12 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Mon, 10 Feb 2025 15:27:12 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: <46sqVAK7loCob1A9LcFs63SoiqaBwIwq1v9D_uqYtdY=.ac6528d5-9628-40f7-9a6c-471abee78757@github.com> References: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> <46sqVAK7loCob1A9LcFs63SoiqaBwIwq1v9D_uqYtdY=.ac6528d5-9628-40f7-9a6c-471abee78757@github.com> Message-ID: On Mon, 10 Feb 2025 10:37:26 GMT, Christian Stein wrote: >> I don't see any better way. https://openjdk.org/jtreg/vmoptions.html doesn't list anything other than >> the "test.src" property as a way to find the source hierarchy. If you know of an alternative, please suggest. > > I agree with @kimbarrett - there's no canonical way built into `jtreg` to obtain the `src` directory of a or the JDK under test. > > This tests reads more like an automated tool call that you'd normally pass in a start directory as an argument. No? Thanks for the comment, Christian.
You're right that the test is a tool that needs a directory to check; I tried to avoid passing arguments to the test so that it does not need `othervm`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1949325233 From mdoerr at openjdk.org Mon Feb 10 15:42:15 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 10 Feb 2025 15:42:15 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v20] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Mon, 10 Feb 2025 14:27:54 GMT, Suchismith Roy wrote: >> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 661: >> >>> 659: __ andi(temp1, data, 15); >>> 660: __ cmpwi(CR0, temp1, 0); >>> 661: __ beq(CR0, L_aligned); // Check if address is aligned (mask lower 4 bits) >> >> The alignment check should not be in the loop. Better check before and use 2 loops. >> It would be interesting to know how often the data is aligned. > > You mean 1 loop for aligned address and one for unaligned ? Is there a way to write the common code of both in a function ? Yes, you can move the common code into a static function and pass the masm, Registers, etc. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1949356411 From shade at openjdk.org Mon Feb 10 15:54:23 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Feb 2025 15:54:23 GMT Subject: RFR: 8349639: jfr/event/gc/detailed/TestShenandoahEvacuationInformationEvent.java fails to compile after JDK-8348610 In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 08:14:14 GMT, Aleksey Shipilev wrote: > A simple test bug crept in through https://github.com/openjdk/jdk/commit/bad39b6d8892ba9b86bc81bf01108a1df617defb. > > Additional testing: > - [x] Affected test now passes Thanks!
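Martin Doerr's suggestion in the GHASH thread can be sketched in portable C++: test the 16-byte alignment once before the loop, run one of two loops, and share the common body through a helper. This sketch only shows the control-flow shape and is not the PPC64 stub. It has no MacroAssembler, and `mix_block` is a hypothetical stand-in for the real GHASH multiply/reduce step. In portable C++ both loads are a `memcpy`. The stub, by contrast, would presumably use an aligned vector load on one path and a load-plus-permute sequence on the other.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical per-block work standing in for the GHASH multiply/reduce step.
static void mix_block(uint64_t state[2], const uint64_t block[2]) {
  state[0] = (state[0] ^ block[0]) * 0x9E3779B97F4A7C15ULL;
  state[1] = (state[1] ^ block[1]) + state[0];
}

// Loads one 16-byte block. In the stub, the Aligned variant would be a single
// aligned vector load and the unaligned one a load-plus-permute sequence; in
// portable C++ both are a memcpy, so only the structure is illustrated.
template <bool Aligned>
static void load_block(const unsigned char* p, uint64_t out[2]) {
  std::memcpy(out, p, 16);
}

// Shared loop body, instantiated once per path: no alignment test inside.
template <bool Aligned>
static void ghash_loop(uint64_t state[2], const unsigned char* data, size_t blocks) {
  for (size_t i = 0; i < blocks; i++) {
    uint64_t block[2];
    load_block<Aligned>(data + 16 * i, block);
    mix_block(state, block);
  }
}

// The alignment check happens once, before entering either loop.
void process_blocks(uint64_t state[2], const unsigned char* data, size_t blocks) {
  if ((reinterpret_cast<uintptr_t>(data) & 15) == 0) {
    ghash_loop<true>(state, data, blocks);
  } else {
    ghash_loop<false>(state, data, blocks);
  }
}
```

Applied to the stub, the same idea would become a static function that takes the `MacroAssembler*` and the registers as parameters. That function would be emitted once inside each of the two loops.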
------------- PR Comment: https://git.openjdk.org/jdk/pull/23511#issuecomment-2648465848 From wkemper at openjdk.org Mon Feb 10 15:54:23 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Feb 2025 15:54:23 GMT Subject: RFR: 8349639: jfr/event/gc/detailed/TestShenandoahEvacuationInformationEvent.java fails to compile after JDK-8348610 In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 08:14:14 GMT, Aleksey Shipilev wrote: > A simple test bug crept in through https://github.com/openjdk/jdk/commit/bad39b6d8892ba9b86bc81bf01108a1df617defb. > > Additional testing: > - [x] Affected test now passes Doh! I'll add this test to Shenandoah's tier 3 test suite. ------------- Marked as reviewed by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/23511#pullrequestreview-2606317007 From shade at openjdk.org Mon Feb 10 15:54:24 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Feb 2025 15:54:24 GMT Subject: Integrated: 8349639: jfr/event/gc/detailed/TestShenandoahEvacuationInformationEvent.java fails to compile after JDK-8348610 In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 08:14:14 GMT, Aleksey Shipilev wrote: > A simple test bug crept in through https://github.com/openjdk/jdk/commit/bad39b6d8892ba9b86bc81bf01108a1df617defb. > > Additional testing: > - [x] Affected test now passes This pull request has now been integrated. 
Changeset: ab66c82c Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/ab66c82ce9fdb5ee3fd7690f42b8ad4d78bf5e40 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8349639: jfr/event/gc/detailed/TestShenandoahEvacuationInformationEvent.java fails to compile after JDK-8348610 Reviewed-by: mdoerr, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/23511 From mdoerr at openjdk.org Mon Feb 10 15:57:22 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 10 Feb 2025 15:57:22 GMT Subject: RFR: 8349639: jfr/event/gc/detailed/TestShenandoahEvacuationInformationEvent.java fails to compile after JDK-8348610 In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 08:14:14 GMT, Aleksey Shipilev wrote: > A simple test bug crept in through https://github.com/openjdk/jdk/commit/bad39b6d8892ba9b86bc81bf01108a1df617defb. > > Additional testing: > - [x] Affected test now passes Thanks for fixing it! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23511#issuecomment-2648477406 From gziemski at openjdk.org Mon Feb 10 16:03:38 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 10 Feb 2025 16:03:38 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v23] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. 
> > For memory logger those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. > > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are needed if we want to calculate the memory overhead. > > To use, you first need to record the pattern of operations, ex: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/...
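The description says the `malloc_usable_size` / `_msize` / `malloc_size` wrappers are needed to calculate the memory overhead. The sketch below shows one way such a number could be derived. It assumes that "overhead" means the slack between the requested size and the size the allocator reports as usable. The PR's actual calculation is not shown in this thread, and `usable_size` and `allocation_overhead` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>
#if defined(__linux__)
#include <malloc.h>         // malloc_usable_size
#elif defined(_WIN32)
#include <malloc.h>         // _msize
#elif defined(__APPLE__)
#include <malloc/malloc.h>  // malloc_size
#endif

// Asks the platform allocator how many bytes are usable in the block at ptr.
static size_t usable_size(void* ptr) {
#if defined(__linux__)
  return malloc_usable_size(ptr);
#elif defined(_WIN32)
  return _msize(ptr);
#elif defined(__APPLE__)
  return malloc_size(ptr);
#else
  return 0;  // unknown platform: the allocator cannot be queried
#endif
}

// Rounding slack for one request of `requested` bytes: usable minus requested.
// Allocator headers are not counted, since these functions report usable
// bytes only. Returns 0 if the allocation fails or the size is unknown.
size_t allocation_overhead(size_t requested) {
  void* p = std::malloc(requested);
  if (p == nullptr) return 0;
  size_t usable = usable_size(p);
  std::free(p);
  return usable >= requested ? usable - requested : 0;
}
```

Because only usable bytes are reported, this measures rounding slack per allocation, not the allocator's total bookkeeping cost.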
Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: - cleanup/fix build - define MAXTHREADNAMESIZE on Linux ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/fb3f757f..8dc369fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=21-22 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From liach at openjdk.org Mon Feb 10 16:21:15 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 10 Feb 2025 16:21:15 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v2] In-Reply-To: References: <5SUTxzDb_jOFp4iB1-utmXIu-osA0-r5LaYwixoL_qk=.ee94c3d6-7ba2-4012-ab1b-3d6a0113d1ed@github.com> <46sqVAK7loCob1A9LcFs63SoiqaBwIwq1v9D_uqYtdY=.ac6528d5-9628-40f7-9a6c-471abee78757@github.com> Message-ID: On Mon, 10 Feb 2025 15:24:54 GMT, Nizar Benalla wrote: >> I agree with @kimbarrett - there's no canonical way built into `jtreg` to obtain the `src` directory of a or the JDK under test. >> >> This tests reads more like an automated tool call that you'd normally pass in a start directory as an argument. No? > > Thanks for the comment Christian. > You're right that the test is a tool that needs a directory to check, I tried to avoid passing arguments to the test to avoid the need for `othervm` Thanks for your comments. Given that, this is indeed the best approach for now.
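The per-line check at the heart of the NULL backsliding test can be sketched as follows: flag `NULL` as a whole word and report the line number. The actual test is written in Java, and its regex is not shown in this thread, so the word-boundary pattern here is an assumption. The excluded extensions are the ones listed in the PR description. `find_null_lines` and `is_excluded_test_file` are hypothetical names, and the sketch omits the directory walk from the `test.src`-derived root.

```cpp
#include <cstddef>
#include <filesystem>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Extensions the PR description says are filtered out of the test sources.
bool is_excluded_test_file(const std::filesystem::path& p) {
  static const std::set<std::string> excluded = {".c", ".java", ".class", ".jar", ".zip"};
  return excluded.count(p.extension().string()) != 0;
}

// Returns the 1-based numbers of the lines in `text` that contain NULL as a
// whole word. The \b anchors are an assumed pattern, not the test's regex.
std::vector<size_t> find_null_lines(const std::string& text) {
  static const std::regex null_token("\\bNULL\\b");
  std::vector<size_t> hits;
  std::istringstream in(text);
  std::string line;
  for (size_t n = 1; std::getline(in, line); n++) {
    if (std::regex_search(line, null_token)) hits.push_back(n);
  }
  return hits;
}
```

The word boundaries keep identifiers such as `NULL_WORD` from being reported, while `HMODULE hModule = NULL;` (the `os_windows.cpp` case quoted above) is still caught.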
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1949446937 From iklam at openjdk.org Mon Feb 10 17:33:12 2025 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 10 Feb 2025 17:33:12 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v6] In-Reply-To: <8gan8nfbwoDaaOqqnqpMwcG-XkvvVJFTPwKbTlY5VZ8=.26eeffbb-0702-44ef-bc3b-75abe58d476d@github.com> References: <8gan8nfbwoDaaOqqnqpMwcG-XkvvVJFTPwKbTlY5VZ8=.26eeffbb-0702-44ef-bc3b-75abe58d476d@github.com> Message-ID: On Mon, 10 Feb 2025 15:24:13 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources. >> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. >> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - pass an empty set the method rather than a boolean > > better exception handling/message when encountering a binary file > - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot > - filter out `.zip` files > - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot > - trivial change, if .java files are filtered out then so should .class files > - revert to the original regex and remove the exclusion of os_windows.cpp > - update based on feedback > - Add a test to prevent NULL backsliding Can this be accomplished without introducing a new test? E.g., add this to globalDefinitions.hpp: #ifdef NULL #undef NULL #endif #define NULL (do not use const char* x = NULL; // gcc gives error ------------- PR Comment: https://git.openjdk.org/jdk/pull/23466#issuecomment-2648758853 From gziemski at openjdk.org Mon Feb 10 17:34:03 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 10 Feb 2025 17:34:03 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v24] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. 
> > For memory logger those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. > > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are needed if we want to calculate the memory overhead. > > To use, you first need to record the pattern of operations, ex: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/...
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: xepand summary ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/8dc369fd..d7d79f2d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=22-23 Stats: 33 lines in 1 file changed: 28 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From wkemper at openjdk.org Mon Feb 10 17:52:20 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Feb 2025 17:52:20 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v3] In-Reply-To: <6_AoWQhldJttOIEOL1T7HSapPzE4Qn2j4WN7E-bI3rM=.2685d3d8-e47c-42a6-845b-b68f50cc568e@github.com> References: <6_AoWQhldJttOIEOL1T7HSapPzE4Qn2j4WN7E-bI3rM=.2685d3d8-e47c-42a6-845b-b68f50cc568e@github.com> Message-ID: On Thu, 23 Jan 2025 05:45:43 GMT, Cesar Soares Lucas wrote: >> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code. >> >> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase. >> >> The idea is quite simple. 
Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a version of the card table totally clear. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When the `init-mark` finishes the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that just became the new _read_ table. >> >> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address. >> >> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged. >> >> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code. >> >> Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge master > - Addressing PR comments: some refactorings, ppc fix, off-by-one fix. > - Relocation of Card Tables Marked as reviewed by wkemper (Committer).
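The card-table swap described in this PR can be modeled in a few lines as a simplified, single-threaded sketch. The class name, the card values and the methods are hypothetical, and the sketch omits the real barriers, JIT code and safepoint/handshake coordination. It shows three things. Mutators always go through the current `_write` base pointer, which is why that address can no longer be embedded as a constant. `reset` cleans the read table. `init-mark` only swaps two pointers instead of copying and zeroing a table.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative card values (hypothetical; HotSpot defines its own constants).
constexpr uint8_t kClean = 0xff;
constexpr uint8_t kDirty = 0x00;

class SwappableCardTables {
  std::vector<uint8_t> _a, _b;
  uint8_t* _write;  // mutators dirty cards here; barriers must load this base
  uint8_t* _read;   // the GC scans this table for old-to-young pointers
  size_t _cards;

 public:
  explicit SwappableCardTables(size_t cards)
      : _a(cards, kClean), _b(cards, kClean),
        _write(_a.data()), _read(_b.data()), _cards(cards) {}

  // Post-write barrier: mark the card covering an updated field.
  void dirty(size_t card) { _write[card] = kDirty; }

  // `reset` phase (concurrent in the real GC): clean the read table.
  void reset_read_table() { std::fill(_read, _read + _cards, kClean); }

  // `init-mark` pause: swap roles. Mutators get the just-cleaned table and
  // the GC gets the cards dirtied since the previous swap.
  void swap_at_init_mark() { std::swap(_read, _write); }

  bool read_is_dirty(size_t card) const { return _read[card] == kDirty; }
  bool write_is_dirty(size_t card) const { return _write[card] == kDirty; }
};
```

In this model the pause does O(1) work. The O(cards) cleaning has moved into `reset_read_table`, which corresponds to the concurrent `reset` phase.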
------------- PR Review: https://git.openjdk.org/jdk/pull/23170#pullrequestreview-2606716056 From jsjolen at openjdk.org Mon Feb 10 18:22:21 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 10 Feb 2025 18:22:21 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v7] In-Reply-To: References: Message-ID: On Mon, 10 Feb 2025 10:44:35 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. >> >> Two key changes enable this feature: >> 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. >> 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. >> >> >> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
>> >> An example of how you could use the intrusive tree is found below: >> >> ```c++ >> struct MyIntrusiveStructure { >> Node node; // The tree node is part of an external structure >> int data; >> >> MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} >> Node* get_node() { return &node; } >> static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } >> }; >> >> Tree my_intrusive_tree; >> >> Cursor insert_cursor = my_intrusive_tree.cursor_find(0); >> Node insert_node = Node(0); >> >> // Custom allocation here is just malloc >> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); >> new (place) MyIntrusiveStructure(0, insert_node); >> >> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); >> >> Cursor find_cursor = my_intrusive_tree.cursor_find(0); >> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; >> >> >> >> Please let me know any feedback or concerns! > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > empty base optimization reference src/hotspot/share/utilities/rbTree.hpp line 150: > 148: // If a cursor is valid (valid() == true) it points somewhere in the tree. > 149: // If the cursor points to an existing node (found() == true), node() can be used to access that node, > 150: // Otherwise nullptr is returned, regardless if the node is valid or not. Style: "nullptr" should be "null" here. src/hotspot/share/utilities/rbTree.hpp line 166: > 164: bool found() const { return *_insert_location != nullptr; } > 165: RBNode* node() { return _insert_location == nullptr ? nullptr : *_insert_location; } > 166: RBNode* node() const { return _insert_location == nullptr ? nullptr : *_insert_location; } Is there any case where I don't need to check the validity of the cursor? 
That is, do I ever want to use `node()` without first calling `valid()` or afterwards checking whether the returned value was null? If the answer to that is: "No, there is no such case", then we shouldn't return null for a `!valid()` cursor. We should instead add `assert(valid(), "must be");`. If the answer is yes, then could you please tell me what that situation is :P? src/hotspot/share/utilities/rbTree.hpp line 227: > 225: // Gets the cursor to the given node. > 226: Cursor get_cursor(const RBNode* node); > 227: const Cursor get_cursor(const RBNode* node) const; How about `cursor_of`, or just `cursor`? src/hotspot/share/utilities/rbTree.hpp line 229: > 227: const Cursor get_cursor(const RBNode* node) const; > 228: > 229: // Moves to the next valid node. "Valid" is a strange choice of word here. I assume you mean "existing"? As in, not a null child. src/hotspot/share/utilities/rbTree.hpp line 241: > 239: // Finds the cursor to the node associated with the given key. > 240: Cursor cursor_find(const K& key); > 241: const Cursor cursor_find(const K& key) const; This is `get_cursor` but with a key value instead of an RBNode pointer. I think it's good if this has the same name as `get_cursor`, so use overloading. src/hotspot/share/utilities/rbTree.hpp line 244: > 242: > 243: // Inserts the given node at the cursor location > 244: // The cursor must not point to an existing node Missing `.` to end sentences. src/hotspot/share/utilities/rbTree.hpp line 257: > 255: // For all nodes with key < old_node, must also have key < new_node > 256: // For all nodes with key > old_node, must also have key > new_node > 257: void replace_at_cursor(RBNode* new_node, const Cursor& cursor); `old_key`, `new_key`. Note, in case I miss saying this in the cpp file: We might want to run the verification code in an assert when this function is called. It's a pretty dangerous function that really requires that you know your stuff :-).
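To make the cursor discussion concrete, here is a minimal sketch of an intrusive tree whose cursor stores the address of the child slot where a key is, or would be, as the quoted `*_insert_location` code suggests. It is a plain unbalanced binary search tree, not HotSpot's red-black tree: there is no rebalancing, no parent pointers and no iteration, and every name here is hypothetical. It also follows the suggestion above that `node()` should assert `valid()` rather than return null for an invalid cursor.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical intrusive node: embedded in the user's struct; the tree never allocates.
struct IntrusiveNode {
  int key;
  IntrusiveNode* left = nullptr;
  IntrusiveNode* right = nullptr;
  explicit IntrusiveNode(int k) : key(k) {}
};

class IntrusiveTree {
 public:
  class Cursor {
    friend class IntrusiveTree;
    IntrusiveNode** _insert_location = nullptr;  // slot where the key is or would be
   public:
    bool valid() const { return _insert_location != nullptr; }
    bool found() const { return valid() && *_insert_location != nullptr; }
    IntrusiveNode* node() const {
      assert(valid() && "cursor must be valid");
      return *_insert_location;  // nullptr if the key was not found
    }
  };

  // Walk down from the root, remembering the slot rather than the node.
  Cursor cursor_find(int key) {
    IntrusiveNode** slot = &_root;
    while (*slot != nullptr && (*slot)->key != key) {
      slot = key < (*slot)->key ? &(*slot)->left : &(*slot)->right;
    }
    Cursor c;
    c._insert_location = slot;
    return c;
  }

  // The caller owns `n`; the tree only links it into the empty slot.
  void insert_at_cursor(IntrusiveNode* n, const Cursor& c) {
    assert(c.valid() && !c.found() && "cursor must point to an empty slot");
    *c._insert_location = n;
  }

 private:
  IntrusiveNode* _root = nullptr;
};

// User structure embedding the node as its first member, so the node
// pointer can be converted back to the enclosing (standard-layout) struct.
struct MyIntrusiveStructure {
  IntrusiveNode node;
  int data;
  MyIntrusiveStructure(int key, int d) : node(key), data(d) {}
  static MyIntrusiveStructure* cast_to_self(IntrusiveNode* n) {
    return reinterpret_cast<MyIntrusiveStructure*>(n);
  }
};
```

Because the cursor points at a child slot, it is only safe to use until the tree is modified again. In a red-black tree the rotations performed by an insertion can move nodes between slots, so a real implementation has to be stricter about cursor lifetime than this sketch.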
src/hotspot/share/utilities/rbTree.hpp line 268: > 266: const Cursor cursor = cursor_find(key); > 267: return cursor.found() ? &cursor.node()->_value : nullptr; > 268: } Are these important to have? src/hotspot/share/utilities/rbTree.hpp line 283: > 281: // Inserts a node with the given key into the tree, > 282: // does nothing if the key already exist. > 283: void upsert(const K& key) { `upsert` is a bad name for a function, as it is a portmanteau of "update" and "insert", indicating that this should change something if the key is found. What's the goal with this function? It should probably return the allocated node if it's to be useful. src/hotspot/share/utilities/rbTree.inline.hpp line 192: > 190: template > 191: inline const typename RBTree::Cursor > 192: RBTree::cursor_find(const K& key) const { ```c++ using Tree = RBTree::Cursor; Though I'm pretty sure Thomas gave you a `typedef` for `Tree`, I don't think the `Cur` can be done with a `typedef`. src/hotspot/share/utilities/rbTree.inline.hpp line 207: > 205: } else { > 206: insert_location = &((*insert_location)->_right); > 207: } while (*insert_location != nullptr) { auto find_a_Good_Name = *insert_location; // And replace the usage sites } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1949653015 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1949657034 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1949622460 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1949623575 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1949625408 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1949628390 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1949629951 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1949634506 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1949640519 PR Review Comment:
https://git.openjdk.org/jdk/pull/23416#discussion_r1949645253 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1949649069 From jsjolen at openjdk.org Mon Feb 10 18:22:21 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 10 Feb 2025 18:22:21 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v7] In-Reply-To: References: Message-ID: On Mon, 10 Feb 2025 18:13:49 GMT, Johan Sjölen wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> empty base optimization reference > > src/hotspot/share/utilities/rbTree.hpp line 150: > >> 148: // If a cursor is valid (valid() == true) it points somewhere in the tree. >> 149: // If the cursor points to an existing node (found() == true), node() can be used to access that node, >> 150: // Otherwise nullptr is returned, regardless if the node is valid or not. > > Style: "nullptr" should be "null" here. "Otherwise", but the above line has a comma. What can be used to access that node? You don't say :-). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1949654441 From shade at openjdk.org Mon Feb 10 18:53:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Feb 2025 18:53:12 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v3] In-Reply-To: <6_AoWQhldJttOIEOL1T7HSapPzE4Qn2j4WN7E-bI3rM=.2685d3d8-e47c-42a6-845b-b68f50cc568e@github.com> References: <6_AoWQhldJttOIEOL1T7HSapPzE4Qn2j4WN7E-bI3rM=.2685d3d8-e47c-42a6-845b-b68f50cc568e@github.com> Message-ID: <5GD87O6WaG7QG9PLlH7ssfGtp1szWUjmosVSl8-TAok=.d04789f7-7f87-4b44-bc76-80676f0f4fc8@github.com> On Thu, 23 Jan 2025 05:45:43 GMT, Cesar Soares Lucas wrote: >> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution.
Because they are considered constants, they are embedded as such in JIT-compiled code. >> >> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase. >> >> The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the current _read_ table so that, when the cycle reaches the next `init-mark` phase, we have a completely clean version of the card table. In the next `init-mark` pause we swap the pointers to the base of the read and write tables. When `init-mark` finishes, the mutator threads operate on the table just cleaned in the `reset` phase, and the GC operates on the table that has just become the new _read_ table. >> >> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address. >> >> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged. >> >> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code. >> >> Functional testing was performed on Linux, macOS and Windows running on x64 and AArch64, and on their respective 32-bit versions.
I'd appreciate it if someone with access to RISC-V (@luhenry) and PowerPC (@TheRealMDoerr) platforms could review and test the changes for those platforms, as I have limited access to running tests on them. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge master > - Addressing PR comments: some refactorings, ppc fix, off-by-one fix. > - Relocation of Card Tables I'll take a look at this tomorrow, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23170#issuecomment-2648942403 From aph at openjdk.org Mon Feb 10 20:08:11 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 10 Feb 2025 20:08:11 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic In-Reply-To: References: Message-ID: <8a8bvPO8z1YYMzAn9tz0YbKqxTNe3BMgayjXtoANlmk=.c5ad0edc-6d9b-4275-b394-c8b990cd4dda@github.com> On Mon, 10 Feb 2025 13:33:20 GMT, Amit Kumar wrote: > command: `make test TEST="micro:vm.lang.SecondarySupersLookup" MICRO=" JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseSecondarySupersCache"` This first set of numbers is dominated by C2-compiled code, not C1. This patch is for C1 only, so please delete these results.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23535#issuecomment-2649116429 From nbenalla at openjdk.org Mon Feb 10 20:10:13 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Mon, 10 Feb 2025 20:10:13 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v6] In-Reply-To: <8gan8nfbwoDaaOqqnqpMwcG-XkvvVJFTPwKbTlY5VZ8=.26eeffbb-0702-44ef-bc3b-75abe58d476d@github.com> References: <8gan8nfbwoDaaOqqnqpMwcG-XkvvVJFTPwKbTlY5VZ8=.26eeffbb-0702-44ef-bc3b-75abe58d476d@github.com> Message-ID: <2ok9ZzYuUHkgydwCSgRwW7OrcdMwcP64lOYQVDByNQE=.1bb2460a-b058-4b1d-ad28-b1c2ae1c88fa@github.com> On Mon, 10 Feb 2025 15:24:13 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources. >> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. >> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - pass an empty set the method rather than a boolean > > better exception handling/message when encountering a binary file > - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot > - filter out `.zip` files > - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot > - trivial change, if .java files are filtered out then so should .class files > - revert to the original regex and remove the exclusion of os_windows.cpp > - update based on feedback > - Add a test to prevent NULL backsliding Regarding `globalDefinitions.hpp`, [8324686](https://bugs.openjdk.org/browse/JDK-8324686) is open. The description reads: > One might think that redefinition might no longer be needed at all, since it's working around a pretty old issue. But apparently it's still the case that sizeof(NULL) != sizeof(void*) for 64bit builds. But a literal 0 (a 32bit int) is a valid null pointer constant. This test is meant to prevent more usages of NULL from backsliding in, like what happened in [8349417](https://bugs.openjdk.org/browse/JDK-8349417). It would also check test, README and XML files.
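The core of such a check is a whole-word search for the `NULL` token, line by line. The actual test is the Java `TestNoNULL`, and the thread does not show its regex, so the following C++ sketch only illustrates the matching idea under the assumption that a `\bNULL\b` pattern is used. It leaves out the file walking and the `.c`/`.java`/`.class`/`.jar`/`.zip` filtering entirely.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based line numbers of `text` that contain NULL as a whole
// word. The \b boundaries keep identifiers such as NULL_CHECK or nullptr from
// matching. Like any plain regex scan, it also flags NULL inside comments
// and string literals.
std::vector<int> lines_with_null(const std::string& text) {
  static const std::regex null_token(R"(\bNULL\b)");
  std::vector<int> hits;
  std::istringstream in(text);
  std::string line;
  int line_no = 0;
  while (std::getline(in, line)) {
    ++line_no;
    if (std::regex_search(line, null_token)) {
      hits.push_back(line_no);
    }
  }
  return hits;
}
```

For a line such as `HMODULE hModule = NULL;` from the failure output above, the scan reports a hit, while `#define NULL_CHECK(x) x` does not match because `_` is a word character.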
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - pass an empty set the method rather than a boolean >> >> better exception handling/message when encountering a binary file >> - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot >> - filter out `.zip` files >> - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot >> - trivial change, if .java files are filtered out then so should .class files >> - revert to the original regex and remove the exclusion of os_windows.cpp >> - update based on feedback >> - Add a test to prevent NULL backsliding > > Regarding `globalDefinitions.hpp`, [8324686](https://bugs.openjdk.org/browse/JDK-8324686) is open. The description reads: > >> One might think that redefinition might no longer be needed at all, since it's working around a pretty old issue. But apparently it's still the case that sizeof(NULL) != sizeof(void*) for 64bit builds. But a literal 0 (a 32bit int) is a valid null pointer constant. > > This test is meant to prevent more usages of NULL from backsliding in, like what happened in [8349417](https://bugs.openjdk.org/browse/JDK-8349417). It would also check test, README and XML files. @nizarbenalla I believe you misunderstood 8324686: its point is that the global definition supporting NULL can be removed, but only after all NULL occurrences are replaced by nullptr. @iklam recommends adding a new global definition that makes future NULL usage fail fast (which can only happen after 8324686, as the macros are incompatible). IMO that is a better approach than directory scanning, and it works better if there is extension code to hotspot that lives in directories that can't be discovered by this test.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23466#issuecomment-2649131006 From nbenalla at openjdk.org Mon Feb 10 20:20:15 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Mon, 10 Feb 2025 20:20:15 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v6] In-Reply-To: <8gan8nfbwoDaaOqqnqpMwcG-XkvvVJFTPwKbTlY5VZ8=.26eeffbb-0702-44ef-bc3b-75abe58d476d@github.com> References: <8gan8nfbwoDaaOqqnqpMwcG-XkvvVJFTPwKbTlY5VZ8=.26eeffbb-0702-44ef-bc3b-75abe58d476d@github.com> Message-ID: On Mon, 10 Feb 2025 15:24:13 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources. >> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. >> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - pass an empty set the method rather than a boolean > > better exception handling/message when encountering a binary file > - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot > - filter out `.zip` files > - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot > - trivial change, if .java files are filtered out then so should .class files > - revert to the original regex and remove the exclusion of os_windows.cpp > - update based on feedback > - Add a test to prevent NULL backsliding Thanks for the explanation, I now understand what iklam meant ------------- PR Comment: https://git.openjdk.org/jdk/pull/23466#issuecomment-2649140729 From mpowers at openjdk.org Mon Feb 10 21:01:18 2025 From: mpowers at openjdk.org (Mark Powers) Date: Mon, 10 Feb 2025 21:01:18 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 18:47:54 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. 
> > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Adding comments + some code reorganization

Some measurements (us/op):

| Operation | Parameter set | With intrinsics | No intrinsics |
|-----------|---------------|-----------------|---------------|
| keygen | ML-DSA-44 | 38.8 | 63.1 |
| keygen | ML-DSA-65 | 82.5 | 118.7 |
| keygen | ML-DSA-87 | 112.6 | 167.2 |
| siggen | ML-DSA-44 | 119.1 | 466.8 |
| siggen | ML-DSA-65 | 186.5 | 546.3 |
| siggen | ML-DSA-87 | 306.1 | 560.3 |
| sigver | ML-DSA-44 | 46.4 | 71.6 |
| sigver | ML-DSA-65 | 72.8 | 117.9 |
| sigver | ML-DSA-87 | 123.4 | 180.4 |

------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2649220775 From iklam at openjdk.org Mon Feb 10 21:02:12 2025 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 10 Feb 2025 21:02:12 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v6] In-Reply-To: <8gan8nfbwoDaaOqqnqpMwcG-XkvvVJFTPwKbTlY5VZ8=.26eeffbb-0702-44ef-bc3b-75abe58d476d@github.com> References: <8gan8nfbwoDaaOqqnqpMwcG-XkvvVJFTPwKbTlY5VZ8=.26eeffbb-0702-44ef-bc3b-75abe58d476d@github.com> Message-ID: <38HudmWmvTaiTKdH_wO4OmXfTL-VSF0QkwnttkOOBvQ=.07104a0e-ba4b-46aa-ab67-eb76d4e00b7a@github.com> On Mon, 10 Feb 2025 15:24:13 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources.
>> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. >> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - pass an empty set the method rather than a boolean > > better exception handling/message when encountering a binary file > - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot > - filter out `.zip` files > - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot > - trivial change, if .java files are filtered out then so should .class files > - revert to the original regex and remove the exclusion of os_windows.cpp > - update based on feedback > - Add a test to prevent NULL backsliding I realized that my proposal is much stronger than the script, as it also forbids any third-party headers included by hotspot from using NULL. If that's not our intention, then I withdraw my proposal.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23466#issuecomment-2649222652 From psandoz at openjdk.org Mon Feb 10 21:26:25 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 10 Feb 2025 21:26:25 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v17] In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 10:05:09 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operations are inferred through inline expansion, and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577) for more details. >> 7. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, but FP16 ISA generally operates on floating-point registers; thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short values to floating-point registers and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 9. X86 backend implementation for all supported intrinsics. >> 10. Functional and Performance validation tests. >> >> Kindly review the patch and share your feedback.
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Fixing typos An impressive and substantial change. I focused on the Java code; there are some small tweaks, presented in comments, that we can make to the intrinsics to improve the expression of the code, and they have no impact on the intrinsic implementation. src/java.base/share/classes/jdk/internal/vm/vector/Float16Math.java line 32: > 30: * The class {@code Float16Math} constains intrinsic entry points corresponding > 31: * to scalar numeric operations defined in Float16 class. > 32: * @since 25 You can remove this line, since this is an internal class. src/java.base/share/classes/jdk/internal/vm/vector/Float16Math.java line 38: > 36: } > 37: > 38: public interface Float16UnaryMathOp { You can just use `UnaryOperator`; there is no need for a new type. Here are the updated methods you can apply to this class. @FunctionalInterface public interface TernaryOperator { T apply(T a, T b, T c); } @IntrinsicCandidate public static T sqrt(Class box_class, T oa, UnaryOperator defaultImpl) { assert isNonCapturingLambda(defaultImpl) : defaultImpl; return defaultImpl.apply(oa); } @IntrinsicCandidate public static T fma(Class box_class, T oa, T ob, T oc, TernaryOperator defaultImpl) { assert isNonCapturingLambda(defaultImpl) : defaultImpl; return defaultImpl.apply(oa, ob, oc); } static boolean isNonCapturingLambda(Object o) { return o.getClass().getDeclaredFields().length == 0; } And in `src/hotspot/share/classfile/vmIntrinsics.hpp`: /* Float16Math API intrinsification support */ \ /* Float16 signatures */ \ do_signature(float16_unary_math_op_sig, "(Ljava/lang/Class;" \ "Ljava/lang/Object;" \ "Ljava/util/function/UnaryOperator;)" \ "Ljava/lang/Object;") \ do_signature(float16_ternary_math_op_sig, "(Ljava/lang/Class;" \ "Ljava/lang/Object;" \ "Ljava/lang/Object;" \ "Ljava/lang/Object;" \ "Ljdk/internal/vm/vector/Float16Math$TernaryOperator;)" \
"Ljava/lang/Object;") \ do_intrinsic(_sqrt_float16, jdk_internal_vm_vector_Float16Math, sqrt_name, float16_unary_math_op_sig, F_S) \ do_intrinsic(_fma_float16, jdk_internal_vm_vector_Float16Math, fma_name, float16_ternary_math_op_sig, F_S) \ src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java line 1202: > 1200: */ > 1201: public static Float16 sqrt(Float16 radicand) { > 1202: return (Float16) Float16Math.sqrt(Float16.class, radicand, With the changes to the intrinsics (as presented in another comment), you no longer need explicit casts, and the code is precisely the same as before except embedded in a lambda body: public static Float16 sqrt(Float16 radicand) { return Float16Math.sqrt(Float16.class, radicand, (_radicand) -> { // Rounding path of sqrt(Float16 -> double) -> Float16 is fine // for preserving the correct final value. The conversion // Float16 -> double preserves the exact numerical value. The // conversion of double -> Float16 also benefits from the // 2p+2 property of IEEE 754 arithmetic. return valueOf(Math.sqrt(_radicand.doubleValue())); } ); } Similarly for `fma`: return Float16Math.fma(Float16.class, a, b, c, (_a, _b, _c) -> { // product is numerically exact in float before the cast to // double; not necessary to widen to double before the // multiply. double product = (double)(_a.floatValue() * _b.floatValue()); return valueOf(product + _c.doubleValue()); }); test/jdk/jdk/incubator/vector/ScalarFloat16OperationsTest.java line 44: > 42: import static jdk.incubator.vector.Float16.*; > 43: > 44: public class ScalarFloat16OperationsTest { Now that we have IR tests, do you still think this test is necessary, or should we have more IR tests instead? @eme64 thoughts? We could follow up in another PR if need be.
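The comments in the proposed `sqrt` and `fma` lambdas lean on two small arithmetic facts about significand widths. As a sketch of why they hold, assuming round-to-nearest-even and the standard double-rounding result for square root (the "2p+2" condition the comment cites): write $\mathrm{rn}_{16}$, $\mathrm{rn}_{32}$, $\mathrm{rn}_{64}$ for rounding into binary16, binary32 and binary64, whose significands carry $p_{16} = 11$, $p_{32} = 24$ and $q_{64} = 53$ bits, and note that widening a binary16 value to double is exact.

```latex
\begin{aligned}
&\text{sqrt:}\quad q_{64} = 53 \;\ge\; 2p_{16} + 2 = 2 \cdot 11 + 2 = 24
  \;\Longrightarrow\;
  \mathrm{rn}_{16}\bigl(\mathrm{rn}_{64}(\sqrt{x})\bigr) = \mathrm{rn}_{16}(\sqrt{x}),
  \qquad x \in \mathbb{F}_{16},\ x \ge 0 \\[4pt]
&\text{fma product:}\quad p_{16} + p_{16} = 11 + 11 = 22 \;\le\; p_{32} = 24
  \;\Longrightarrow\;
  \mathrm{rn}_{32}(a \cdot b) = a \cdot b,
  \qquad a, b \in \mathbb{F}_{16}
\end{aligned}
```

The first line is why computing the square root in double and rounding back yields the correctly rounded binary16 result. The second line covers only the product: binary32's exponent range also comfortably contains every binary16 product (from $2^{-48}$ up to $65504^2 \approx 4.3 \times 10^{9}$), but this arithmetic alone does not establish anything about the subsequent double addition and its final rounding to binary16.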
------------- PR Review: https://git.openjdk.org/jdk/pull/22754#pullrequestreview-2607094727 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1949842011 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1949871647 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1949847574 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1949858554 From gziemski at openjdk.org Mon Feb 10 22:01:53 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 10 Feb 2025 22:01:53 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v25] In-Reply-To: References: Message-ID: > Here is another iteration of the NMT benchmarking mechanism, hopefully closer to the final one. > > We create 2 static instances: one NMT_MemoryLogRecorder, the other NMT_VirtualMemoryLogRecorder. > > The VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > These control their liveness; the actual logging goes through their "log" APIs.
> > For the memory logger, those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for the virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. > > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are needed if we want to calculate the memory overhead. > > To use, you first need to record the pattern of operations, e.g.: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/...
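The platform functions listed above report how many bytes the allocator actually set aside for a block, which can exceed the requested size. The following sketch shows one way to turn that into a per-allocation "slack" figure. It is an illustration under stated assumptions, not the PR's recorder: `usable_size` and `allocation_overhead` are hypothetical helpers, and they measure only the rounding slack the allocator reports, not its internal header or bookkeeping bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#if defined(__linux__) || defined(_WIN32)
#include <malloc.h>          // malloc_usable_size (Linux) / _msize (Windows)
#elif defined(__APPLE__)
#include <malloc/malloc.h>   // malloc_size
#endif

// Bytes the allocator actually reserved for `ptr`, which may exceed the
// requested size. Falls back to `requested` on platforms not covered here.
size_t usable_size(void* ptr, size_t requested) {
#if defined(__linux__)
  (void)requested;
  return malloc_usable_size(ptr);
#elif defined(_WIN32)
  (void)requested;
  return _msize(ptr);
#elif defined(__APPLE__)
  (void)requested;
  return malloc_size(ptr);
#else
  (void)ptr;
  return requested;
#endif
}

// Slack visible to the caller for one allocation: usable size minus request.
size_t allocation_overhead(size_t requested) {
  void* p = std::malloc(requested);
  if (p == nullptr) {
    return 0;  // allocation failed; nothing to measure
  }
  size_t overhead = usable_size(p, requested) - requested;
  std::free(p);
  return overhead;
}
```

A recorder could call something like `usable_size` next to each logged malloc or realloc and sum the differences per memory tag. Whether NMT's benchmark defines "overhead" this way is not stated in the thread.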
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: add memory size distribution histogram per NMT category ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/d7d79f2d..a229a155 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=23-24 Stats: 39 lines in 1 file changed: 39 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From jiangli at openjdk.org Tue Feb 11 01:03:09 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 11 Feb 2025 01:03:09 GMT Subject: RFR: 8349620: Add VMProps for static JDK In-Reply-To: References: Message-ID: On Mon, 10 Feb 2025 08:21:21 GMT, Alan Bateman wrote: >> Please review this change that adds the `jdk.static` VMProps. It can be used to skip tests not for running on static JDK. >> >> This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which is used by VMProps to determine if it's static at runtime. >> >> `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on static JDK. This test uses `bin/jlink`, which is not provided on static JDK. There are other tests that require tools in `bin/`. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. > > I think this looks okay, I'm just wondering is one property is enough to cover all the configurations. Thanks, @AlanBateman. > I'm just wondering is one property is enough to cover all the configurations. +1 It's not easy to predict all different cases for now. How about adding/refining when we find any new cases?
I'm also wondering if we would want to merge the `isStatic` into `isHermetic` check in the future. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2649588820 From dholmes at openjdk.org Tue Feb 11 04:11:11 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 11 Feb 2025 04:11:11 GMT Subject: RFR: 8347833: CrashOnOutOfMemory should stop GC threads before HeapDumpOnOutOfMemoryError In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 18:07:32 GMT, Fairoz Matte wrote: > When CrashOnOutOfMemory and HeapDumpOnOutOfMemoryError invoked together, we should make sure, it is performed in a single safepoint, this will avoid allowing other threads to run and throw OOM errors after the initial one is already under error logging. The code needs adjusting to do what was intended. Thanks src/hotspot/share/utilities/vmError.cpp line 1952: > 1950: if(dumpHeap) { > 1951: HeapDumper::dump_heap_from_oome(); > 1952: } To be done at the same safepoint this code needs to be in `VM_ReportJavaOutOfMemory::doit()` - which is why the `dumpHeap` was to be passed to the `VM_ReportJavaOutOfMemory` constructor and stored in a field for `doit`. test/hotspot/jtreg/runtime/ErrorHandling/TestHeapDumpOnOutOfMemoryAndCrashOnOutOfMemory.java line 27: > 25: * @test TestHeapDumpOnOutOfMemoryAndCrashOnOutOfMemory > 26: * @summary Test verifies call to -XX:HeapDumpOnOutOfMemoryError and > 27: * CrashOnOutOfMemoryError handled in a single safepoint operation I can't see how you can possibly test this other than by having safepoint logging and checking the log output. 
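David's first point is structural: whatever must happen at one safepoint has to run inside the single VM operation's doit(), which is why the dumpHeap flag was meant to be passed to the operation's constructor and stored in a field. A minimal mock of that shape, with invented class names standing in for VM_ReportJavaOutOfMemory and the VM thread (none of this is HotSpot code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not HotSpot's VM_Operation/VMThread.
struct SafepointLog {
  int safepoints = 0;                // number of safepoint operations run
  std::vector<std::string> actions;  // what the operations did, in order
};

class VMOperationSketch {
 public:
  virtual ~VMOperationSketch() = default;
  virtual void doit(SafepointLog& log) = 0;
};

// The flag is captured in the constructor and consulted in doit(), so the
// heap dump and the crash report run inside the same operation. The heap
// dump has to come first because, in the real VM, the crash path does not return.
class ReportJavaOutOfMemorySketch : public VMOperationSketch {
 public:
  ReportJavaOutOfMemorySketch(bool dump_heap, bool crash) : dump_heap_(dump_heap), crash_(crash) {}
  void doit(SafepointLog& log) override {
    if (dump_heap_) log.actions.push_back("heap dump");
    if (crash_) log.actions.push_back("crash report");
  }

 private:
  const bool dump_heap_;
  const bool crash_;
};

// Stand-in for handing an operation to the VM thread: one call, one safepoint.
void execute_at_safepoint(VMOperationSketch& op, SafepointLog& log) {
  ++log.safepoints;
  op.doit(log);
}
```

Doing the heap dump outside doit(), as in the vmError.cpp hunk under review, would correspond to a second, separate safepoint in this model.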
test/hotspot/jtreg/runtime/ErrorHandling/TestHeapDumpOnOutOfMemoryAndCrashOnOutOfMemory.java line 43: > 41: try { > 42: Object[] oa = new Object[Integer.MAX_VALUE]; > 43: for(int i = 0; i < oa.length; i++) { Suggestion: for (int i = 0; i < oa.length; i++) { test/hotspot/jtreg/runtime/ErrorHandling/TestHeapDumpOnOutOfMemoryAndCrashOnOutOfMemory.java line 44: > 42: Object[] oa = new Object[Integer.MAX_VALUE]; > 43: for(int i = 0; i < oa.length; i++) { > 44: oa[i] = new Object[Integer.MAX_VALUE]; This will throw the "VM limit reached" OOME - does that trigger the heapdump etc processing? test/hotspot/jtreg/runtime/ErrorHandling/TestHeapDumpOnOutOfMemoryAndCrashOnOutOfMemory.java line 57: > 55: OutputAnalyzer output = new OutputAnalyzer(pb.start()); > 56: int exitValue = output.getExitValue(); > 57: if(0 != exitValue) { Suggestion: if (exitValue != 0) { ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23519#pullrequestreview-2607628017 PR Review Comment: https://git.openjdk.org/jdk/pull/23519#discussion_r1950196937 PR Review Comment: https://git.openjdk.org/jdk/pull/23519#discussion_r1950197529 PR Review Comment: https://git.openjdk.org/jdk/pull/23519#discussion_r1950198533 PR Review Comment: https://git.openjdk.org/jdk/pull/23519#discussion_r1950204574 PR Review Comment: https://git.openjdk.org/jdk/pull/23519#discussion_r1950197996 From amitkumar at openjdk.org Tue Feb 11 04:14:28 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 11 Feb 2025 04:14:28 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v2] In-Reply-To: References: Message-ID: > s390x implementation for Class.isInstance intrinsic. > > Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. > > Benchmark results will be updated soon. 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: optimize branching ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23535/files - new: https://git.openjdk.org/jdk/pull/23535/files/a764590f..cdca4853 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=00-01 Stats: 12 lines in 1 file changed: 2 ins; 5 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23535/head:pull/23535 PR: https://git.openjdk.org/jdk/pull/23535 From kbarrett at openjdk.org Tue Feb 11 04:38:10 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 11 Feb 2025 04:38:10 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v6] In-Reply-To: <38HudmWmvTaiTKdH_wO4OmXfTL-VSF0QkwnttkOOBvQ=.07104a0e-ba4b-46aa-ab67-eb76d4e00b7a@github.com> References: <8gan8nfbwoDaaOqqnqpMwcG-XkvvVJFTPwKbTlY5VZ8=.26eeffbb-0702-44ef-bc3b-75abe58d476d@github.com> <38HudmWmvTaiTKdH_wO4OmXfTL-VSF0QkwnttkOOBvQ=.07104a0e-ba4b-46aa-ab67-eb76d4e00b7a@github.com> Message-ID: On Mon, 10 Feb 2025 20:59:32 GMT, Ioi Lam wrote: > I realized that my proposal is much stronger than the script, as it also forbids any 3rd headers included by hotspot from using NULL. If that's not our intention, then I withdraw my proposal. Yes, that is the problem with such an approach. Consider, for example, the Google Test framework. I have no idea whether it uses NULL, nullptr, or a mix, and don't want to care. 
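Kim's point about scope can be seen in how a source-level check works: it only inspects the files it is run on, so third-party headers that HotSpot includes (Google Test, for example) are never examined. A minimal, hypothetical sketch of such a check follows; it is not the script from the PR, and the function name is invented:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical checker: counts uses of the identifier NULL in C++ source text,
// skipping comments, string literals and character literals. Raw string
// literals and preprocessor subtleties are not handled.
std::size_t count_null_uses(const std::string& src) {
  enum class State { Code, LineComment, BlockComment, String, Char };
  State st = State::Code;
  std::size_t count = 0;
  const std::size_t n = src.size();
  std::size_t i = 0;
  auto is_ident = [](char c) { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; };
  while (i < n) {
    const char c = src[i];
    const char next = (i + 1 < n) ? src[i + 1] : '\0';
    switch (st) {
      case State::Code:
        if (c == '/' && next == '/') { st = State::LineComment; i += 2; }
        else if (c == '/' && next == '*') { st = State::BlockComment; i += 2; }
        else if (c == '"') { st = State::String; ++i; }
        else if (c == '\'') { st = State::Char; ++i; }
        else if (std::isdigit(static_cast<unsigned char>(c))) {
          // Skip numbers such as 0x1F or 1'000 so their letters are not identifiers.
          while (i < n && (is_ident(src[i]) || src[i] == '.' || src[i] == '\'')) ++i;
        } else if (is_ident(c)) {
          const std::size_t start = i;
          while (i < n && is_ident(src[i])) ++i;
          if (src.compare(start, i - start, "NULL") == 0) ++count;
        } else { ++i; }
        break;
      case State::LineComment:
        if (c == '\n') st = State::Code;
        ++i;
        break;
      case State::BlockComment:
        if (c == '*' && next == '/') { st = State::Code; i += 2; } else { ++i; }
        break;
      case State::String:
      case State::Char: {
        const char quote = (st == State::String) ? '"' : '\'';
        if (c == '\\') { i += 2; }  // skip the escaped character
        else { if (c == quote) st = State::Code; ++i; }
        break;
      }
    }
  }
  return count;
}
```

Because it matches whole identifiers only, names such as NULLABLE and mentions inside comments or strings are not reported.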
------------- PR Comment: https://git.openjdk.org/jdk/pull/23466#issuecomment-2649780192 From iklam at openjdk.org Tue Feb 11 05:29:10 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 11 Feb 2025 05:29:10 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 22:32:58 GMT, Calvin Cheung wrote: > This changset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. > > Passed tiers 1 - 5 testing. Looks good. A few suggestions for the in-line comments. src/hotspot/share/cds/aotCodeSource.cpp line 133: > 131: > 132: // AllCodeSourceStreams is used to iterate over all the code sources that > 133: // are available to the application from -Xbootclasspath, -classpath and --module-path Consider adding this comment: // When creating an AOT cache, we store the contents from AllCodeSourceStreams // into an array of AOTCodeSources. See AOTCodeSourceConfig::dumptime_init_helper(). // When loading the AOT cache in a production run, we compare the contents of the // stored AOTCodeSources against the current AllCodeSourceStreams to determine whether // the AOT cache is compatible with the current JVM. See AOTCodeSourceConfig::validate(). src/hotspot/share/cds/aotCodeSource.hpp line 126: > 124: // Non-existent entries are recored during AOTCache creation. Those non-existent entries > 125: // must not exist during runtime. 
> 126: // Typos: - "subjected to AOTCodeSourceConfig::validate()" -- the function has two parameters, but we can omit them in this comment - "validation is performed on *the* AOTCodeSources" - "during AOTCache creation *are* the same" - "on-existent entries are *recorded*" ------------- PR Review: https://git.openjdk.org/jdk/pull/23476#pullrequestreview-2607702313 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1950250159 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1950244507 From jbhateja at openjdk.org Tue Feb 11 06:32:56 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 11 Feb 2025 06:32:56 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v18] In-Reply-To: References: Message-ID: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. > 7. 
Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 9. X86 backend implementation for all supported intrinsics. > 10. Functional and Performance validation tests. > > Kindly review the patch and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22754/files - new: https://git.openjdk.org/jdk/pull/22754/files/82a42213..111c8084 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=16-17 Stats: 38 lines in 3 files changed: 2 ins; 11 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/22754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22754/head:pull/22754 PR: https://git.openjdk.org/jdk/pull/22754 From jbhateja at openjdk.org Tue Feb 11 06:32:56 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 11 Feb 2025 06:32:56 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v17] In-Reply-To: References: Message-ID: On Mon, 10 Feb 2025 20:43:19 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixing typos > > test/jdk/jdk/incubator/vector/ScalarFloat16OperationsTest.java line 44: > >> 42: import static jdk.incubator.vector.Float16.*; >> 43: >> 44: public class ScalarFloat16OperationsTest { > > Now that we have IR tests do you still think this test is necessary or should we 
have more IR test instead? @eme64 thoughts? We could follow up in another PR if need be. Hi Paul, DataProviders used in this Functional validation test exercise each newly added Float16 operation over the entire value range, while our IR tests are more directed towards validating the newly added IR transforms and constant folding scenarios. We have a follow-up PR for auto-vectorizing Float16 operation which can be used to beef up any validation gap. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1950290083 From sroy at openjdk.org Tue Feb 11 07:11:49 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Tue, 11 Feb 2025 07:11:49 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v21] In-Reply-To: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: <4WQI7dkNmZrC6iHWbq7y9n_unOzeKCjmLVrCycy9q-w=.1a9f9e6d-c264-498d-ba47-0e899df4ad53@github.com> > JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) > > Currently acceleration code for GHASH is missing for PPC64. > > The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result.
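The Karatsuba step mentioned in the GHASH description can be sketched portably. GHASH multiplies 128-bit values in GF(2^128), where addition is XOR; after splitting each operand into 64-bit halves, Karatsuba needs three 64x64 carry-less products instead of four, because the middle term equals (a_hi ^ a_lo)(b_hi ^ b_lo) ^ a_hi*b_hi ^ a_lo*b_lo. This sketch uses invented names and a software carry-less multiply; it omits GHASH's bit reflection and the reduction modulo x^128 + x^7 + x^2 + x + 1, and it is not a model of the PPC64 vector code:

```cpp
#include <cassert>
#include <cstdint>

struct U128 { uint64_t hi, lo; };
struct U256 { uint64_t w[4]; };  // w[0] is the least significant word

// Carry-less (GF(2)[x]) product of two 64-bit polynomials -> 128-bit result.
U128 clmul64(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i != 0) r.hi ^= a >> (64 - i);
    }
  }
  return r;
}

// Schoolbook: four 64x64 carry-less products.
U256 clmul128_schoolbook(U128 a, U128 b) {
  U128 ll = clmul64(a.lo, b.lo), lh = clmul64(a.lo, b.hi);
  U128 hl = clmul64(a.hi, b.lo), hh = clmul64(a.hi, b.hi);
  U128 mid{lh.hi ^ hl.hi, lh.lo ^ hl.lo};
  // product = hh * x^128 + mid * x^64 + ll
  return U256{{ll.lo, ll.hi ^ mid.lo, hh.lo ^ mid.hi, hh.hi}};
}

// Karatsuba: three products; the middle term is recovered with XORs.
U256 clmul128_karatsuba(U128 a, U128 b) {
  U128 ll = clmul64(a.lo, b.lo);
  U128 hh = clmul64(a.hi, b.hi);
  U128 m = clmul64(a.hi ^ a.lo, b.hi ^ b.lo);
  m.lo ^= ll.lo ^ hh.lo;
  m.hi ^= ll.hi ^ hh.hi;
  return U256{{ll.lo, ll.hi ^ m.lo, hh.lo ^ m.hi, hh.hi}};
}
```

In GF(2) the subtractions of classic Karatsuba become XORs, which is why the trick costs only a few extra XORs here.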
Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: - common code function - Aligned accesses ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20235/files - new: https://git.openjdk.org/jdk/pull/20235/files/d22fcf25..12723751 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=19-20 Stats: 92 lines in 2 files changed: 44 ins; 37 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235 PR: https://git.openjdk.org/jdk/pull/20235 From sroy at openjdk.org Tue Feb 11 07:15:27 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Tue, 11 Feb 2025 07:15:27 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v22] In-Reply-To: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: > JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) > > Currently acceleration code for GHASH is missing for PPC64. > > The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result.
Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: - common code function - common code function ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20235/files - new: https://git.openjdk.org/jdk/pull/20235/files/12723751..a7d9a960 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=20-21 Stats: 7 lines in 2 files changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235 PR: https://git.openjdk.org/jdk/pull/20235 From aph at openjdk.org Tue Feb 11 08:02:10 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 11 Feb 2025 08:02:10 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v2] In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 04:14:28 GMT, Amit Kumar wrote: >> s390x implementation for Class.isInstance intrinsic. >> >> Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. >> >> Benchmark results will be updated soon. 
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > optimize branching src/hotspot/cpu/s390/c1_Runtime1_s390.cpp line 639: > 637: int i = 0; > 638: __ z_stg(Z_tmp_1, (i++)*BytesPerWord + frame::z_abi_160_size, Z_SP); > 639: __ z_stg(Z_tmp_2, (i++)*BytesPerWord + frame::z_abi_160_size, Z_SP); Suggestion: __ z_stg(Z_tmp_1, 0*BytesPerWord + frame::z_abi_160_size, Z_SP); __ z_stg(Z_tmp_2, 1*BytesPerWord + frame::z_abi_160_size, Z_SP); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1950386385 From alanb at openjdk.org Tue Feb 11 08:13:09 2025 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 11 Feb 2025 08:13:09 GMT Subject: RFR: 8349620: Add VMProps for static JDK In-Reply-To: References: Message-ID: On Mon, 10 Feb 2025 08:21:21 GMT, Alan Bateman wrote: >> Please review this change that adds the `jdk.static` VMProps. It can be used to skip tests not for running on static JDK. >> >> This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which is used by VMProps to determine if it's static at runtime. >> >> `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on static JDK. This test uses `bin/jlink`, which is not provided on static JDK. There are other tests that require tools in `bin/`. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. > > I think this looks okay, I'm just wondering is one property is enough to cover all the configurations. > Thanks, @AlanBateman. > > > I'm just wondering is one property is enough to cover all the configurations. > > +1 > > It's not easy to predict all different cases for now. How about adding/refining when we find any new cases? That's okay with me. 
I'm hoping Magnus will jump in when he gets a chance as he has experience with the "other" static build configurations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2650077629 From aph at openjdk.org Tue Feb 11 08:16:11 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 11 Feb 2025 08:16:11 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v2] In-Reply-To: References: Message-ID: <7OKuCkES0raH8nB6Rm_iIclqtFv12hOvl_-ljMhUGcQ=.dd5aa891-5b73-4d83-9071-e92fbfb477e2@github.com> On Tue, 11 Feb 2025 04:14:28 GMT, Amit Kumar wrote: >> s390x implementation for Class.isInstance intrinsic. >> >> Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. >> >> Benchmark results will be updated soon. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > optimize branching src/hotspot/cpu/s390/c1_Runtime1_s390.cpp line 596: > 594: // Mirror: Z_ARG1(R2) > 595: // Object: Z_ARG2 > 596: // Temps: Z_ARG3, Z_ARG4, Z_ARG5, Z_tmp_1, Z_tmp_2 Suggestion: // Temps: Z_ARG3, Z_ARG4, Z_ARG5, Z_R11, Z_R12 // Z_R11 and Z_R12 are call saved, so we must push them before any use src/hotspot/cpu/s390/c1_Runtime1_s390.cpp line 601: > 599: // Get the Klass* into Z_ARG3 > 600: Register klass = Z_ARG3 , obj = Z_ARG2, result = Z_RET; > 601: Register temp0 = Z_ARG4, temp1 = Z_ARG5, temp2 = Z_tmp_1, temp3 = Z_tmp_2; Suggestion: Register temp0 = Z_ARG4, temp1 = Z_ARG5, temp2 = Z_R11, temp3 = Z_R12; Aliasing `temp2` to `Z_tmp_1` is just too confusing to the reader. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1950401236 PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1950399763 From amitkumar at openjdk.org Tue Feb 11 08:33:42 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 11 Feb 2025 08:33:42 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v3] In-Reply-To: References: Message-ID: > s390x implementation for Class.isInstance intrinsic. > > Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. > > Benchmark results will be updated soon. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: suggestions from Andrew ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23535/files - new: https://git.openjdk.org/jdk/pull/23535/files/cdca4853..73ea8ec7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=01-02 Stats: 11 lines in 1 file changed: 1 ins; 1 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23535/head:pull/23535 PR: https://git.openjdk.org/jdk/pull/23535 From amitkumar at openjdk.org Tue Feb 11 08:33:42 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 11 Feb 2025 08:33:42 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v2] In-Reply-To: <7OKuCkES0raH8nB6Rm_iIclqtFv12hOvl_-ljMhUGcQ=.dd5aa891-5b73-4d83-9071-e92fbfb477e2@github.com> References: <7OKuCkES0raH8nB6Rm_iIclqtFv12hOvl_-ljMhUGcQ=.dd5aa891-5b73-4d83-9071-e92fbfb477e2@github.com> Message-ID: On Tue, 11 Feb 2025 08:11:51 GMT, Andrew Haley wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> optimize branching > > 
src/hotspot/cpu/s390/c1_Runtime1_s390.cpp line 601: > >> 599: // Get the Klass* into Z_ARG3 >> 600: Register klass = Z_ARG3 , obj = Z_ARG2, result = Z_RET; >> 601: Register temp0 = Z_ARG4, temp1 = Z_ARG5, temp2 = Z_tmp_1, temp3 = Z_tmp_2; > > Suggestion: > > Register temp0 = Z_ARG4, temp1 = Z_ARG5, temp2 = Z_R11, temp3 = Z_R12; > > Aliasing `temp2` to `Z_tmp_1` is just too confusing to the reader. Updated. `Z_tmp_1` and `Z_tmp_2` refer to `Z_R10` and `Z_R11` respectively, so I have updated the patch accordingly: constexpr Register Z_tmp_1 = Z_R10; constexpr Register Z_tmp_2 = Z_R11; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1950421440 From aph at openjdk.org Tue Feb 11 08:39:11 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 11 Feb 2025 08:39:11 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v3] In-Reply-To: References: Message-ID: <0BgHb9SWIAMh-E0RJQOxnJtAk5NcnyECujPPoizDRaU=.1d267523-c87b-43ea-8449-c47a1a681318@github.com> On Tue, 11 Feb 2025 08:33:42 GMT, Amit Kumar wrote: >> s390x implementation for Class.isInstance intrinsic. >> >> Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. >> >> Benchmark results will be updated soon. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestions from Andrew src/hotspot/cpu/s390/c1_Runtime1_s390.cpp line 650: > 648: // lookup_secondary_supers_table_var return 0 on success and 1 on failure. > 649: // but this method returns 0 on failure and 1 on success. > 650: // so we have to inverse the result we got from lookup_secondary_supers_table_var. Suggestion: // so we have to invert the result from lookup_secondary_supers_table_var.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1950429772 From amitkumar at openjdk.org Tue Feb 11 08:47:24 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 11 Feb 2025 08:47:24 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v4] In-Reply-To: References: Message-ID: > s390x implementation for Class.isInstance intrinsic. > > Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. > > Benchmark results will be updated soon. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: update comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23535/files - new: https://git.openjdk.org/jdk/pull/23535/files/73ea8ec7..33d5cbe3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23535/head:pull/23535 PR: https://git.openjdk.org/jdk/pull/23535 From dholmes at openjdk.org Tue Feb 11 09:32:13 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 11 Feb 2025 09:32:13 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: <_CnY-j8qQhI5hEydYYH1gfQQP909-QrWTboS79F6UHA=.cf2527c7-5a81-4e4d-8433-ce18f9d63982@github.com> On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). 
JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker Sorry still on my to-do list. 
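The handshake the description relies on (a per-thread flag plus a store-load barrier, instead of an atomic update of a shared counter on every critical-region entry) can be sketched with C++ atomics. This is a hypothetical model under simplifying assumptions (a fixed set of threads, a single collector thread); the class and method names are invented and it is not the patch's code. Each side stores its own flag, issues a full fence, then reads the other side's flag, so at least one of them observes the other:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch, not HotSpot's GCLocker.
struct ThreadState {
  std::atomic<bool> in_critical{false};
};

class CriticalSectionGate {
 public:
  explicit CriticalSectionGate(std::size_t nthreads) : threads_(nthreads) {}

  void enter(std::size_t tid) {
    ThreadState& t = threads_[tid];
    for (;;) {
      t.in_critical.store(true, std::memory_order_relaxed);
      std::atomic_thread_fence(std::memory_order_seq_cst);  // store-load barrier
      if (!gc_requested_.load(std::memory_order_acquire)) return;  // fast path
      // A collection is pending: back out and wait until it has finished.
      t.in_critical.store(false, std::memory_order_release);
      while (gc_requested_.load(std::memory_order_acquire)) std::this_thread::yield();
    }
  }

  void leave(std::size_t tid) {
    threads_[tid].in_critical.store(false, std::memory_order_release);
  }

  // Collector side (single collector thread assumed): publish the request,
  // wait until no thread is inside, run the collection, then clear the request.
  template <typename Gc>
  void run_exclusive(Gc&& gc) {
    gc_requested_.store(true, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // store-load barrier
    for (ThreadState& t : threads_) {
      while (t.in_critical.load(std::memory_order_acquire)) std::this_thread::yield();
    }
    gc();
    gc_requested_.store(false, std::memory_order_release);
  }

 private:
  std::vector<ThreadState> threads_;
  std::atomic<bool> gc_requested_{false};
};
```

On the fast path a thread only touches its own flag plus a fence and a load of the request flag, which is the property the description credits for avoiding the regressions seen with a shared atomic counter; once the collector is past its wait loop, no thread can be inside a critical region.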
------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2650249785 From bkilambi at openjdk.org Tue Feb 11 10:43:22 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 11 Feb 2025 10:43:22 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: Message-ID: <1yB95sOajuS5ptFI0GQWLepii5JsZ9DOsje-TEFyFYs=.a325ad18-17ed-4e77-b1e3-0bad2cf55c67@github.com> On Thu, 6 Feb 2025 18:47:54 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Adding comments + some code reorganization src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2618: > 2616: INSN(smaxp, 0, 0b101001, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S > 2617: INSN(sminp, 0, 0b101011, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S > 2618: INSN(sqdmulh,0, 0b101101, false); // accepted arrangements: T4H, T8H, T2S, T4S Hi, not a comment on the algorithm itself but you might have to add these new instructions in the gtest for aarch64 here - test/hotspot/gtest/aarch64/aarch64-asmtest.py and use this file to generate test/hotspot/gtest/aarch64/asmtest.out.h which would contain these newly added instructions. 
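When generating test vectors for the new sqdmulh entry, it can help to have the instruction's per-lane semantics at hand. A minimal reference model, written from the Arm architecture description of SQDMULH (signed saturating doubling multiply, returning the high half, truncating); the function names are invented, and this says nothing about the encoding that aarch64-asmtest.py checks:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// 32-bit lanes (T2S/T4S): high half of 2*a*b, saturated.
int32_t sqdmulh32(int32_t a, int32_t b) {
  // The only overflowing case is MIN * MIN, whose doubled product 2^63
  // does not fit in int64_t; the instruction saturates it to MAX.
  if (a == std::numeric_limits<int32_t>::min() && b == std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::max();
  int64_t doubled = 2 * static_cast<int64_t>(a) * b;
  return static_cast<int32_t>(doubled >> 32);  // arithmetic shift: rounds toward -infinity
}

// 16-bit lanes (T4H/T8H): same rule with 16-bit halves.
int16_t sqdmulh16(int16_t a, int16_t b) {
  if (a == INT16_MIN && b == INT16_MIN) return INT16_MAX;
  int32_t doubled = 2 * static_cast<int32_t>(a) * b;
  return static_cast<int16_t>(doubled >> 16);
}
```

SQRDMULH differs only by adding a rounding constant before the shift, so a model like this makes it easy to tell the two apart in expected results.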
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1950610623 From mdoerr at openjdk.org Tue Feb 11 10:55:15 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 11 Feb 2025 10:55:15 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v22] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Tue, 11 Feb 2025 07:15:27 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently acceleration code for GHASH is missing for PPC64. >> >> The current implementation utlilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - common code function > - common code function src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 692: > 690: __ b(L_unaligned_loop); > 691: __ bind(L_aligned_loop); > 692: __ vspltisb(vZero, 0); This can be moved out of the loop, now. It's no longer modified in the loop. src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 703: > 701: __ lvx(vHigh, temp1, data); > 702: __ addi(data, data, 16); > 703: __ lvx(vLow, temp1, data); 2 loads in the loop can be avoided by loading once before the loop and keeping the previous 16 Bytes in a register. 
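Martin's second point can be modelled in portable C++: when an unaligned 16-byte block is assembled from two aligned loads, the second aligned block of iteration i is the first aligned block of iteration i + 1, so it can be carried in a register and only one load per iteration remains (the first point, hoisting the loop-invariant vspltisb, is the same idea applied to a constant). In this hypothetical sketch `load_aligned` stands in for lvx and `realign` for the vperm-based realignment; it models data movement only, not the PPC64 stub:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for a 16-byte vector register.
struct Block16 {
  std::uint8_t b[16];
};

// Models lvx: loads the 16-byte-aligned block containing addr (low 4 bits ignored).
Block16 load_aligned(const std::uint8_t* addr) {
  const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(addr) & ~static_cast<std::uintptr_t>(15);
  Block16 r;
  std::memcpy(r.b, reinterpret_cast<const void*>(p), sizeof r.b);
  return r;
}

// Models the realigning permute: bytes [shift, shift + 16) of the 32-byte pair hi:lo.
Block16 realign(const Block16& hi, const Block16& lo, std::size_t shift) {
  Block16 r;
  for (std::size_t i = 0; i < 16; ++i) {
    const std::size_t j = i + shift;
    r.b[i] = j < 16 ? hi.b[j] : lo.b[j - 16];
  }
  return r;
}

// Copies `blocks` 16-byte blocks starting at a possibly unaligned `data`,
// doing one aligned load per iteration by carrying the previous block.
// Precondition: the aligned 16-byte blocks overlapping the input are readable.
void copy_blocks(const std::uint8_t* data, std::size_t blocks, Block16* out) {
  const std::size_t shift = reinterpret_cast<std::uintptr_t>(data) & 15;
  Block16 prev = load_aligned(data);  // loaded once, before the loop
  for (std::size_t i = 0; i < blocks; ++i) {
    // With aligned input the last block is already complete; do not touch the next one.
    const bool need_next = shift != 0 || i + 1 < blocks;
    const Block16 next = need_next ? load_aligned(data + 16 * (i + 1)) : prev;
    out[i] = realign(prev, next, shift);
    prev = next;
  }
}
```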
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1950624036 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1950626325 From azafari at openjdk.org Tue Feb 11 13:39:06 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 11 Feb 2025 13:39:06 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v22] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. > - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. > - Adding a runtime flag for selecting the old or new version can be added later. > - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. 
Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: removed vmtCommon.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/873d5355..35c11b96 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=20-21 Stats: 983 lines in 16 files changed: 447 ins; 519 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From aph at openjdk.org Tue Feb 11 13:58:18 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 11 Feb 2025 13:58:18 GMT Subject: RFR: 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands [v2] In-Reply-To: References: Message-ID: <8t_Z2acW3fMegjh1OmqeEEEbZ9inBFkjyRKJvgpMewY=.5cf85cdd-8675-4ed0-b32c-4c65a685240f@github.com> On Wed, 8 Jan 2025 13:49:33 GMT, Fei Gao wrote: >> `IndOffXX` types don't do us any good. It would be simpler and faster to match a general-purpose `IndOff` type then let `legitimize_address()` fix any out-of-range operands. That'd reduce the size of the match rules and the time to run them. >> >> This patch simplifies the definitions of `immXOffset` with an estimated range. Whether an immediate can be encoded in a `LDR`/`STR` instructions as an offset will be determined in the phase of code-emitting. Meanwhile, we add necessary `legitimize_address()` in the phase of matcher for all `LDR`/`STR` instructions using the new `IndOff` memory operands (fix [JDK-8341437](https://bugs.openjdk.org/browse/JDK-8341437)). >> >> After this clean-up, memory operands matched with `IndOff` may require extra code emission (effectively a `lea`) before the address can be used. 
So we also modify the code that looks up the precise offset of a load/store instruction for implicit null checks (fix [JDK-8340646](https://bugs.openjdk.org/browse/JDK-8340646)). On the `aarch64` platform, we use the starting offset of the last instruction in the sequence emitted for a load/store machine node, because `LDR`/`STR` is always emitted last, no matter which addressing mode the load/store operation finally uses. >> >> Tiers 1-3 passed on `Macos-aarch64` with and without the VM option `-XX:+UseZGC`. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Update the copyright year and code comments > - Merge branch 'master' into cleanup_indoff > - 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands > > IndOffXX types don't do us any good. It would be simpler and > faster to match a general-purpose IndOff type and then let > legitimize_address() fix any out-of-range operands. That'd > reduce the size of the match rules and the time to run them. > > This patch simplifies the definitions of `immXOffset` with an > estimated range. Whether an immediate can be encoded as an offset > in an LDR/STR instruction is determined at code-emission > time. Meanwhile, we add the necessary > `legitimize_address()` calls in the matcher phase for all LDR/STR > instructions using the new `IndOff` memory operands > (fix JDK-8341437). > > After this clean-up, memory operands matched with `IndOff` may > require extra code emission (effectively a lea) before the address > can be used. So we also modify the code that looks up the precise > offset of a load/store instruction for implicit null checks > (fix JDK-8340646).
On the aarch64 platform, we use the starting > offset of the last instruction in the sequence emitted > for a load/store machine node, because LDR/STR is always emitted > last, no matter which addressing mode the load/store > operation finally uses. > > Tiers 1-3 passed on Macos-aarch64 with and without the VM option > "-XX:+UseZGC" Is this still alive? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22862#issuecomment-2650898063 From cnorrbin at openjdk.org Tue Feb 11 16:15:05 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 11 Feb 2025 16:15:05 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v7] In-Reply-To: References: Message-ID: On Mon, 10 Feb 2025 17:52:43 GMT, Johan Sjölen wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> empty base optimization reference > > src/hotspot/share/utilities/rbTree.hpp line 227: > >> 225: // Gets the cursor to the given node. >> 226: Cursor get_cursor(const RBNode* node); >> 227: const Cursor get_cursor(const RBNode* node) const; > > How about `cursor_of`, or just `cursor`? Renamed both this and `cursor_find` to `cursor`. > src/hotspot/share/utilities/rbTree.hpp line 257: > >> 255: // For all nodes with key < old_node, must also have key < new_node >> 256: // For all nodes with key > old_node, must also have key > new_node >> 257: void replace_at_cursor(RBNode* new_node, const Cursor& cursor); > > `old_key`, `new_key`. Note, in case I forget to say this in the cpp file: We might want to run the verification code in an assert when this function is called. It's a pretty dangerous function that really requires that you know your stuff :-).
We already call `verify_self` in the implementation :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1951144934 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1951146727 From cnorrbin at openjdk.org Tue Feb 11 16:15:02 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 11 Feb 2025 16:15:02 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v8] In-Reply-To: References: Message-ID: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
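In an intrusive tree the node is embedded in the user's object and the tree only manipulates links. The sketch below illustrates that idea independently of `RBTree`; `Node`, `Payload`, `from_node`, and `insert` are hypothetical names. The `cast_to_self` in the PR's example presumably relies on `node` being the first member; computing the owner with `offsetof` removes that constraint for standard-layout types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical intrusive node: the tree links live inside the user's object,
// so the tree itself never allocates or frees nodes.
struct Node {
  int key;
  Node* left = nullptr;
  Node* right = nullptr;
  explicit Node(int k) : key(k) {}
};

// A user structure that embeds its node. The node is deliberately NOT the
// first member, so a plain pointer cast would be wrong; offsetof recovers
// the enclosing object instead (valid for standard-layout types).
struct Payload {
  int data;
  Node node;
  Payload(int key, int d) : data(d), node(key) {}

  static Payload* from_node(Node* n) {
    return reinterpret_cast<Payload*>(reinterpret_cast<char*>(n) - offsetof(Payload, node));
  }
};

// Minimal unbalanced insert (no rebalancing) that only touches Node links.
void insert(Node*& root, Node* n) {
  Node** link = &root;
  while (*link != nullptr) {
    link = (n->key < (*link)->key) ? &(*link)->left : &(*link)->right;
  }
  *link = n;
}
```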
> > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns! Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: johan feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23416/files - new: https://git.openjdk.org/jdk/pull/23416/files/174d169f..48241078 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=06-07 Stats: 120 lines in 3 files changed: 3 ins; 3 del; 114 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From cnorrbin at openjdk.org Tue Feb 11 16:22:17 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 11 Feb 2025 16:22:17 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v7] In-Reply-To: References: Message-ID: <0BWlGpv_dZ8-e4cfXjvwbROzMozugwu-Xsi1os2sgxM=.f56e2d84-6ec7-499e-9026-4922f59b672e@github.com> On Mon, 
10 Feb 2025 18:01:59 GMT, Johan Sjölen wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> empty base optimization reference > > src/hotspot/share/utilities/rbTree.hpp line 268: > >> 266: const Cursor cursor = cursor_find(key); >> 267: return cursor.found() ? &cursor.node()->_value : nullptr; >> 268: } > > Are these important to have? Depends on what you mean by important. I think they are useful for those who want to interact with the tree at a higher level, for example by only using `upsert(k,v)`, `remove(k)`, and `find(k)`. Of course, I could remove them and force people to use cursors instead, but that feels unnecessary, especially if they don't need cursors otherwise. > src/hotspot/share/utilities/rbTree.hpp line 283: > >> 281: // Inserts a node with the given key into the tree, >> 282: // does nothing if the key already exist. >> 283: void upsert(const K& key) { > > `upsert` is a bad name for this function, as it is a portmanteau of "update" and "insert", indicating that it should change something if the key is found. What's the goal with this function? It should probably return the allocated node if it's to be useful. This follows the pattern of the other value-less functions, which keep the same name as their valued counterparts, but I agree that it's less logical here. Would one `insert(k)` and one `upsert(k, v)` be a good solution?
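The proposed naming split mirrors a distinction the C++17 standard library already draws: `std::map::try_emplace` leaves an existing entry alone, while `std::map::insert_or_assign` overwrites it. A minimal sketch of the proposed `insert(k)`/`upsert(k, v)` semantics, built on `std::map` rather than on `RBTree`; `SimpleDict` is a hypothetical name, and its `find` returns a pointer or `nullptr` like the convenience function discussed above:

```cpp
#include <cassert>
#include <map>

// Hypothetical wrapper illustrating the semantics only (requires C++17).
template <typename K, typename V>
class SimpleDict {
  std::map<K, V> _m;

 public:
  // insert(k): add k with a value-initialized V; leave an existing entry untouched.
  // Returns true if a new entry was created.
  bool insert(const K& k) { return _m.try_emplace(k).second; }

  // upsert(k, v): add k, or overwrite its value if k is already present.
  // Returns true if a new entry was created, false if an existing one was updated.
  bool upsert(const K& k, const V& v) { return _m.insert_or_assign(k, v).second; }

  // The value for k, or nullptr if k is absent.
  const V* find(const K& k) const {
    auto it = _m.find(k);
    return it == _m.end() ? nullptr : &it->second;
  }
};
```

Returning whether an entry was created gives callers the information Johan asks for without exposing the node itself.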
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1951160193 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1951163593 From cnorrbin at openjdk.org Tue Feb 11 16:25:13 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 11 Feb 2025 16:25:13 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v7] In-Reply-To: References: Message-ID: <6njeWH4LDMWeybiM56amjO74FSTCeD58_hobFb6Q9uo=.b43e65d2-9fb7-4d41-80ac-4a8cddc88f9c@github.com> On Mon, 10 Feb 2025 18:07:29 GMT, Johan Sjölen wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> empty base optimization reference > > src/hotspot/share/utilities/rbTree.inline.hpp line 192: > >> 190: template >> 191: inline const typename RBTree::Cursor >> 192: RBTree::cursor_find(const K& key) const { > > ```c++ > using Tree = RBTree using Cur = typename RBTree::Cursor; > > > Though I'm pretty sure Thomas gave you a `typedef` for `Tree`, I don't think the `Cur` can be done with a `typedef`. I don't think we can access `TreeType` here in any meaningful way, since that is a part of the tree class and we're outside the class here, and we would still need to specify the template parameters.
But please correct me if I'm wrong :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1951169370 From cnorrbin at openjdk.org Tue Feb 11 16:33:12 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 11 Feb 2025 16:33:12 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v7] In-Reply-To: References: Message-ID: On Mon, 10 Feb 2025 18:17:05 GMT, Johan Sjölen wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> empty base optimization reference > > src/hotspot/share/utilities/rbTree.hpp line 166: > >> 164: bool found() const { return *_insert_location != nullptr; } >> 165: RBNode* node() { return _insert_location == nullptr ? nullptr : *_insert_location; } >> 166: RBNode* node() const { return _insert_location == nullptr ? nullptr : *_insert_location; } > > Is there any case where I don't need to check the validity of the cursor? That is, do I ever want to use `node()` without first calling `valid()` or afterwards checking whether the returned value was null? > > If the answer to that is: "No, there is no such case", then we shouldn't return null for a `!valid()` cursor. We should instead add `assert(valid(), "must be");`. If the answer is yes, then could you please tell me what that situation is :P? It's been useful in a couple of places where we want "the node, or nullptr otherwise", since you get the validity check and the node in one call. A few examples where we don't check validity: In `visit_range_in_order`, we iterate until the node (or nullptr) is reached. In `upsert`, we extract the node into a local variable and either modify the node or reuse the variable. In `closest_gt`, we return the next node from the cursor, which could be invalid.
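The "node or nullptr" behavior can be pictured with a cursor that stores the address of the link where a key lives, or where it would be inserted. This is a minimal sketch on an unbalanced binary search tree; the names (`TNode`, `Cursor`, `cursor_for`, `insert_at_cursor`) only loosely echo those in the review, rebalancing is omitted, and it is not the `RBTree` implementation:

```cpp
#include <cassert>

struct TNode {
  int key;
  TNode* left = nullptr;
  TNode* right = nullptr;
  explicit TNode(int k) : key(k) {}
};

// Hypothetical cursor: remembers the link (pointer-to-pointer) where 'key'
// lives, or where it would be inserted if absent. A default-constructed
// cursor is invalid.
class Cursor {
  TNode** _insert_location = nullptr;

 public:
  Cursor() = default;
  explicit Cursor(TNode** loc) : _insert_location(loc) {}
  bool valid() const { return _insert_location != nullptr; }
  bool found() const { return valid() && *_insert_location != nullptr; }
  // "The node, or nullptr otherwise": validity check and fetch in one call.
  TNode* node() const { return valid() ? *_insert_location : nullptr; }
  TNode** link() const { return _insert_location; }
};

// Walk down from the root, keeping a pointer to the link we followed.
Cursor cursor_for(TNode*& root, int key) {
  TNode** link = &root;
  while (*link != nullptr && (*link)->key != key) {
    link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
  }
  return Cursor(link);
}

// Insert an externally allocated node at a cursor that did not find its key.
void insert_at_cursor(TNode* n, const Cursor& c) {
  assert(c.valid() && !c.found());
  *c.link() = n;
}
```

Because the cursor already points at the empty link, insertion after a failed lookup is a single store, with no second descent from the root.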
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1951182600 From jsjolen at openjdk.org Tue Feb 11 16:33:13 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 11 Feb 2025 16:33:13 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v7] In-Reply-To: <0BWlGpv_dZ8-e4cfXjvwbROzMozugwu-Xsi1os2sgxM=.f56e2d84-6ec7-499e-9026-4922f59b672e@github.com> References: <0BWlGpv_dZ8-e4cfXjvwbROzMozugwu-Xsi1os2sgxM=.f56e2d84-6ec7-499e-9026-4922f59b672e@github.com> Message-ID: On Tue, 11 Feb 2025 16:19:28 GMT, Casper Norrbin wrote: >> src/hotspot/share/utilities/rbTree.hpp line 283: >> >>> 281: // Inserts a node with the given key into the tree, >>> 282: // does nothing if the key already exist. >>> 283: void upsert(const K& key) { >> >> `upsert` is a bad name for this function, as it is a portmanteau of "update" and "insert", indicating that it should change something if the key is found. What's the goal with this function? It should probably return the allocated node if it's to be useful. > > This follows the pattern of the other value-less functions, which keep the same name as their valued counterparts, but I agree that it's less logical here. Would one `insert(k)` and one `upsert(k, v)` be a good solution? Yes! >> src/hotspot/share/utilities/rbTree.inline.hpp line 192: >> >>> 190: template >>> 191: inline const typename RBTree::Cursor >>> 192: RBTree::cursor_find(const K& key) const { >> >> ```c++ >> using Tree = RBTree> using Cur = typename RBTree::Cursor; >> >> >> Though I'm pretty sure Thomas gave you a `typedef` for `Tree`, I don't think the `Cur` can be done with a `typedef`. > > I don't think we can access `TreeType` here in any meaningful way, since that is a part of the tree class and we're outside the class here, and we would still need to specify the template parameters. But please correct me if I'm wrong :) I don't know :-), you might be right!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1951180943 PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1951181638 From rehn at openjdk.org Tue Feb 11 17:31:41 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 11 Feb 2025 17:31:41 GMT Subject: RFR: 8349851: RISCV: Call VM leaf can use movptr2 Message-ID: Hi, please consider. There should be a small speedup for VM leaf calls. We can scratch t2 here as we just pushed it (maybe I should have used t0). Passes tier1. /Robbin ------------- Commit messages: - Changes Changes: https://git.openjdk.org/jdk/pull/23565/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23565&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349851 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23565/head:pull/23565 PR: https://git.openjdk.org/jdk/pull/23565 From lucy at openjdk.org Tue Feb 11 18:42:11 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 11 Feb 2025 18:42:11 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v4] In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 08:47:24 GMT, Amit Kumar wrote: >> s390x implementation for Class.isInstance intrinsic. >> >> Tier1 tests on release and fastdebug VMs are clean with flags: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. >> >> Benchmark results will be updated soon. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > update comment Looks good overall. Some details need to be addressed. src/hotspot/cpu/s390/c1_Runtime1_s390.cpp line 639: > 637: __ push_frame(frame_size); > 638: > 639: // Z_R10 and Z_R11 are call saved, so we must push them before any use pls write **caller** saved.
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3674: > 3672: Register r_temp2, > 3673: Register r_temp3) { > 3674: assert_different_registers(r_sub_klass, r_super_klass, r_result, r_temp1, r_temp2, r_temp3, Z_R0_scratch); Better use `LOCGHI` further down and avoid use of `Z_R0_scratch`. You are using `LOCHI` in `Runtime1::generate_code_for()` anyway which implies you are sure the load/store-on-condition facility 2 is installed. In other words: is z13 the minimum H/W version? Even more simplification: there is no need to set `r_linear_result` conditionally. You set it to 1 and branch to failure if array length is zero. For all other cases, repne_scan() does the right thing. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3720: > 3718: z_lg(Z_ARG2, -16, Z_SP); // r_sub_klass > 3719: z_lg(Z_ARG3, -24, Z_SP); // r_linear_result > 3720: This argument shuffle implementation works well in 99.99% of all situations. Maybe it's even more reliable. BUT: you are using stack space which is outside the bounds of used (and thus protected) stack space. If your thread is catching an interrupt (a signal, for example), the interrupt handler will place its data just beyond Z_SP. You MUST resize the frame to make room for the needed spill space. ------------- Changes requested by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23535#pullrequestreview-2609555124 PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1951378397 PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1951320369 PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1951360524 From eosterlund at openjdk.org Tue Feb 11 20:38:40 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 11 Feb 2025 20:38:40 GMT Subject: RFR: 8347335: ZGC: Use limitless mark stack memory Message-ID: <9h8RYyi02b9Hz6EGoef3tCHmAHYpB8bdgyUiXkZeC0s=.25738b1a-f580-4c1a-a9cc-f76dbc03bc1e@github.com> When ZGC performs marking, a lock-free data structure is used to keep track of objects that still need to be traced in the object traversal. This lock-free data structure uses versioned pointers as a technique to avoid ABA problems, which are prevalent when writing lock-free data structures. This required partitioning pointers in the structure to embed both a version and a location. Due to the reduced addressability of locations with only a portion of the pointer bits, a special memory space was created to manage the data structure so that offsets, rather than addresses, could be encoded. Since the memory area needs to be contiguous, the JVM needs to know the expected maximum size of this space, within some limiting bounds. That is what `-XX:ZMarkStackSpaceLimit` controls. While this strategy has worked well in practice, the design does limit the scalability of ZGC, due to limits on how much contiguous memory can be encoded with a subset of the pointer bits. Not to mention that users have no idea what number to put into this JVM option. The `-XX:ZMarkStackSpaceLimit` JVM option is needed because a contiguous allocator is used to solve an ABA problem in a lock-free data structure. By selecting another solution for the ABA problem, the need for the special contiguous memory allocator, and hence the JVM option, can be removed.
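The versioned-pointer approach being replaced can be sketched to show why it forces a bounded, contiguous space. This is a hypothetical encoding for illustration only (a 32-bit version and a 32-bit arena offset packed into one 64-bit word), not ZGC's actual bit layout; `VersionedTop`, `pack`, and `try_set` are hypothetical names:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical encoding: the upper 32 bits hold a version counter and the
// lower 32 bits an offset into one contiguous arena. The CAS compares both
// halves, so a slot that was popped and pushed back (same offset, bumped
// version) no longer matches a stale snapshot: versioning defeats ABA.
// The price: only 2^32 offsets are addressable, so the arena must be a
// single bounded, contiguous reservation.
struct VersionedTop {
  static constexpr std::uint32_t kEmpty = UINT32_MAX;
  std::atomic<std::uint64_t> word;

  VersionedTop() : word(pack(kEmpty, 0)) {}

  static constexpr std::uint64_t pack(std::uint32_t offset, std::uint32_t version) {
    return (std::uint64_t(version) << 32) | offset;
  }
  static constexpr std::uint32_t offset_of(std::uint64_t w) { return std::uint32_t(w); }
  static constexpr std::uint32_t version_of(std::uint64_t w) { return std::uint32_t(w >> 32); }

  // Replace the snapshot 'expected' with 'new_offset', bumping the version on every update.
  bool try_set(std::uint64_t expected, std::uint32_t new_offset) {
    std::uint64_t desired = pack(new_offset, version_of(expected) + 1);
    return word.compare_exchange_strong(expected, desired);
  }
};
```

A CAS based on a stale snapshot fails even when the offset matches again, which is exactly the property that hazard pointers provide without stealing pointer bits.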
This PR proposes a new solution for that original ABA problem in the lock-free data structure, which renders the entire machinery behind the `-XX:ZMarkStackSpaceLimit` JVM option redundant. The proposed technique is to use hazard pointers instead. Hazard pointers are a well-established safe memory reclamation (SMR) technique for writing lock-free data structures, which we also use for the Threads list. The main idea is to publish, in a hazard pointer, which pointer has been read, so that concurrent threads know not to free memory that is being concurrently used. Freeing of such racily accessed memory is deferred until it is safe, hence solving the ABA problem. This also allows using plain malloc/free instead of a custom contiguous memory allocator for these structures. Only popping nodes from the mark stacks requires hazard pointers, and only GC workers pop entries from the mark stacks. Therefore, hazard pointers may be stored in a per-worker variable. I have measured throughput, latency, marking times and memory usage across a number of programs and platforms, and have not seen any interesting changes in behavior, other than more predictable and consistent native memory usage, instead of the slightly more temperamental behavior we have today due to eagerly handing the mark stack memory back to the OS between GC cycles, while requiring it all back in the next cycle. With this change, another JVM option bites the dust. I have already gotten the CSR to obsolete the `-XX:ZMarkStackSpaceLimit` JVM option approved (cf. https://bugs.openjdk.org/browse/JDK-8349204).
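The pop-side protocol described above (publish the pointer, re-validate, and only then dereference; defer freeing anything that is still published) can be sketched as a Treiber stack with one hazard pointer per worker. This is a minimal illustration, not ZGC's code: `kWorkers`, `Entry`, and `HazardStack` are hypothetical, `new`/`delete` stand in for malloc/free, and the default sequentially consistent atomics provide the store-then-load ordering that publish-and-validate relies on.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

constexpr int kWorkers = 4;  // hypothetical fixed number of GC workers

struct Entry {
  int value;
  Entry* next;
};

class HazardStack {
  std::atomic<Entry*> _top{nullptr};
  std::atomic<Entry*> _hazard[kWorkers];   // one published pointer per worker
  std::vector<Entry*> _retired[kWorkers];  // per-worker deferred frees

  bool is_hazardous(Entry* e) const {
    for (const auto& h : _hazard) {
      if (h.load() == e) return true;
    }
    return false;
  }

 public:
  HazardStack() {
    for (auto& h : _hazard) h.store(nullptr);
  }

  // Push never dereferences a node another thread may free, so it needs no hazard.
  void push(Entry* e) {
    Entry* old = _top.load();
    do {
      e->next = old;
    } while (!_top.compare_exchange_weak(old, e));
  }

  // Only workers pop, so the hazard slot can be indexed by worker id.
  Entry* pop(int worker) {
    assert(worker >= 0 && worker < kWorkers);
    Entry* top = _top.load();
    while (top != nullptr) {
      _hazard[worker].store(top);   // publish before dereferencing
      if (_top.load() != top) {     // top changed meanwhile: retry with the new value
        top = _top.load();
        continue;
      }
      Entry* next = top->next;      // safe: 'top' cannot be freed while published
      if (_top.compare_exchange_strong(top, next)) break;  // on failure 'top' is refreshed
    }
    _hazard[worker].store(nullptr);
    return top;
  }

  // Instead of freeing immediately, retire; free only what no worker has published.
  void retire(int worker, Entry* e) {
    assert(worker >= 0 && worker < kWorkers);
    _retired[worker].push_back(e);
    std::vector<Entry*> still_guarded;
    for (Entry* r : _retired[worker]) {
      if (is_hazardous(r)) still_guarded.push_back(r);
      else delete r;
    }
    _retired[worker].swap(still_guarded);
  }

  ~HazardStack() {
    for (auto& list : _retired) {
      for (Entry* r : list) delete r;
    }
    while (Entry* e = pop(0)) delete e;
  }
};
```

A retired entry is never reused while any worker still has it published, so a popping worker's CAS cannot succeed against a recycled node, which is how deferral removes the ABA window without versioned pointers.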
------------- Commit messages: - 8347335: ZGC: Use limitless mark stack memory Changes: https://git.openjdk.org/jdk/pull/23571/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23571&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347335 Stats: 1189 lines in 26 files changed: 418 ins; 656 del; 115 mod Patch: https://git.openjdk.org/jdk/pull/23571.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23571/head:pull/23571 PR: https://git.openjdk.org/jdk/pull/23571 From gziemski at openjdk.org Tue Feb 11 21:28:39 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 11 Feb 2025 21:28:39 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v26] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. 
> > For the memory logger, those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for the virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. > > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are needed if we want to calculate the memory overhead. > > To use, you first need to record the pattern of operations, e.g.: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/...
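The platform hooks listed above (`malloc_usable_size`, `_msize`, `malloc_size`) let a recorder compare what was requested with what the allocator can actually hand back. A minimal sketch, assuming a hypothetical `malloc_slack` helper and `OverheadTally` struct that are not part of the PR; it captures only rounding slack, since allocator-internal headers are not visible through these queries:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#if defined(__linux__) || defined(_WIN32)
#include <malloc.h>          // malloc_usable_size (glibc/musl), _msize (Windows CRT)
#elif defined(__APPLE__)
#include <malloc/malloc.h>   // malloc_size
#endif

// Rounding slack for one allocation: bytes usable in the block beyond what
// was requested. Falls back to "no slack" where no query is available.
std::size_t malloc_slack(void* ptr, std::size_t requested) {
#if defined(__linux__)
  std::size_t usable = malloc_usable_size(ptr);
#elif defined(_WIN32)
  std::size_t usable = _msize(ptr);
#elif defined(__APPLE__)
  std::size_t usable = malloc_size(ptr);
#else
  std::size_t usable = requested;
#endif
  assert(usable >= requested);
  return usable - requested;
}

// Hypothetical running tally a recorder could keep next to its log records.
struct OverheadTally {
  std::size_t requested = 0;
  std::size_t slack = 0;
  void record(void* ptr, std::size_t size) {
    requested += size;
    slack += malloc_slack(ptr, size);
  }
};
```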
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: memory histograms were shown reversed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/a229a155..dc1285fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=24-25 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From vpaprotski at openjdk.org Tue Feb 11 21:47:31 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 11 Feb 2025 21:47:31 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v6] In-Reply-To: References: Message-ID: > (Also see `8319429: Resetting MXCSR flags degrades ecore`) > > This PR fixes two issues: > - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only > - this issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warnings on HelloWorld (on both Linux _and_ Windows): > > OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall > > > First, the crash: it happens when FXRSTOR attempts to write reserved bits into MXCSR. If those bits happen to be set, the VM crashes (hence the crash isn't deterministic, but it is frequent enough if `__ warn` is used). The root cause is that the binding does not reserve stack space for register parameters. > ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) > The prolog of the warn function then stores the four argument registers onto the stack, overwriting the FXSAVE area.
(See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) > > The fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer. > > --- > > I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` is set (65k+ lines of warnings). Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: comments from Sandhya ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22673/files - new: https://git.openjdk.org/jdk/pull/22673/files/2e372f29..b23764ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22673&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22673&range=04-05 Stats: 5 lines in 2 files changed: 1 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22673.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22673/head:pull/22673 PR: https://git.openjdk.org/jdk/pull/22673 From gziemski at openjdk.org Tue Feb 11 21:49:56 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 11 Feb 2025 21:49:56 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v27] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging.
> > For the memory logger, those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for the virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. > > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are needed if we want to calculate the memory overhead. > > To use, you first need to record the pattern of operations, e.g.: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/...
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: do not add NMT to malloc overhead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/dc1285fe..ad859630 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=25-26 Stats: 15 lines in 1 file changed: 4 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Tue Feb 11 22:46:31 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 11 Feb 2025 22:46:31 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v28] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. 
> > For the memory logger, those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for the virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. > > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are needed if we want to calculate the memory overhead. > > To use, you first need to record the pattern of operations, e.g.: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/...
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/ad859630..81135a14 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=26-27 Stats: 8 lines in 1 file changed: 0 ins; 7 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From sviswanathan at openjdk.org Tue Feb 11 23:46:13 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 11 Feb 2025 23:46:13 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v6] In-Reply-To: References: Message-ID: <2dkEcQAbzdcH-cajmzsDbpdwdIGIzRY9PeqPG7xG2oE=.38f8d823-cd20-4c00-af84-29982d755b30@github.com> On Tue, 11 Feb 2025 21:47:31 GMT, Volodymyr Paprotski wrote: >> (Also see `8319429: Resetting MXCSR flags degrades ecore`) >> >> This PR fixes two issues: >> - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only >> - This issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warning on HelloWorld (on both Linux _and_ windows): >> >> OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall >> >> >> First, the crash. Caused when FXRSTOR is attempting to write reserved bits into MXCSR. If those bits happen to be set, crash. (Hence the crash isn't deterministic. But frequent enough if `__ warn` is used). 
It is caused by the binding not reserving stack space for register parameters >> ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) >> The prolog of the warn function then proceeds to store the four arg registers onto the stack, overwriting the fxstore save area. (See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) >> >> The fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer. >> >> --- >> >> I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` is set (65k+ lines of warnings) > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comments from Sandhya Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22673#pullrequestreview-2610374929 From gziemski at openjdk.org Wed Feb 12 00:00:46 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 12 Feb 2025 00:00:46 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v29] In-Reply-To: References: Message-ID: <6gYS8OC46Nck8bz5ZDfmNnM7iLagSQ89rcSSQY4JUug=.0eacd539-b493-4896-a7f9-b0c19c73fe98@github.com> > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging.
> > For the memory logger those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for the virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. > > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are needed if we want to calculate the memory overhead. > > To use, you first need to record the pattern of operations, ex: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/...
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: only account for active NMT headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/81135a14..3a1c3b20 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=27-28 Stats: 8 lines in 1 file changed: 3 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From liach at openjdk.org Wed Feb 12 00:03:13 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 12 Feb 2025 00:03:13 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v7] In-Reply-To: References: Message-ID: <2lG9VhKEsE-D5TAEt0bwzEBbi8m4CeFHYXVaCa2ihz4=.8394ff99-d47a-4761-8e5b-f740daea92ea@github.com> On Mon, 10 Feb 2025 13:23:49 GMT, Coleen Phillimore wrote: >> This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into protection-domain > - Move test for protectionDomain filtering. > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Remove @Stable annotation for final field. > - Fix test that knows which fields are hidden from reflection in jvmci. > - Hide Class.protectionDomain for reflection and add a test case. 
> - Merge branch 'master' into protection-domain > - Fix two tests. > - Fix the test. > - ... and 1 more: https://git.openjdk.org/jdk/compare/c9cadbd2...2208302c The updated java code looks good. Need other engineers to look at hotspot changes. ------------- Marked as reviewed by liach (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23396#pullrequestreview-2610411471 From dlong at openjdk.org Wed Feb 12 01:18:03 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 12 Feb 2025 01:18:03 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash Message-ID: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. 
------------- Commit messages: - fix - tighten upper-bound on locals assert - s390 build - update bug id, copyright, in test - s390 build - ppc build - wip Changes: https://git.openjdk.org/jdk/pull/23557/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23557&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336042 Stats: 142 lines in 8 files changed: 133 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23557.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23557/head:pull/23557 PR: https://git.openjdk.org/jdk/pull/23557 From dlong at openjdk.org Wed Feb 12 01:18:04 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 12 Feb 2025 01:18:04 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash In-Reply-To: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: <1VtIizP7DYsEPervTMwvNJxv0UTKHj5vR8x48Sq43ks=.46017686-9b49-4f16-afb8-a83564bcdb2f@github.com> On Tue, 11 Feb 2025 07:59:01 GMT, Dean Long wrote: > When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. > > In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. > > Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. 
I need help testing those asserts for arm32, ppc, riscv, and s390. src/hotspot/share/runtime/deoptimization.cpp line 754: > 752: int caller_actual_parameters = -1; // value not used except for interpreted frames, see below > 753: if (deopt_sender.is_interpreted_frame()) { > 754: caller_actual_parameters = callee_parameters + (caller_was_method_handle ? 1 : 0); Previously, if caller_was_method_handle was set, we would pass in 0 below, which was wrong for the has_member_arg case, and I suspect it broke JVMTI PopFrame for platforms that don't use popframe_move_outgoing_args, but I don't have a test for this suspicion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1951795243 From dholmes at openjdk.org Wed Feb 12 02:51:12 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 12 Feb 2025 02:51:12 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. 
This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker @albertnetymk I think that to get the correct "dekker duality" in this code you do need to have full fences between the stores and loads, not just a `storeload` barrier. ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2610698148 From fyang at openjdk.org Wed Feb 12 03:00:18 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Feb 2025 03:00:18 GMT Subject: RFR: 8349851: RISCV: Call VM leaf can use movptr2 In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 17:26:28 GMT, Robbin Ehn wrote: > Hi, please consider. > > There should be a small speed up to vm leafs. > We can scratch t2 here as we just pushed it. 
(maybe I should have used t0) > > Passes t1 > > /Robbin src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 797: > 795: push_reg(RegSet::of(t1, xmethod), sp); // push << t1 & xmethod >> to sp > 796: > 797: movptr(t1, entry_point, offset, t2); Personally, I prefer `movptr2(t1, entry_point, offset, t0);` which will be consistent with the one in the preceding assembler routine `MacroAssembler::emit_static_call_stub()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23565#discussion_r1951907146 From amitkumar at openjdk.org Wed Feb 12 03:48:09 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 12 Feb 2025 03:48:09 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash In-Reply-To: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Tue, 11 Feb 2025 07:59:01 GMT, Dean Long wrote: > When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. > > In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. > > Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. 
Hi @dean-long, I got build failure on s390: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/amit/jdk/src/hotspot/cpu/s390/abstractInterpreter_s390.cpp:190), pid=3885713, tid=3885721 # assert(l2 >= locals_base) failed: bad placement # # JRE version: OpenJDK Runtime Environment (25.0) (fastdebug build 25-internal-adhoc.amit.jdk) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-adhoc.amit.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-s390x) # Problematic frame: # V [libjvm.so+0x22c07a] AbstractInterpreter::layout_activation(Method*, int, int, int, int, int, int, frame*, frame*, bool, bool)+0x4b2 # ------------- PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2652591361 From iklam at openjdk.org Wed Feb 12 04:27:02 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 12 Feb 2025 04:27:02 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file Message-ID: Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). 
This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): > the format of the configuration and cache files is not specified and is subject to change without notice. **Notes for reviewers:** - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified.
------------- Commit messages: - Added comments; fixed FIXMEs - Added more test cases - Clean up; improved error messages - 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file Changes: https://git.openjdk.org/jdk/pull/23484/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23484&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8348426 Stats: 1172 lines in 38 files changed: 968 ins; 47 del; 157 mod Patch: https://git.openjdk.org/jdk/pull/23484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23484/head:pull/23484 PR: https://git.openjdk.org/jdk/pull/23484 From dholmes at openjdk.org Wed Feb 12 06:29:09 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 12 Feb 2025 06:29:09 GMT Subject: RFR: 8349083: Factor out filename handling code from logging In-Reply-To: References: Message-ID: <82r1a8p4tVWdY0x7tkL1qCyFcMh2anmbZmWwfvPi1B4=.7e782b73-bcd3-4012-8fdb-f55bed69b2cb@github.com> On Sat, 1 Feb 2025 16:53:13 GMT, Zhengyu Gu wrote: > Factor out filename substitution code from unified logging, so that it can be used elsewhere: > > 1. Make filename substitution consistent. Support following substitutions cross JVM > ``` > %p -> pid > %t -> timestamp > %hn -> hostname > > > 2. Reduce redundant code I'm not sure how best to handle the "%t". My view was that we would standardize "%t" to mean the timestamp of when the VM started, but I guess if there are pre-existing uses of "%t" that mean something different then that would be a problem. Note that share/utilities/ostream.cpp also has similar logic for creating log file names with %p and %t. My concern is your current code re-stringifies the timestamp on every call, whereas the logging code only created the vm_startup_time_str once. When the timestamp is being passed in, memoizing it is problematic - whereas if you know it is the VM startup time it is no problem at all. The strings for %p and %hn should also be memoized, though logging does not currently do that. 
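A common way to memoize an idempotent string such as the `%p` expansion without a lock is a lazily published pointer: readers use an acquire load, and the first writer publishes with a release. The sketch below uses standard C++ atomics rather than HotSpot's own `Atomic` helpers, and it uses a compare-and-exchange (a variant of a plain release store) so that a thread that loses the race frees its copy instead of leaking it. `pid_str()` is a hypothetical name, and `getpid` assumes a POSIX system.

```cpp
#include <atomic>
#include <cstdio>
#include <unistd.h>   // getpid (POSIX)

// Cached "%p" expansion; nullptr until the first call publishes it.
static std::atomic<const char*> g_pid_str{nullptr};

const char* pid_str() {
  // Fast path: the acquire load pairs with the release half of the CAS
  // below, so the characters written by the publisher are visible here.
  const char* cached = g_pid_str.load(std::memory_order_acquire);
  if (cached != nullptr) return cached;

  // Slow path: several threads may race here, but they all compute the
  // same value, so no lock is needed.
  char* buf = new char[24];
  std::snprintf(buf, 24, "%ld", static_cast<long>(getpid()));

  const char* expected = nullptr;
  if (g_pid_str.compare_exchange_strong(expected, buf,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire)) {
    return buf;        // this thread published its copy
  }
  delete[] buf;        // another thread won the race
  return expected;     // on failure, 'expected' holds the winner's pointer
}
```

A timestamp fixed at VM startup could be cached the same way; a "current time" string could not, since it is not idempotent.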
It might be clearer to take a timestamp-string rather than the timestamp and then have utility methods to get `vm_startup_time_str()` and `current_time_str()`, where the former at least can cache the string for future use. The same could be done for `host_name_str()` and `pid_str()`. The caching for idempotent strings doesn't need to use locking, just a `release_store` with a paired `load_acquire`. ------------- PR Review: https://git.openjdk.org/jdk/pull/23410#pullrequestreview-2610936532 From dholmes at openjdk.org Wed Feb 12 06:32:25 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 12 Feb 2025 06:32:25 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v7] In-Reply-To: References: Message-ID: On Mon, 10 Feb 2025 13:23:49 GMT, Coleen Phillimore wrote: >> This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into protection-domain > - Move test for protectionDomain filtering. > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Remove @Stable annotation for final field. > - Fix test that knows which fields are hidden from reflection in jvmci. > - Hide Class.protectionDomain for reflection and add a test case. > - Merge branch 'master' into protection-domain > - Fix two tests. > - Fix the test. > - ... and 1 more: https://git.openjdk.org/jdk/compare/c9cadbd2...2208302c Hotspot changes are fine. This all looks good to me. 
Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23396#pullrequestreview-2610940836 From rehn at openjdk.org Wed Feb 12 06:44:50 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 12 Feb 2025 06:44:50 GMT Subject: RFR: 8349851: RISCV: Call VM leaf can use movptr2 [v2] In-Reply-To: References: Message-ID: <9Fx0BjMvNHBiCuxeAph3FTkm0bCftDdgN4QvIImHPY0=.ff7f1e59-c251-4629-91c3-d26e19324bd6@github.com> > Hi, please consider. > > There should be a small speed up to vm leafs. > We can scratch t2 here as we just pushed it. (maybe I should have used t0) > > Passes t1 > > /Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: t0, remove ws ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23565/files - new: https://git.openjdk.org/jdk/pull/23565/files/3fde822a..66a25f3c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23565&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23565&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23565/head:pull/23565 PR: https://git.openjdk.org/jdk/pull/23565 From fyang at openjdk.org Wed Feb 12 06:56:11 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Feb 2025 06:56:11 GMT Subject: RFR: 8349851: RISCV: Call VM leaf can use movptr2 [v2] In-Reply-To: <9Fx0BjMvNHBiCuxeAph3FTkm0bCftDdgN4QvIImHPY0=.ff7f1e59-c251-4629-91c3-d26e19324bd6@github.com> References: <9Fx0BjMvNHBiCuxeAph3FTkm0bCftDdgN4QvIImHPY0=.ff7f1e59-c251-4629-91c3-d26e19324bd6@github.com> Message-ID: On Wed, 12 Feb 2025 06:44:50 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> There should be a small speed up to vm leafs. >> We can scratch t2 here as we just pushed it. 
(maybe I should have used t0) >> >> Passes t1 >> >> /Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > t0, remove ws Thanks for the update. Just wonder how many instructions will we save here. I know it will depend on the value of `entry_point`. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23565#pullrequestreview-2610975384 From fyang at openjdk.org Wed Feb 12 06:59:14 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Feb 2025 06:59:14 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash In-Reply-To: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Tue, 11 Feb 2025 07:59:01 GMT, Dean Long wrote: > When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. > > In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. > > Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. FYI: `test/hotspot/jtreg/compiler/jsr292/MHDeoptTest.java` and hs-tier1 test good on linux-riscv64 with fastdebug build. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2652818321 From alanb at openjdk.org Wed Feb 12 07:36:13 2025 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 12 Feb 2025 07:36:13 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v7] In-Reply-To: References: Message-ID: On Mon, 10 Feb 2025 13:23:49 GMT, Coleen Phillimore wrote: >> This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into protection-domain > - Move test for protectionDomain filtering. > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Remove @Stable annotation for final field. > - Fix test that knows which fields are hidden from reflection in jvmci. > - Hide Class.protectionDomain for reflection and add a test case. > - Merge branch 'master' into protection-domain > - Fix two tests. > - Fix the test. > - ... and 1 more: https://git.openjdk.org/jdk/compare/c9cadbd2...2208302c This looks okay. There will be some follow-up cleanup needed in the libs code, e.g.JLA.protectionDomain(Class) can go away, something for future PRs. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23396#issuecomment-2652874791 From rehn at openjdk.org Wed Feb 12 07:39:08 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 12 Feb 2025 07:39:08 GMT Subject: RFR: 8349851: RISCV: Call VM leaf can use movptr2 [v2] In-Reply-To: References: <9Fx0BjMvNHBiCuxeAph3FTkm0bCftDdgN4QvIImHPY0=.ff7f1e59-c251-4629-91c3-d26e19324bd6@github.com> Message-ID: On Wed, 12 Feb 2025 06:54:01 GMT, Fei Yang wrote: > Thanks for the update. Just wonder how many instructions will we save here. I know it will depend on the value of `entry_point`. I get two versions(I modified these to use t2 so i can easily find them): ################################################### 0x000074dbeb862742: lui t2,0x1d3 0x000074dbeb862746: addiw t2,t2,1795 # 0x00000000001d3703 0x000074dbeb86274a: c.slli t2,0xd 0x000074dbeb86274c: addi t2,t2,-1155 0x000074dbeb862750: c.slli t2,0xd 0x000074dbeb862604: lui t2,0x74dc 0x000074dbeb862608: addiw t2,t2,177 # 0x00000000074dc0b1 0x000074dbeb86260c: c.slli t2,0x14 After: ################################################### 0x00007a0ee4226e0a: lui t0,0x1e83c 0x00007a0ee4226e0e: lui t2,0x111e0 0x00007a0ee4226e12: c.slli t0,0x12 0x00007a0ee4226e14: c.add t2,t0 For an out-of-order machine the two lui can be done in parallel, so movptr2 can be constant of e.g. 3 cycles. lui | lui slli add For in-order machine some cases may require one additional instruction. I guess we could create a li48_2 which or something like that. 
But to be honest I really hope we are not targeting in-order machines here :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23565#issuecomment-2652878689 From fyang at openjdk.org Wed Feb 12 07:43:13 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Feb 2025 07:43:13 GMT Subject: RFR: 8349851: RISCV: Call VM leaf can use movptr2 [v2] In-Reply-To: References: <9Fx0BjMvNHBiCuxeAph3FTkm0bCftDdgN4QvIImHPY0=.ff7f1e59-c251-4629-91c3-d26e19324bd6@github.com> Message-ID: On Wed, 12 Feb 2025 07:35:41 GMT, Robbin Ehn wrote: > For in-order machine some cases may require one additional instruction. I guess we could create a li48_2 which or something like that. But to be honest I really hope we are not targeting in-order machines here :) Yeah, I agree. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23565#issuecomment-2652887395 From iklam at openjdk.org Wed Feb 12 08:13:02 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 12 Feb 2025 08:13:02 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v2] In-Reply-To: References: Message-ID: > Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pool entries, etc, that are easily represented in text. > > With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge.
> > To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): > >> the format of the configuration and cache files is not specified and is subject to change without notice. > > **Notes for reviewers:** > > - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. > - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. > - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. > - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. > - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: - Update "make test JTREG_AOT_JDK=true ..."
to work with binary AOT configuration - Fixed test failures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23484/files - new: https://git.openjdk.org/jdk/pull/23484/files/daa33c5e..0e77a35c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23484&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23484&range=00-01 Stats: 51 lines in 5 files changed: 27 ins; 17 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23484/head:pull/23484 PR: https://git.openjdk.org/jdk/pull/23484 From jbhateja at openjdk.org Wed Feb 12 09:13:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Feb 2025 09:13:17 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v17] In-Reply-To: References: Message-ID: <1xQeG8IO8aJNUluyWTaz9cm2xmTKSNsZJMNhnicnm5s=.304de8b6-9bba-44db-9982-eddaf950a415@github.com> On Mon, 10 Feb 2025 21:23:28 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixing typos > > An impressive and substantial change. I focused on the Java code, there are some small tweaks, presented in comments, we can make to the intrinsics to improve the expression of code, and it has no impact on the intrinsic implementation. Hi @PaulSandoz , Your comments have been addressed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2653071755 From mli at openjdk.org Wed Feb 12 10:23:10 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 12 Feb 2025 10:23:10 GMT Subject: RFR: 8349851: RISCV: Call VM leaf can use movptr2 [v2] In-Reply-To: <9Fx0BjMvNHBiCuxeAph3FTkm0bCftDdgN4QvIImHPY0=.ff7f1e59-c251-4629-91c3-d26e19324bd6@github.com> References: <9Fx0BjMvNHBiCuxeAph3FTkm0bCftDdgN4QvIImHPY0=.ff7f1e59-c251-4629-91c3-d26e19324bd6@github.com> Message-ID: On Wed, 12 Feb 2025 06:44:50 GMT, Robbin Ehn wrote: >> Hi, please consider. 
>> >> There should be a small speed up to vm leafs. >> We can scratch t2 here as we just pushed it. (maybe I should have used t0) >> >> Passes t1 >> >> /Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > t0, remove ws Looks good. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23565#pullrequestreview-2611453300 From mdoerr at openjdk.org Wed Feb 12 10:55:16 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 12 Feb 2025 10:55:16 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v22] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: <5ax8CsW02SVuyN4NpdaPxb4nS--8R1myFvAI71IuU8M=.896d9de7-2a74-4786-9954-db2bf27847dd@github.com> On Tue, 11 Feb 2025 07:15:27 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently acceleration code for GHASH is missing for PPC64. >> >> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - common code function > - common code function src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 571: > 569: masm->vxor(vTmp10, vTmp10, vTmp6); // Combine reduced Low & High products > 570: masm->vxor(vState, vTmp4, vTmp10); > 571: masm->addi(data, data, 16); I think incrementing the data pointer fits better into the loop instead of this helper function. Don't forget that hotspot uses 2 spaces indentation!
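GHASH multiplies in GF(2^128), so its products are carry-less (XOR takes the place of addition). The Karatsuba step mentioned in the PR description computes a double-width product from three half-width products instead of four, using mid = (a0 ^ a1)(b0 ^ b1) ^ lo ^ hi. The scalar sketch below shows that identity on 64-bit operands split into 32-bit halves and checks it against a schoolbook carry-less multiply. The real stub works on 128-bit vectors and also reduces modulo the GHASH polynomial; both are omitted here, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct U128 {
  uint64_t hi;
  uint64_t lo;
};

bool equal(const U128& x, const U128& y) { return x.hi == y.hi && x.lo == y.lo; }

// Carry-less (GF(2)[x]) product of two 32-bit polynomials; fits in 64 bits.
uint64_t clmul32(uint32_t a, uint32_t b) {
  uint64_t r = 0;
  for (int i = 0; i < 32; ++i) {
    if ((b >> i) & 1) r ^= static_cast<uint64_t>(a) << i;
  }
  return r;
}

// Reference: schoolbook carry-less 64x64 -> 128-bit product.
U128 clmul64_schoolbook(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i != 0) r.hi ^= a >> (64 - i);  // bits shifted out of the low word
    }
  }
  return r;
}

// Karatsuba: three 32x32 carry-less products instead of four.
U128 clmul64_karatsuba(uint64_t a, uint64_t b) {
  const uint32_t a0 = static_cast<uint32_t>(a), a1 = static_cast<uint32_t>(a >> 32);
  const uint32_t b0 = static_cast<uint32_t>(b), b1 = static_cast<uint32_t>(b >> 32);
  const uint64_t lo = clmul32(a0, b0);
  const uint64_t hi = clmul32(a1, b1);
  // In GF(2), (a0+a1)(b0+b1) - lo - hi becomes XORs.
  const uint64_t mid = clmul32(a0 ^ a1, b0 ^ b1) ^ lo ^ hi;
  // result = hi * x^64 + mid * x^32 + lo
  return U128{hi ^ (mid >> 32), lo ^ (mid << 32)};
}
```

The saving of one multiplication per level is what makes Karatsuba attractive when each carry-less multiply is an expensive vector instruction.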
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1952418392 From yzheng at openjdk.org Wed Feb 12 12:08:21 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 12 Feb 2025 12:08:21 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v7] In-Reply-To: References: Message-ID: <1OD68qMw2krDXRaD2SYsuqSYX5tnnRTaXHpGZfxjmz8=.aece15d8-56cc-41cb-9f75-67669a8e0bba@github.com> On Mon, 10 Feb 2025 13:23:49 GMT, Coleen Phillimore wrote: >> This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into protection-domain > - Move test for protectionDomain filtering. > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Remove @Stable annotation for final field. > - Fix test that knows which fields are hidden from reflection in jvmci. > - Hide Class.protectionDomain for reflection and add a test case. > - Merge branch 'master' into protection-domain > - Fix two tests. > - Fix the test. > - ... and 1 more: https://git.openjdk.org/jdk/compare/c9cadbd2...2208302c LGTM ------------- Marked as reviewed by yzheng (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/23396#pullrequestreview-2611703401 From coleenp at openjdk.org Wed Feb 12 12:08:22 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 12 Feb 2025 12:08:22 GMT Subject: Integrated: 8349145: Make Class.getProtectionDomain() non-native In-Reply-To: References: Message-ID: On Fri, 31 Jan 2025 16:39:35 GMT, Coleen Phillimore wrote: > This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. > Tested with tier1-4. This pull request has now been integrated. Changeset: ed17c55e Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/ed17c55ea34b3b6009dab11d64f21e0b7af3d701 Stats: 65 lines in 13 files changed: 15 ins; 34 del; 16 mod 8349145: Make Class.getProtectionDomain() non-native Reviewed-by: liach, dholmes, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/23396 From coleenp at openjdk.org Wed Feb 12 12:08:21 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 12 Feb 2025 12:08:21 GMT Subject: RFR: 8349145: Make Class.getProtectionDomain() non-native [v7] In-Reply-To: References: Message-ID: <_uTUsNtEQuGM5gEJm51_U8_mPyBePpr3hIzfktY_6SA=.bf9b162d-fa01-4f6d-aaa1-fc54b1c8797c@github.com> On Mon, 10 Feb 2025 13:23:49 GMT, Coleen Phillimore wrote: >> This change removes the native call and injected field for ProtectionDomain in the java.lang.Class instance, and moves the field to be declared in Java. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into protection-domain > - Move test for protectionDomain filtering. 
> - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Update test/jdk/java/lang/reflect/AccessibleObject/TrySetAccessibleTest.java > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Remove @Stable annotation for final field. > - Fix test that knows which fields are hidden from reflection in jvmci. > - Hide Class.protectionDomain for reflection and add a test case. > - Merge branch 'master' into protection-domain > - Fix two tests. > - Fix the test. > - ... and 1 more: https://git.openjdk.org/jdk/compare/c9cadbd2...2208302c Thanks for reviewing Chen, David, Alan and Yudi. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23396#issuecomment-2653516543 From rrich at openjdk.org Wed Feb 12 12:38:10 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 12 Feb 2025 12:38:10 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash In-Reply-To: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Tue, 11 Feb 2025 07:59:01 GMT, Dean Long wrote: > When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. > > In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. > > Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. 
The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. src/hotspot/cpu/ppc/abstractInterpreter_ppc.cpp line 136: > 134: // Test caller-aligned placement vs callee-aligned > 135: intptr_t* l2 = caller->sp() + method->max_locals() - 1 + (frame::java_abi_size / Interpreter::stackElementSize); > 136: assert(l2 >= locals_base, "bad placement"); The assertion at L136 fails on ppc64 (similar to what @offamitkumar reported for s390x). I don't understand the assertion because it is just a stricter version of the first one. On ppc64 the sp of `caller` is aligned down because it needs to be 16-byte aligned. `locals_base` is only 8-byte aligned. But from what I saw the difference was larger than just one word. Maybe `caller` has got an c2i extension? I guess this would be problematic. On x86_64 `l2` depends on the last expression stack pointer, not on the `caller`'s sp. If you try to translate this to ppc64 then you'll get the expression used to initialize `locals_base` at L128. I think you can remove the 2nd assertion. Even the first one looks redundant. Besides that I've tested `MHDeoptTest.java` successfully on ppc64. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1952565563 From rrich at openjdk.org Wed Feb 12 12:47:10 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 12 Feb 2025 12:47:10 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: <00HHPN1Q9xrNf8Ps_9S7hOOHHmw2mNocFrQzqxzYhRA=.bb2f9c11-12c5-4efa-8314-4415e22e31f8@github.com> On Wed, 12 Feb 2025 12:35:07 GMT, Richard Reingruber wrote: > Maybe `caller` has got an c2i extension? I guess this would be problematic.
I meant i2c extension. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1952578551 From jsjolen at openjdk.org Wed Feb 12 13:11:11 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 12 Feb 2025 13:11:11 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v7] In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 16:30:51 GMT, Casper Norrbin wrote: >> src/hotspot/share/utilities/rbTree.hpp line 166: >> >>> 164: bool found() const { return *_insert_location != nullptr; } >>> 165: RBNode* node() { return _insert_location == nullptr ? nullptr : *_insert_location; } >>> 166: RBNode* node() const { return _insert_location == nullptr ? nullptr : *_insert_location; } >> >> Is there any case where I don't need to check the validity of the cursor? That is, do I ever want to use `node()` without first calling `valid()` or afterwards checking whether the returned value was null? >> >> If the answer to that is: "No, there is no such case", then we shouldn't return null on `!valid()` node. We should instead add `assert(valid(), "must be");`. If the answer is yes, then could you please tell me what that situation is :P? > > It's been useful in a couple of places, where we want "the node or nullptr otherwise", since you get the valid check and the node in one. A few examples where we don't check validity: > In `visit_range_in_order`, we iterate until the node (or nullptr) is reached. > In `upsert`, we extract the node into a local variable and either modify the node or reuse the variable. > In `closest_gt`, we return the next node from the cursor, which could be invalid. OK, then leave it as is :-).
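The "node or nullptr" convention discussed above can be illustrated with a hypothetical cursor that stores the address of the link where a key is, or where it would be inserted. A single `node()` call then serves as both the existence test and the lookup, and an upsert needs only one search. This is a simplified, unbalanced binary search tree written for illustration; apart from mirroring `found()`/`node()` from the quoted snippet, the names and structure are assumptions, not the HotSpot `RBTree` code.

```cpp
#include <cassert>

// Hypothetical node for an unbalanced BST (illustration only).
struct Node {
  int key;
  int value;
  Node* left = nullptr;
  Node* right = nullptr;
  Node(int k, int v) : key(k), value(v) {}
};

struct Cursor {
  Node** insert_location;  // nullptr means "invalid cursor"

  bool valid() const { return insert_location != nullptr; }
  bool found() const { return valid() && *insert_location != nullptr; }
  // "The node or nullptr": one call answers both "is it there?" and "which node?".
  Node* node() const { return insert_location == nullptr ? nullptr : *insert_location; }
};

class Tree {
 public:
  Tree() = default;
  Tree(const Tree&) = delete;
  Tree& operator=(const Tree&) = delete;
  ~Tree() { destroy(root_); }

  // Returns a cursor at the link holding `key`, or at the empty link where it belongs.
  Cursor cursor_find(int key) {
    Node** link = &root_;
    while (*link != nullptr && (*link)->key != key) {
      link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    }
    return Cursor{link};
  }

  // Upsert with a single search: the same cursor yields either the existing
  // node to update or the empty link to fill.
  void upsert(int key, int value) {
    Cursor c = cursor_find(key);
    if (Node* n = c.node()) {
      n->value = value;
    } else {
      *c.insert_location = new Node(key, value);
    }
  }

 private:
  static void destroy(Node* n) {
    if (n == nullptr) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
  }
  Node* root_ = nullptr;
};
```

Returning nullptr from `node()` for an absent key keeps call sites like `upsert` to one branch, which is the convenience described in the reply.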
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1952614271 From jsjolen at openjdk.org Wed Feb 12 13:30:12 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 12 Feb 2025 13:30:12 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v8] In-Reply-To: References: Message-ID: <-p1L-GNjYcYK3OfhzYf425jWUyGzUTcOpgSV3eE_9DM=.c7463ab8-52ee-41d8-86db-d2ff5b8b2e4f@github.com> On Tue, 11 Feb 2025 16:15:02 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally.
>> >> An example of how you could use the intrusive tree is found below: >> >> ```c++ >> struct MyIntrusiveStructure { >> Node node; // The tree node is part of an external structure >> int data; >> >> MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} >> Node* get_node() { return &node; } >> static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } >> }; >> >> Tree my_intrusive_tree; >> >> Cursor insert_cursor = my_intrusive_tree.cursor_find(0); >> Node insert_node = Node(0); >> >> // Custom allocation here is just malloc >> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); >> new (place) MyIntrusiveStructure(0, insert_node); >> >> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); >> >> Cursor find_cursor = my_intrusive_tree.cursor_find(0); >> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; >> >> >> >> Please let me know any feedback or concerns! > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > johan feedback Alright, I'm okay with this. As all the internals now use cursors we basically get the intrusive-style API tested for free. Thanks for your efforts on this! ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23416#pullrequestreview-2611922169 From cnorrbin at openjdk.org Wed Feb 12 13:39:32 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Wed, 12 Feb 2025 13:39:32 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v9] In-Reply-To: References: Message-ID: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support of intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. 
> > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. > > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns!
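The `cast_to_self` in the example above relies on `node` being the first member of `MyIntrusiveStructure`: for a standard-layout class, an object and its first non-static data member are pointer-interconvertible, so casting the `Node*` back to the enclosing object is well defined. The sketch below states that precondition with `static_assert`s and shows a tree linking caller-owned storage without allocating. It uses a plain unbalanced binary search tree and hypothetical names (`IntrusiveTree`, `find_link`, `insert_at`), not the HotSpot API.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical intrusive node: the tree links these but never allocates or frees them.
struct Node {
  int key;
  Node* left = nullptr;
  Node* right = nullptr;
  explicit Node(int k) : key(k) {}
};

class IntrusiveTree {
 public:
  // Address of the link holding `key`, or of the empty link where it belongs.
  Node** find_link(int key) {
    Node** link = &root_;
    while (*link != nullptr && (*link)->key != key) {
      link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    }
    return link;
  }

  // Links a caller-owned node into an empty slot returned by find_link.
  void insert_at(Node** link, Node* n) {
    assert(*link == nullptr);
    *link = n;
  }

 private:
  Node* root_ = nullptr;
};

struct MyIntrusiveStructure {
  Node node;  // must stay the first member for cast_to_self below
  int data;
  MyIntrusiveStructure(int key, int d) : node(key), data(d) {}

  // Valid only because MyIntrusiveStructure is standard-layout and `node`
  // is its first member (pointer-interconvertible).
  static MyIntrusiveStructure* cast_to_self(Node* n) {
    return reinterpret_cast<MyIntrusiveStructure*>(n);
  }
};

static_assert(std::is_standard_layout<MyIntrusiveStructure>::value,
              "cast_to_self requires a standard-layout type");
static_assert(offsetof(MyIntrusiveStructure, node) == 0,
              "node must be the first member");
```

Here the storage is a caller-owned array, so the tree performs no allocation at all; the `os::malloc` plus placement-new pattern from the PR description would work the same way.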
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: renamed non-value upsert to insert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23416/files - new: https://git.openjdk.org/jdk/pull/23416/files/48241078..a620c9bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=07-08 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From jsjolen at openjdk.org Wed Feb 12 13:45:12 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 12 Feb 2025 13:45:12 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v9] In-Reply-To: References: Message-ID: <8RALYzRApfGXL3WG9Lrlr5d60CO_Qs12r-eljUFRqQg=.3574e411-c403-406d-8008-7fb9dbbcbd4a@github.com> On Wed, 12 Feb 2025 13:39:32 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. >> >> Two key changes enable this feature: >> 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. >> 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. >> >> >> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code.
More tests have also been added to cover both new and existing functionality. >> >> An example of how you could use the intrusive tree is found below: >> >> ```c++ >> struct MyIntrusiveStructure { >> Node node; // The tree node is part of an external structure >> int data; >> >> MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} >> Node* get_node() { return &node; } >> static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } >> }; >> >> Tree my_intrusive_tree; >> >> Cursor insert_cursor = my_intrusive_tree.cursor_find(0); >> Node insert_node = Node(0); >> >> // Custom allocation here is just malloc >> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); >> new (place) MyIntrusiveStructure(0, insert_node); >> >> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); >> >> Cursor find_cursor = my_intrusive_tree.cursor_find(0); >> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; >> >> >> >> Please let me know any feedback or concerns! > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > renamed non-value upsert to insert Marked as reviewed by jsjolen (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23416#pullrequestreview-2611961557 From psandoz at openjdk.org Wed Feb 12 14:49:27 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 12 Feb 2025 14:49:27 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v18] In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 06:32:56 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. 
Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 9. X86 backend implementation for all supported intrinsics. >> 10. Functional and Performance validation tests. >> >> Kindly review the patch and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Looks good. I merged this PR with master, successfully (at the time) with no conflicts, and ran it through tier 1 to 3 testing and there were no failures. ------------- Marked as reviewed by psandoz (Reviewer). 
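The Float16 intrinsics receive raw `short` values holding IEEE 754 binary16 bits (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits). As a reference for what those 16 bits mean, the sketch below decodes them into a `float` in portable C++. It is not how C2 or the x86 backend implement these operations, and `binary16_to_float` is a hypothetical name; on the Java side, `Float.float16ToFloat(short)` (JDK 20 and later) performs the same decoding.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decodes IEEE 754 binary16 bits (as carried in a Java short) into a float.
float binary16_to_float(uint16_t bits) {
  const bool negative = (bits & 0x8000) != 0;
  const int exponent = (bits >> 10) & 0x1f;  // 5-bit biased exponent
  const int fraction = bits & 0x3ff;         // 10-bit fraction
  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: fraction * 2^-24 (no implicit leading 1).
    magnitude = std::ldexp(static_cast<float>(fraction), -24);
  } else if (exponent == 0x1f) {
    // All-ones exponent: infinity or NaN.
    magnitude = fraction == 0 ? std::numeric_limits<float>::infinity()
                              : std::numeric_limits<float>::quiet_NaN();
  } else {
    // Normal: 1.fraction * 2^(exponent - 15) == (0x400 | fraction) * 2^(exponent - 25).
    magnitude = std::ldexp(static_cast<float>(0x400 | fraction), exponent - 25);
  }
  return negative ? -magnitude : magnitude;
}
```

Every binary16 value is exactly representable as a float, so this decoding is lossless; the short is only a container for the bits, which is why the compiler needs reinterpretation nodes rather than conversions when moving values between general-purpose and floating-point registers.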
PR Review: https://git.openjdk.org/jdk/pull/22754#pullrequestreview-2612181239 From duke at openjdk.org Wed Feb 12 15:27:13 2025 From: duke at openjdk.org (duke) Date: Wed, 12 Feb 2025 15:27:13 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v6] In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 21:47:31 GMT, Volodymyr Paprotski wrote: >> (Also see `8319429: Resetting MXCSR flags degrades ecore`) >> >> This PR fixes two issues: >> - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only >> - This issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warning on HelloWorld (on both Linux _and_ windows): >> >> OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall >> >> >> First, the crash. Caused when FXRSTOR is attempting to write reserved bits into MXCSR. If those bits happen to be set, crash. (Hence the crash isn't deterministic. But frequent enough if `__ warn` is used). It is caused by the binding not reserving stack space for register parameters () >> ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) >> Prolog of the warn function then proceeds to store the four arg registers onto the stack, overwriting the fxstore save area. (See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) >> >> Fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer. >> >> --- >> >> I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` are set (65k+ lines of warnings) > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comments from Sandhya @vpaprotsk Your change (at version b23764ab56c3729598b52bdb660e43e342f9286b) is now ready to be sponsored by a Committer.
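Two layout facts make the crash description concrete. Under the Microsoft x64 calling convention the caller reserves a 32-byte home area for RCX, RDX, R8 and R9 at its rsp, which a callee such as `warn` may spill into, and the FXSAVE image keeps MXCSR at byte offset 24. If the FXSAVE image starts at rsp with no home area reserved, the spilled R9 lands on the saved MXCSR, which is consistent with FXRSTOR later seeing reserved bits. The byte-level model below is a hypothetical illustration that assumes that placement and a little-endian host; it is not the HotSpot stub, and the register values are arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model (not HotSpot code). Offsets are relative to the
// caller's rsp at the call instruction.
constexpr int kHomeAreaBytes = 4 * 8;  // Win64 home slots for RCX, RDX, R8, R9
constexpr int kFxsaveBytes = 512;      // size of the FXSAVE image
constexpr int kMxcsrOffset = 24;       // MXCSR field inside the FXSAVE image

// A Win64 callee may spill its four register arguments into the home area,
// which occupies [caller_rsp, caller_rsp + 32).
void callee_spills_home_area(uint8_t* caller_rsp, const uint64_t regs[4]) {
  std::memcpy(caller_rsp, regs, kHomeAreaBytes);
}

// FXRSTOR raises #GP if any reserved MXCSR bit (31:16) is set.
bool fxrstor_would_fault(const uint8_t* fxsave_image) {
  uint32_t mxcsr;
  std::memcpy(&mxcsr, fxsave_image + kMxcsrOffset, sizeof mxcsr);
  return (mxcsr & 0xFFFF0000u) != 0;
}

// Places the FXSAVE image `gap` bytes above the caller's rsp (gap <= 32),
// lets the callee spill its arguments, then checks the saved MXCSR.
bool crashes_with_gap(int gap) {
  alignas(16) uint8_t stack[kHomeAreaBytes + kFxsaveBytes] = {};
  uint8_t* caller_rsp = stack;
  uint8_t* fxsave_image = caller_rsp + gap;
  const uint32_t default_mxcsr = 0x1F80;  // all exceptions masked, round-to-nearest
  std::memcpy(fxsave_image + kMxcsrOffset, &default_mxcsr, sizeof default_mxcsr);
  // Arbitrary example register contents; R9 holds a pointer-like value.
  const uint64_t regs[4] = {0x1, 0x2, 0x3, 0x00007ffd12345678ULL};
  callee_spills_home_area(caller_rsp, regs);
  return fxrstor_would_fault(fxsave_image);
}
```

With no gap, the low half of R9 (0x12345678 on a little-endian host) replaces MXCSR and sets reserved bits; reserving the home area first, which is the role `frame::arg_reg_save_area_bytes` plays in the fix (presumably 32 bytes on Win64), moves the FXSAVE image out of the callee's reach.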
------------- PR Comment: https://git.openjdk.org/jdk/pull/22673#issuecomment-2654048613 From kvn at openjdk.org Wed Feb 12 15:31:09 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 12 Feb 2025 15:31:09 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v2] In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 08:13:02 GMT, Ioi Lam wrote: >> Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. >> >> With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. >> >> To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): >> >>> the format of the configuration and cache files is not specified and is subject to change without notice. 
>> >> **Notes for reviewers:** >> >> - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. >> - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. >> - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. >> - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. >> - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. > > Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: > > - Update "make test JTREG_AOT_JDK=true ..." to work with binary AOT configuration > - Fixed test failures tools/javac/ImplicitClass/ImplicitImports.java failed in GHA: [0.002s][warning][cds] Unable to use AOT cache: CDS is disabled when java.base module is patched. Hello, World! Exception running test testImplicitSimpleIOImport: java.lang.AssertionError: Incorrect Output, expected: [Hello, World!], actual: [[0.002s][warning][cds] Unable to use AOT cache: CDS is disabled when java.base module is patched., Hello, World!] java.lang.AssertionError: Incorrect Output, expected: [Hello, World!], actual: [[0.002s][warning][cds] Unable to use AOT cache: CDS is disabled when java.base module is patched., Hello, World!]
at ImplicitImports.testImplicitSimpleIOImport(ImplicitImports.java:171) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23484#issuecomment-2654059225 From vpaprotski at openjdk.org Wed Feb 12 15:47:04 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 12 Feb 2025 15:47:04 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v7] In-Reply-To: References: Message-ID: > (Also see `8319429: Resetting MXCSR flags degrades ecore`) > > This PR fixes two issues: > - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only > - This issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warning on HelloWorld (on both Linux _and_ windows): > > OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall > > > First, the crash. Caused when FXRSTOR is attempting to write reserved bits into MXCSR. If those bits happen to be set, crash. (Hence the crash isn't deterministic. But frequent enough if `__ warn` is used). It is caused by the binding not reserving stack space for register parameters () > ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) > Prolog of the warn function then proceeds to store the four arg registers onto the stack, overwriting the fxstore save area. (See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) > > Fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer.
> > --- > > I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` are set (65k+ lines of warnings) Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/x86/macroAssembler_x86.cpp Co-authored-by: Julian Waters <32636402+TheShermanTanker at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22673/files - new: https://git.openjdk.org/jdk/pull/22673/files/b23764ab..cbd3812d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22673&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22673&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22673.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22673/head:pull/22673 PR: https://git.openjdk.org/jdk/pull/22673 From jwaters at openjdk.org Wed Feb 12 15:47:06 2025 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 12 Feb 2025 15:47:06 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v6] In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 21:47:31 GMT, Volodymyr Paprotski wrote: >> (Also see `8319429: Resetting MXCSR flags degrades ecore`) >> >> This PR fixes two issues: >> - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only >> - This issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warning on HelloWorld (on both Linux _and_ windows): >> >> OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall >> >> >> First, the crash. Caused when FXRSTOR is attempting to write reserved bits into MXCSR. If those bits happen to be set, crash. (Hence the crash isn't deterministic. But frequent enough if `__ warn` is used). 
It is caused by the binding not reserving stack space for register parameters () >> ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) >> Prolog of the warn function then proceeds to store the for arg registers onto the stack, overriding the fxstore save area. (See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) >> >> Fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer. >> >> --- >> >> I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` are set (65k+ lines of warnings) > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comments from Sandhya Looks good otherwise src/hotspot/cpu/x86/macroAssembler_x86.cpp line 783: > 781: > 782: #ifdef _WIN64 > 783: // Windows always allocates space for it's register args Suggestion: // Windows always allocates space for its register args src/hotspot/os/windows/os_windows.cpp line 2757: > 2755: > 2756: #if defined(_M_AMD64) > 2757: extern bool handle_FLT_exception(struct _EXCEPTION_POINTERS* exceptionInfo); This seems strange to declare inside another method like this, not a showstopper but it might be better to declare it where handle_FLT_exception used to be defined in this file ------------- Marked as reviewed by jwaters (Committer). 
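The crash mechanism described in this PR can be illustrated with a small simulation. This is a sketch under stated assumptions, not HotSpot code: it assumes the caller writes its FXSAVE image starting at `rsp` (plus an optional gap) right before the call, and that the Win64 callee spills its four register arguments (RCX, RDX, R8, R9) into the 32-byte home area just above its return address, as the Microsoft x64 calling convention allows. Per the Intel SDM, MXCSR sits at byte offset 24 of the FXSAVE image and its upper 16 bits are reserved, so FXRSTOR faults if a stray value sets any of them. The helper names `saved_mxcsr_after_call` and `fxrstor_would_fault` are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simplified model, not HotSpot code. Index 0 of the simulated stack is the
// caller's rsp at the call instruction (just above the callee's return address).
constexpr std::size_t kHomeAreaBytes = 4 * 8;           // Win64 home slots for RCX, RDX, R8, R9
constexpr std::size_t kFxsaveBytes = 512;               // size of an FXSAVE image
constexpr std::size_t kMxcsrOffset = 24;                // MXCSR field inside the FXSAVE image
constexpr std::uint32_t kMxcsrReservedBits = 0xFFFF0000u;

// Returns the MXCSR value a later FXRSTOR would load. `shadow_bytes` is the gap
// the caller leaves between rsp and its FXSAVE image (0 reproduces the bug).
std::uint32_t saved_mxcsr_after_call(std::size_t shadow_bytes,
                                     const std::uint64_t (&arg_regs)[4]) {
  assert(shadow_bytes <= kHomeAreaBytes);
  std::array<std::uint8_t, kHomeAreaBytes + kFxsaveBytes> stack{};

  // Caller: save FP/SSE state; MXCSR holds its default value 0x1F80.
  const std::uint32_t mxcsr = 0x1F80;
  std::memcpy(&stack[shadow_bytes + kMxcsrOffset], &mxcsr, sizeof mxcsr);

  // Callee prologue: spill the four register arguments into its home area.
  for (std::size_t i = 0; i < 4; ++i) {
    std::memcpy(&stack[i * 8], &arg_regs[i], sizeof arg_regs[i]);
  }

  std::uint32_t reloaded;
  std::memcpy(&reloaded, &stack[shadow_bytes + kMxcsrOffset], sizeof reloaded);
  return reloaded;
}

// FXRSTOR raises #GP if the saved image tries to set reserved MXCSR bits.
bool fxrstor_would_fault(std::uint32_t mxcsr) {
  return (mxcsr & kMxcsrReservedBits) != 0;
}
```

With `shadow_bytes = 0` the spill of the fourth argument lands on the saved MXCSR, and whether FXRSTOR then faults depends on that argument's bit pattern, which fits the PR's observation that the crash is not deterministic. Reserving the 32-byte home area first (the role of `frame::arg_reg_save_area_bytes` in the fix) keeps the saved value intact. The model assumes a little-endian host, as on x86.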
PR Review: https://git.openjdk.org/jdk/pull/22673#pullrequestreview-2612367075 PR Review Comment: https://git.openjdk.org/jdk/pull/22673#discussion_r1952906388 PR Review Comment: https://git.openjdk.org/jdk/pull/22673#discussion_r1952902264 From vpaprotski at openjdk.org Wed Feb 12 15:47:07 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 12 Feb 2025 15:47:07 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v6] In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 15:38:34 GMT, Julian Waters wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> comments from Sandhya > > src/hotspot/os/windows/os_windows.cpp line 2757: > >> 2755: >> 2756: #if defined(_M_AMD64) >> 2757: extern bool handle_FLT_exception(struct _EXCEPTION_POINTERS* exceptionInfo); > > This seems strange to declare inside another method like this, not a showstopper but it might be better to declare it where handle_FLT_exception used to be defined in this file Oh.. I remember wondering where to put this. Meant to come back and find the right header ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22673#discussion_r1952907016 From zgu at openjdk.org Wed Feb 12 15:54:11 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 12 Feb 2025 15:54:11 GMT Subject: RFR: 8349083: Factor out filename handling code from logging In-Reply-To: <82r1a8p4tVWdY0x7tkL1qCyFcMh2anmbZmWwfvPi1B4=.7e782b73-bcd3-4012-8fdb-f55bed69b2cb@github.com> References: <82r1a8p4tVWdY0x7tkL1qCyFcMh2anmbZmWwfvPi1B4=.7e782b73-bcd3-4012-8fdb-f55bed69b2cb@github.com> Message-ID: On Wed, 12 Feb 2025 06:26:25 GMT, David Holmes wrote: > I'm not sure how best to handle the "%t". My view was that we would standardize "%t" to mean the timestamp of when the VM started, but I guess if there are pre-existing uses of "%t" that mean something different then that would be a problem. 
Note that share/utilities/ostream.cpp also has similar logic for creating log file names with %p and %t. > > My concern is your current code re-stringifies the timestamp on every call, whereas the logging code only created the vm_startup_time_str once. When the timestamp is being passed in, memoizing it is problematic - whereas if you know it is the VM startup time it is no problem at all. The strings for %p and %hn should also be memoized, though logging does not currently do that. > > It might be clearer to take a timestamp-string rather than the timestamp and then have utility methods to get `vm_startup_time_str()` and `current_time_str()`, where the former at least can cache the string for future use. The same could be done for `host_name_str()` and `pid_str()`. The caching for idempotent strings doesn't need to use locking, just a `release_store` with a paired `load_acquire`. Hi @dholmes-ora I am *not* trying to standardize the timestamp in this PR; this is a simple refactor so that we don't have wildcard parsing and replacement code all over the place. I am curious why re-stringifying the timestamp and host name is a concern, since this code is not on any hot path. If we decide to standardize the timestamp, then we should hoist the code that initializes `vm_startup_time` out of unified logging, and stringify and cache the timestamp and host name here, early in the bootstrap cycle (`Threads::create_vm()`), just as unified logging does right now. That way, we don't need any memory barriers to use the cached strings.
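David Holmes's suggestion (compute an idempotent string once and publish it with a `release_store` paired with a `load_acquire`, without a lock) can be sketched in portable C++. This is an illustration only: `std::atomic` with release/acquire ordering stands in for HotSpot's `Atomic::release_store`/`Atomic::load_acquire`, and `vm_startup_time_str()`, `fake_vm_start_millis()`, and the plain-milliseconds format are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the VM start time; real code would read it from the VM.
static long long fake_vm_start_millis() { return 1739375224000LL; }

static std::atomic<const char*> g_startup_time_str{nullptr};

// Returns a cached string form of the VM start time. The string is computed
// more than once only if several threads race on the very first call.
const char* vm_startup_time_str() {
  // Acquire pairs with the release store below: seeing a non-null pointer
  // guarantees the characters it points to are fully written and visible.
  const char* cached = g_startup_time_str.load(std::memory_order_acquire);
  if (cached != nullptr) {
    return cached;
  }
  char* fresh = new char[32];
  const int n = std::snprintf(fresh, 32, "%lld", fake_vm_start_millis());
  assert(n > 0 && n < 32);  // the formatted value fits in the buffer
  (void)n;
  // The value is idempotent, so racing threads publish equal strings and a
  // plain release store suffices. A losing buffer is simply never freed; a
  // compare_exchange would avoid that at the cost of slightly more code.
  g_startup_time_str.store(fresh, std::memory_order_release);
  return fresh;
}
```

Because the cached value never changes once computed, losing a race costs only a duplicate allocation, which is why no lock is needed. Zhengyu's alternative, filling the cache once during `Threads::create_vm()` before the strings can be read concurrently, would avoid even the acquire/release pair.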
Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/23410#issuecomment-2654128259 From kvn at openjdk.org Wed Feb 12 16:07:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 12 Feb 2025 16:07:13 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v7] In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 15:47:04 GMT, Volodymyr Paprotski wrote: >> (Also see `8319429: Resetting MXCSR flags degrades ecore`) >> >> This PR fixes two issues: >> - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only >> - This issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warning on HelloWorld (on both Linux _and_ windows): >> >> OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall >> >> >> First, the crash. Caused when FXRSTOR is attempting to write reserved bits into MXCSR. If those bits happen to be set, crash. (Hence the crash isn't deterministic. But frequent enough if `__ warn` is used). It is caused by the binding not reserving stack space for register parameters () >> ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) >> Prolog of the warn function then proceeds to store the for arg registers onto the stack, overriding the fxstore save area. (See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) >> >> Fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer. 
>> >> --- >> >> I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` are set (65k+ lines of warnings) > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/macroAssembler_x86.cpp > > Co-authored-by: Julian Waters <32636402+TheShermanTanker at users.noreply.github.com> I submitted our internal testing. Please wait for the results. ------------- PR Review: https://git.openjdk.org/jdk/pull/22673#pullrequestreview-2612451477 From vpaprotski at openjdk.org Wed Feb 12 16:26:18 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 12 Feb 2025 16:26:18 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v6] In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 21:47:31 GMT, Volodymyr Paprotski wrote: >> (Also see `8319429: Resetting MXCSR flags degrades ecore`) >> >> This PR fixes two issues: >> - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only >> - This issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warning on HelloWorld (on both Linux _and_ windows): >> >> OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall >> >> >> First, the crash. Caused when FXRSTOR is attempting to write reserved bits into MXCSR. If those bits happen to be set, crash. (Hence the crash isn't deterministic. But frequent enough if `__ warn` is used). It is caused by the binding not reserving stack space for register parameters () >> ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) >> The prologue of the warn function then stores the four argument registers onto the stack, overwriting the FXSAVE area.
(See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) >> >> Fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer. >> >> --- >> >> I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` are set (65k+ lines of warnings) > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comments from Sandhya (deleted integrate command) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22673#issuecomment-2654045391 From vpaprotski at openjdk.org Wed Feb 12 16:26:19 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 12 Feb 2025 16:26:19 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v7] In-Reply-To: References: Message-ID: <4xs7dpEQUvQu7Qpp84scCQXABz2FvtYCn9HpDzwNxcM=.af7bfa29-bcb5-4f36-9811-e5952f80b9f7@github.com> On Wed, 12 Feb 2025 16:05:03 GMT, Vladimir Kozlov wrote: > I submitted our internal testing. Please wait results. Thanks! 
Deleted the integrate command ------------- PR Comment: https://git.openjdk.org/jdk/pull/22673#issuecomment-2654227936 From jwaters at openjdk.org Wed Feb 12 16:35:18 2025 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 12 Feb 2025 16:35:18 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v7] In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 15:47:04 GMT, Volodymyr Paprotski wrote: >> (Also see `8319429: Resetting MXCSR flags degrades ecore`) >> >> This PR fixes two issues: >> - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only >> - This issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warning on HelloWorld (on both Linux _and_ windows): >> >> OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall >> >> >> First, the crash. Caused when FXRSTOR is attempting to write reserved bits into MXCSR. If those bits happen to be set, crash. (Hence the crash isn't deterministic. But frequent enough if `__ warn` is used). It is caused by the binding not reserving stack space for register parameters () >> ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) >> Prolog of the warn function then proceeds to store the for arg registers onto the stack, overriding the fxstore save area. (See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) >> >> Fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer. 
>> >> --- >> >> I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` are set (65k+ lines of warnings) > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/macroAssembler_x86.cpp > > Co-authored-by: Julian Waters <32636402+TheShermanTanker at users.noreply.github.com> Marked as reviewed by jwaters (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22673#pullrequestreview-2612551969 From jbhateja at openjdk.org Wed Feb 12 17:08:25 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Feb 2025 17:08:25 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v18] In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 14:46:49 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > Looks good. I merged this PR with master, successfully (at the time) with no conflicts, and ran it through tier 1 to 3 testing and there were no failures. Thanks @PaulSandoz , @eme64 and @sviswa7 for your valuable feedback. 
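The Float16 PR (JDK-8342103) keeps each value in a `short` that carries raw IEEE 754 binary16 bits, which is why C2 has to reinterpret values between general-purpose and floating-point registers. As a sketch of what those 16 bits encode (plain C++, not the C2 or `java.lang` code; the function name is hypothetical), the decoder below widens binary16 bits to a `float`: 1 sign bit, 5 exponent bits with bias 15, and 10 fraction bits. Every binary16 value is exactly representable as a `float`, so the widening never rounds.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Widen IEEE 754 binary16 bits (held in a 16-bit integer) to float, exactly.
float binary16_bits_to_float(std::uint16_t bits) {
  const bool negative = (bits & 0x8000u) != 0;
  const int exponent = (bits >> 10) & 0x1F;  // biased by 15
  const int fraction = bits & 0x3FF;         // 10 explicit fraction bits
  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: fraction * 2^-24
    magnitude = std::ldexp(static_cast<float>(fraction), -24);
  } else if (exponent == 0x1F) {
    // All-ones exponent: infinity if the fraction is zero, otherwise NaN
    magnitude = fraction == 0 ? std::numeric_limits<float>::infinity()
                              : std::numeric_limits<float>::quiet_NaN();
  } else {
    // Normal: (1024 + fraction) * 2^(exponent - 15 - 10)
    magnitude = std::ldexp(static_cast<float>(fraction | 0x400), exponent - 25);
  }
  return negative ? -magnitude : magnitude;
}
```

For example, `0x3C00` decodes to 1.0 and `0x7BFF` to 65504, the largest finite binary16 value. The opposite direction needs round-to-nearest-even and is more involved; on the Java side, `Float.floatToFloat16` and `Float.float16ToFloat` (JDK 20 and later) cover both directions.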
------------- PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2654337191 From jbhateja at openjdk.org Wed Feb 12 17:08:28 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 12 Feb 2025 17:08:28 GMT Subject: Integrated: 8342103: C2 compiler support for Float16 type and associated scalar operations In-Reply-To: References: Message-ID: <0jFE4E2Aewb7aCN5nZrmV3Lz3SSsNSmhhUEiL9JQjMA=.c202afcf-340c-4fca-8a2a-778c7677fe1f@github.com> On Sun, 15 Dec 2024 18:05:02 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 9. X86 backend implementation for all supported intrinsics. > 10. 
Functional and Performance validation tests. > > Kindly review the patch and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: 4b463ee7 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/4b463ee70eceb94fdfbffa5c49dd58dcc6a6c890 Stats: 2855 lines in 56 files changed: 2788 ins; 0 del; 67 mod 8342103: C2 compiler support for Float16 type and associated scalar operations Co-authored-by: Paul Sandoz Co-authored-by: Bhavana Kilambi Co-authored-by: Joe Darcy Co-authored-by: Raffaello Giulietti Reviewed-by: psandoz, epeter, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/22754 From iklam at openjdk.org Wed Feb 12 17:51:38 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 12 Feb 2025 17:51:38 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v3] In-Reply-To: References: Message-ID: > Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. > > With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. > > To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). 
This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): > >> the format of the configuration and cache files is not specified and is subject to change without notice. > > **Notes for reviewers:** > > - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. > - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. > - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. > - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. > - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc.) when JEP 483 options are specified.
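The first reviewer note (same layout as a CDS static archive, but a different magic number so the configuration file cannot be mistaken for an archive) amounts to checking the file's leading word before trusting anything else in the header. The sketch below is hypothetical throughout: the magic values, the `ArchiveKind` enum, and the `classify_header`/`can_use_as_cache` helpers are made up for illustration and are not HotSpot's constants or functions.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical magic values; HotSpot's real constants are not reproduced here.
constexpr std::uint32_t kStaticArchiveMagic = 0xA0C0CAC4u;  // made up
constexpr std::uint32_t kAotConfigMagic     = 0xA0C0C0F6u;  // made up

enum class ArchiveKind { StaticArchive, AotConfig, Unknown };

// Classify a file image by its leading 4-byte magic (host byte order assumed).
ArchiveKind classify_header(const unsigned char* data, std::size_t len) {
  if (len < sizeof(std::uint32_t)) return ArchiveKind::Unknown;
  std::uint32_t magic;
  std::memcpy(&magic, data, sizeof magic);
  switch (magic) {
    case kStaticArchiveMagic: return ArchiveKind::StaticArchive;
    case kAotConfigMagic:     return ArchiveKind::AotConfig;
    default:                  return ArchiveKind::Unknown;
  }
}

// Accept only a real archive as a cache. A configuration file (same layout,
// different magic) is rejected with an explanatory message.
bool can_use_as_cache(const unsigned char* data, std::size_t len, std::string* why) {
  switch (classify_header(data, len)) {
    case ArchiveKind::StaticArchive:
      return true;
    case ArchiveKind::AotConfig:
      *why = "file is an AOT configuration, not an AOT cache";
      return false;
    default:
      *why = "unrecognized magic number";
      return false;
  }
}
```

Keeping the layout identical and changing only the magic lets the existing dump and mapping code be reused, while a configuration file passed where a cache is expected is rejected up front instead of being misread (the PR mentions new tests for this in AOTFlags.java).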
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Fixed test cases @vnkozlov ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23484/files - new: https://git.openjdk.org/jdk/pull/23484/files/0e77a35c..74f5e29d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23484&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23484&range=01-02 Stats: 34 lines in 6 files changed: 14 ins; 8 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/23484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23484/head:pull/23484 PR: https://git.openjdk.org/jdk/pull/23484 From iklam at openjdk.org Wed Feb 12 17:57:09 2025 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 12 Feb 2025 17:57:09 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v2] In-Reply-To: References: Message-ID: <-MnkZJWH0e0Bq-8a7Vot1J_om8b0UcScMxWiqpX5a8o=.6798d342-e980-4405-bef2-e6d0a6869508@github.com> On Wed, 12 Feb 2025 15:28:30 GMT, Vladimir Kozlov wrote:

> tools/javac/ImplicitClass/ImplicitImports.java failed in GHA:
>
> ```
> [0.002s][warning][cds] Unable to use AOT cache: CDS is disabled when java.base module is patched.
> Hello, World!
> Exception running test testImplicitSimpleIOImport: java.lang.AssertionError: Incorrect Output, expected: [Hello, World!], actual: [[0.002s][warning][cds] Unable to use AOT cache: CDS is disabled when java.base module is patched., Hello, World!]
> java.lang.AssertionError: Incorrect Output, expected: [Hello, World!], actual: [[0.002s][warning][cds] Unable to use AOT cache: CDS is disabled when java.base module is patched., Hello, World!]
>     at ImplicitImports.testImplicitSimpleIOImport(ImplicitImports.java:171)
> ```

I fixed the failure. I'll rerun all tests tiers 1-5.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23484#issuecomment-2654458729 From gziemski at openjdk.org Wed Feb 12 18:20:06 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 12 Feb 2025 18:20:06 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v30] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. > > For memory logger those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. 
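The record-then-replay idea behind this benchmark can be illustrated with a toy recorder. This is a minimal sketch under our own assumptions (an in-memory event vector, only `malloc`/`realloc`/`free`, and made-up names such as `AllocEvent` and `replay`); it is not the PR's `NMT_MemoryLogRecorder`, whose file format and replay logic are not shown in this message. The key point it shows is that recorded pointers serve only as identifiers: replay maps each one to whatever address the allocator returns in the current run.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <unordered_map>
#include <utility>
#include <vector>

enum class AllocOp { Malloc, Realloc, Free };

// One recorded operation. Pointers are opaque ids from the recording run.
struct AllocEvent {
  AllocOp op;
  std::uintptr_t ptr;      // result (Malloc/Realloc) or freed pointer (Free)
  std::uintptr_t old_ptr;  // input pointer for Realloc, unused otherwise
  std::size_t size;        // requested size for Malloc/Realloc
};

// Replays the recorded pattern against the real allocator and returns the
// number of requested bytes still live at the end (error handling omitted).
std::size_t replay(const std::vector<AllocEvent>& log) {
  std::unordered_map<std::uintptr_t, std::pair<void*, std::size_t>> live;
  std::size_t live_bytes = 0;
  for (const AllocEvent& e : log) {
    switch (e.op) {
      case AllocOp::Malloc: {
        live[e.ptr] = {std::malloc(e.size), e.size};
        live_bytes += e.size;
        break;
      }
      case AllocOp::Realloc: {
        auto it = live.find(e.old_ptr);
        void* old = it == live.end() ? nullptr : it->second.first;
        std::size_t old_size = it == live.end() ? 0 : it->second.second;
        void* p = std::realloc(old, e.size);
        if (it != live.end()) live.erase(it);
        live[e.ptr] = {p, e.size};
        live_bytes = live_bytes - old_size + e.size;
        break;
      }
      case AllocOp::Free: {
        auto it = live.find(e.ptr);
        if (it == live.end()) break;  // unknown id: ignore
        std::free(it->second.first);
        live_bytes -= it->second.second;
        live.erase(it);
        break;
      }
    }
  }
  for (auto& entry : live) std::free(entry.second.first);  // release leftovers
  return live_bytes;
}
```

A real benchmark would also time each operation and, as the PR notes, query the allocator's usable size (`malloc_usable_size`, `_msize`, or `malloc_size`) to estimate per-allocation overhead; those parts are left out of this sketch.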
> > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are needed if we want to calculate the memory overhead > > To use, you first need to record the pattern of operations, ex: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/... Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix linux build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/3a1c3b20..1e240a5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=28-29 Stats: 11 lines in 2 files changed: 4 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Wed Feb 12 19:23:38 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 12 Feb 2025 19:23:38 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v31] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism.
> > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. > > For memory logger those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. 
> > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are need if we want to calculate the memory overhead > > To use, you first need to record the pattern of operations, ex: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/... Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix Win, Linux builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/1e240a5d..1fa46d61 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=29-30 Stats: 10 lines in 3 files changed: 2 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Wed Feb 12 19:27:38 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 12 Feb 2025 19:27:38 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v32] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. 
> > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. > > For memory logger those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. 
> > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are need if we want to calculate the memory overhead > > To use, you first need to record the pattern of operations, ex: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/... Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: white space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/1fa46d61..a367725e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=30-31 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Wed Feb 12 19:36:59 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 12 Feb 2025 19:36:59 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v33] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. 
> > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging. > > For memory logger those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. 
> > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are need if we want to calculate the memory overhead > > To use, you first need to record the pattern of operations, ex: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/... Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/a367725e..b652ee6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=31-32 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From dlong at openjdk.org Wed Feb 12 20:27:12 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 12 Feb 2025 20:27:12 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Wed, 12 Feb 2025 06:56:40 GMT, Fei Yang wrote: > FYI: 
`test/hotspot/jtreg/compiler/jsr292/MHDeoptTest.java` and hs-tier1 test good on linux-riscv64 with fastdebug build. A good sanity check is to remove the fix in deoptimization.cpp and see if the new test triggers the new asserts. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2654764709 From gziemski at openjdk.org Wed Feb 12 20:57:57 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 12 Feb 2025 20:57:57 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v34] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging.
> > For memory logger those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. > > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are need if we want to calculate the memory overhead > > To use, you first need to record the pattern of operations, ex: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/... 
Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: - fix build - fix build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/b652ee6e..27575a11 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=32-33 Stats: 12 lines in 3 files changed: 2 ins; 1 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From kbarrett at openjdk.org Wed Feb 12 20:58:15 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 12 Feb 2025 20:58:15 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v6] In-Reply-To: <8gan8nfbwoDaaOqqnqpMwcG-XkvvVJFTPwKbTlY5VZ8=.26eeffbb-0702-44ef-bc3b-75abe58d476d@github.com> References: <8gan8nfbwoDaaOqqnqpMwcG-XkvvVJFTPwKbTlY5VZ8=.26eeffbb-0702-44ef-bc3b-75abe58d476d@github.com> Message-ID: On Mon, 10 Feb 2025 15:24:13 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources. >> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. 
>> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - pass an empty set the method rather than a boolean > > better exception handling/message when encountering a binary file > - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot > - filter out `.zip` files > - Merge remote-tracking branch 'upstream/master' into NULL-Checking-in-hotspot > - trivial change, if .java files are filtered out then so should .class files > - revert to the original regex and remove the exclusion of os_windows.cpp > - update based on feedback > - Add a test to prevent NULL backsliding Changes requested by kbarrett (Reviewer). test/hotspot/jtreg/sources/TestNoNULL.java line 93: > 91: } > 92: > 93: private static void processFiles(Path directory, Set excludedFiles, Set excludeExtensions) throws IOException { `s/excludeExtensions/excludedExtensions/` here and elsewhere. 
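For readers following the `processFiles` naming discussion, here is a minimal sketch of the kind of scan the test performs. It is not TestNoNULL.java itself: the class and method names (`NullScanSketch`, `findNullUsages`) are hypothetical. The whole-word pattern `\bNULL\b` is an assumption because the PR's own regex is not quoted in this thread. The sketch also assumes UTF-8 sources and extensions given with a leading dot.

```java
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not the actual TestNoNULL.java.
class NullScanSketch {
    // Assumption: flag NULL only as a whole word, so NULL_WORD or ISNULL do not match.
    private static final Pattern NULL_PATTERN = Pattern.compile("\\bNULL\\b");

    // Returns "path:line: text" for every line that uses NULL. Files are skipped when
    // their name is in excludedFiles or their extension (with leading dot) is in excludedExtensions.
    static List<String> findNullUsages(Path directory,
                                       Set<String> excludedFiles,
                                       Set<String> excludedExtensions) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(directory)) {
            files = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            String name = file.getFileName().toString();
            int dot = name.lastIndexOf('.');
            String ext = dot >= 0 ? name.substring(dot) : "";
            if (excludedFiles.contains(name) || excludedExtensions.contains(ext)) {
                continue;
            }
            List<String> lines;
            try {
                lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            } catch (CharacterCodingException e) {
                // Not decodable as UTF-8 (e.g. a binary file): skip it rather than fail the scan.
                continue;
            }
            for (int i = 0; i < lines.size(); i++) {
                if (NULL_PATTERN.matcher(lines.get(i)).find()) {
                    hits.add(file + ":" + (i + 1) + ": " + lines.get(i).trim());
                }
            }
        }
        return hits;
    }
}
```

The sketch passes file names and extensions as separate `Set<String>` parameters, which matches the shape of the `processFiles(Path, Set, Set)` signature quoted above. It catches `CharacterCodingException` to cover the binary-file case that the "better exception handling" commit mentions. The real test may handle that case differently.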
------------- PR Review: https://git.openjdk.org/jdk/pull/23466#pullrequestreview-2613178455 PR Review Comment: https://git.openjdk.org/jdk/pull/23466#discussion_r1953383462 From dlong at openjdk.org Wed Feb 12 21:01:15 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 12 Feb 2025 21:01:15 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash In-Reply-To: <00HHPN1Q9xrNf8Ps_9S7hOOHHmw2mNocFrQzqxzYhRA=.bb2f9c11-12c5-4efa-8314-4415e22e31f8@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> <00HHPN1Q9xrNf8Ps_9S7hOOHHmw2mNocFrQzqxzYhRA=.bb2f9c11-12c5-4efa-8314-4415e22e31f8@github.com> Message-ID: On Wed, 12 Feb 2025 12:44:18 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/ppc/abstractInterpreter_ppc.cpp line 136: >> >>> 134: // Test caller-aligned placement vs callee-aligned >>> 135: intptr_t* l2 = caller->sp() + method->max_locals() - 1 + (frame::java_abi_size / Interpreter::stackElementSize); >>> 136: assert(l2 >= locals_base, "bad placement"); >> >> The assertion at L136 fails on ppc64 (similar to what @offamitkumar reported for s390x). >> I don't understand the assertion because it is just a stricter version of the first one. >> On ppc64 the sp of `caller` is aligned down because it needs to be 16 byte aligned. `locals_base` is only 8 byte aligned. But from what I saw the difference was larger than just one word. Maybe `caller` has got a c2i extension? I guess this would be problematic. >> On x86_64 `l2` depends on the last expression stack pointer not on the `caller`'s sp. If you try to translate this to ppc64 then you'll get the expression used to initialize `locals_base` at L128. >> I think you can remove the 2nd assertion. Even the first one looks redundant. >> Besides that I've tested `MHDeoptTest.java` successfully on ppc64. > >> Maybe `caller` has got a c2i extension? I guess this would be problematic. > > I meant i2c extension.
The two asserts together are supposed to be an upper and lower bound. The first assert is a stricter version of the assert that was originally added by JDK-7090904. It looks like the 2nd assert should have been reversed, assuming l2 is correct. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1953390060 From dlong at openjdk.org Wed Feb 12 21:04:10 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 12 Feb 2025 21:04:10 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> <00HHPN1Q9xrNf8Ps_9S7hOOHHmw2mNocFrQzqxzYhRA=.bb2f9c11-12c5-4efa-8314-4415e22e31f8@github.com> Message-ID: <_qQKsbCLbRxjva6W92W8_k82ldOlqIkFnT2keBDKLlw=.320cd20b-e8b8-422c-86eb-6d2607870529@github.com> On Wed, 12 Feb 2025 20:58:34 GMT, Dean Long wrote: >>> Maybe `caller` has got an c2i extension? I guess this would be problematic. >> >> I meant i2c extension. > > The two asserts together are supposed to be an upper and lower bound. The first assert is a stricter version of the assert that was originally added by JDK-7090904. It looks like the 2nd assert should have been reversed, assuming l2 is correct. I was lazy about naming, so `l2` has a different meaning in the x64 asserts. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1953393697 From dlong at openjdk.org Wed Feb 12 21:09:31 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 12 Feb 2025 21:09:31 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: > When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. > > In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. > > Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. 
Dean Long has updated the pull request incrementally with one additional commit since the last revision: fix bounds checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23557/files - new: https://git.openjdk.org/jdk/pull/23557/files/8734abd4..a7a0ed7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23557&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23557&range=00-01 Stats: 6 lines in 4 files changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23557.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23557/head:pull/23557 PR: https://git.openjdk.org/jdk/pull/23557 From dlong at openjdk.org Wed Feb 12 21:14:09 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 12 Feb 2025 21:14:09 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Wed, 12 Feb 2025 21:09:31 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. 
> > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > fix bounds checks I just pushed a fix for the s390 and ppc bounds check logic, but I'm still not sure if I am using the correct values for the end of the frame. The asserts should pass with the deoptimization.cpp fix. The 2nd assert should fail w/o the deoptimization.cpp fix when running the new test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2654853295 From kvn at openjdk.org Wed Feb 12 21:43:17 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 12 Feb 2025 21:43:17 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v7] In-Reply-To: References: Message-ID: <0BCMqqyxnY7s24ohUTW90CJmOpQ919wDtBJt92XesWE=.519ad411-777d-4f31-97bb-6bad78253b45@github.com> On Wed, 12 Feb 2025 15:47:04 GMT, Volodymyr Paprotski wrote: >> (Also see `8319429: Resetting MXCSR flags degrades ecore`) >> >> This PR fixes two issues: >> - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only >> - This issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warning on HelloWorld (on both Linux _and_ windows): >> >> OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall >> >> >> First, the crash. Caused when FXRSTOR is attempting to write reserved bits into MXCSR. If those bits happen to be set, crash. (Hence the crash isn't deterministic. But frequent enough if `__ warn` is used). It is caused by the binding not reserving stack space for register parameters () >> ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) >> Prolog of the warn function then proceeds to store the for arg registers onto the stack, overriding the fxstore save area. 
(See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) >> >> Fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer. >> >> --- >> >> I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` are set (65k+ lines of warnings) > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/macroAssembler_x86.cpp > > Co-authored-by: Julian Waters <32636402+TheShermanTanker at users.noreply.github.com> My tier1-4, stress, xcomp testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22673#pullrequestreview-2613271857 From vpaprotski at openjdk.org Wed Feb 12 21:46:15 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 12 Feb 2025 21:46:15 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v7] In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 15:47:04 GMT, Volodymyr Paprotski wrote: >> (Also see `8319429: Resetting MXCSR flags degrades ecore`) >> >> This PR fixes two issues: >> - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only >> - This issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warning on HelloWorld (on both Linux _and_ windows): >> >> OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall >> >> >> First, the crash. Caused when FXRSTOR is attempting to write reserved bits into MXCSR. If those bits happen to be set, crash. (Hence the crash isn't deterministic. But frequent enough if `__ warn` is used). 
It is caused by the binding not reserving stack space for register parameters () >> ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) >> Prolog of the warn function then proceeds to store the for arg registers onto the stack, overriding the fxstore save area. (See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) >> >> Fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer. >> >> --- >> >> I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` are set (65k+ lines of warnings) > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/macroAssembler_x86.cpp > > Co-authored-by: Julian Waters <32636402+TheShermanTanker at users.noreply.github.com> Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22673#issuecomment-2654908526 From duke at openjdk.org Wed Feb 12 22:22:19 2025 From: duke at openjdk.org (duke) Date: Wed, 12 Feb 2025 22:22:19 GMT Subject: RFR: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni [v7] In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 15:47:04 GMT, Volodymyr Paprotski wrote: >> (Also see `8319429: Resetting MXCSR flags degrades ecore`) >> >> This PR fixes two issues: >> - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only >> - This issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warning on HelloWorld (on both Linux _and_ windows): >> >> OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall >> >> >> First, the crash. Caused when FXRSTOR is attempting to write reserved bits into MXCSR. If those bits happen to be set, crash. (Hence the crash isn't deterministic. But frequent enough if `__ warn` is used). 
It is caused by the binding not reserving stack space for register parameters () >> ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) >> Prolog of the warn function then proceeds to store the for arg registers onto the stack, overriding the fxstore save area. (See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) >> >> Fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer. >> >> --- >> >> I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` are set (65k+ lines of warnings) > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/macroAssembler_x86.cpp > > Co-authored-by: Julian Waters <32636402+TheShermanTanker at users.noreply.github.com> @vpaprotsk Your change (at version cbd3812d18811e4b05e4d677cac1f8d4975a674a) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22673#issuecomment-2654966743 From vpaprotski at openjdk.org Wed Feb 12 22:28:17 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 12 Feb 2025 22:28:17 GMT Subject: Integrated: 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 23:45:37 GMT, Volodymyr Paprotski wrote: > (Also see `8319429: Resetting MXCSR flags degrades ecore`) > > This PR fixes two issues: > - the original issue is a crash caused by `__ warn` corrupting the stack on Windows only > - This issue also uncovered that -Xcheck:jni test cases were getting 65k lines of warning on HelloWorld (on both Linux _and_ windows): > > OpenJDK 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall > > > First, the crash. 
Caused when FXRSTOR is attempting to write reserved bits into MXCSR. If those bits happen to be set, crash. (Hence the crash isn't deterministic. But frequent enough if `__ warn` is used). It is caused by the binding not reserving stack space for register parameters () > ![image](https://github.com/user-attachments/assets/4ad63908-088b-4e9d-9e7d-a3509bee046a) > Prolog of the warn function then proceeds to store the for arg registers onto the stack, overriding the fxstore save area. (See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#calling-convention-defaults) > > Fix uses `frame::arg_reg_save_area_bytes` to bump the stack pointer. > > --- > > I also kept the fix to `verify_mxcsr` since without it, `-Xcheck:jni` is practically unusable when `-XX:+EnableX86ECoreOpts` are set (65k+ lines of warnings) This pull request has now been integrated. Changeset: 55097dd4 Author: Volodymyr Paprotski URL: https://git.openjdk.org/jdk/commit/55097dd4cbb5d691c12cb0247d66dce593759d59 Stats: 127 lines in 9 files changed: 66 ins; 54 del; 7 mod 8344802: Crash in StubRoutines::verify_mxcsr with -XX:+EnableX86ECoreOpts and -Xcheck:jni Reviewed-by: jwaters, kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/22673 From dholmes at openjdk.org Thu Feb 13 00:20:16 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 13 Feb 2025 00:20:16 GMT Subject: RFR: 8349083: Factor out filename handling code from logging In-Reply-To: References: <82r1a8p4tVWdY0x7tkL1qCyFcMh2anmbZmWwfvPi1B4=.7e782b73-bcd3-4012-8fdb-f55bed69b2cb@github.com> Message-ID: On Wed, 12 Feb 2025 15:51:56 GMT, Zhengyu Gu wrote: > I am curious why re-stringifying the timestamp and host name are concerns? obviously, the code is not on any hot paths. For UL they are on the VM startup path. But in general I just don't like the wasted effort. 
> I am not trying to standardize timestamp in this PR, but a simple refactor so that we don't have wildcard parsing and replacing code all over the places. Understood, though I was hoping for some standardizing, as having %t mean different things in different kinds of log files is not good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23410#issuecomment-2655137231 PR Comment: https://git.openjdk.org/jdk/pull/23410#issuecomment-2655139615 From jiangli at openjdk.org Thu Feb 13 00:55:19 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 13 Feb 2025 00:55:19 GMT Subject: RFR: 8349620: Add VMProps for static JDK In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 08:10:24 GMT, Alan Bateman wrote: > That's okay with me. I'm hoping Magnus will jump in when he gets a chance as he has experience with the "other" static build configurations. @magicus Any thoughts and input on this?
Tested tier1-4, stress, xcomp New output: Non-nmethod blobs: #67 runtime = 43K (hdr 4K 10%, loc 1K 3%, code 36K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #0 upcall = 0K #1 uncommon trap = 0K (hdr 0K 13%, loc 0K 2%, code 0K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #1 deoptimization = 2K (hdr 0K 3%, loc 0K 1%, code 2K 94%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #1 exception = 0K (hdr 0K 30%, loc 0K 3%, code 0K 63%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #3 safepoint = 4K (hdr 0K 4%, loc 0K 1%, code 4K 93%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #639 adapter = 955K (hdr 44K 4%, loc 24K 2%, code 880K 92%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #1 mh_adapter = 10K (hdr 0K 0%, loc 0K 0%, code 9K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #1 vtable = 32K (hdr 0K 0%, loc 0K 0%, code 32K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #12 buffer blob = 917K (hdr 0K 0%, loc 0K 0%, code 916K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #0 other = 0K Output before: Non-nmethod blobs: #66 runtime = 42K (hdr 4K 10%, loc 1K 3%, code 36K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #1 uncommon trap = 0K (hdr 0K 13%, loc 0K 2%, code 0K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #1 deoptimization = 2K (hdr 0K 3%, loc 0K 1%, code 2K 94%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #639 adapter = 955K (hdr 44K 4%, loc 24K 2%, code 880K 92%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #12 buffer blob = 917K (hdr 0K 0%, loc 0K 0%, code 916K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) #6 other = 47K (hdr 0K 0%, loc 0K 0%, code 46K 98%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) ------------- Commit messages: - 
8349753: Incorrect use of CodeBlob::is_buffer_blob() in few places Changes: https://git.openjdk.org/jdk/pull/23607/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23607&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349753 Stats: 47 lines in 4 files changed: 45 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23607.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23607/head:pull/23607 PR: https://git.openjdk.org/jdk/pull/23607 From dlong at openjdk.org Thu Feb 13 01:59:09 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 13 Feb 2025 01:59:09 GMT Subject: RFR: 8349753: Incorrect use of CodeBlob::is_buffer_blob() in few places In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 01:22:55 GMT, Vladimir Kozlov wrote: > `CodeBlob::is_buffer_blob()` method is incorrectly used in few places because BufferBlob is not "leaf" class. You need to add checks for its subclasses too. > > I also updated statistic output for CodeCache (`-XX:+PrintCodeCache -XX:+Verbose`) and corresponding test to reflect current state of code blobs. 
> > Tested tier1-4, stress, xcomp > > New output: > > Non-nmethod blobs: > #67 runtime = 43K (hdr 4K 10%, loc 1K 3%, code 36K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #0 upcall = 0K > #1 uncommon trap = 0K (hdr 0K 13%, loc 0K 2%, code 0K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 deoptimization = 2K (hdr 0K 3%, loc 0K 1%, code 2K 94%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 exception = 0K (hdr 0K 30%, loc 0K 3%, code 0K 63%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #3 safepoint = 4K (hdr 0K 4%, loc 0K 1%, code 4K 93%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #639 adapter = 955K (hdr 44K 4%, loc 24K 2%, code 880K 92%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 mh_adapter = 10K (hdr 0K 0%, loc 0K 0%, code 9K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 vtable = 32K (hdr 0K 0%, loc 0K 0%, code 32K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #12 buffer blob = 917K (hdr 0K 0%, loc 0K 0%, code 916K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #0 other = 0K > > > Output before: > > Non-nmethod blobs: > #66 runtime = 42K (hdr 4K 10%, loc 1K 3%, code 36K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 uncommon trap = 0K (hdr 0K 13%, loc 0K 2%, code 0K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 deoptimization = 2K (hdr 0K 3%, loc 0K 1%, code 2K 94%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #639 adapter = 955K (hdr 44K 4%, loc 24K 2%, code 880K 92%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #12 buffer blob = 917K (hdr 0K 0%, loc 0K 0%, code 916K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #6 other = 47K (hdr 0K 0%, loc 0K 0%, code 46K 98%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 
0%, pcs 0K 0%]) Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23607#pullrequestreview-2613632491 From kvn at openjdk.org Thu Feb 13 01:59:09 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 13 Feb 2025 01:59:09 GMT Subject: RFR: 8349753: Incorrect use of CodeBlob::is_buffer_blob() in few places In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 01:22:55 GMT, Vladimir Kozlov wrote: > `CodeBlob::is_buffer_blob()` method is incorrectly used in few places because BufferBlob is not "leaf" class. You need to add checks for its subclasses too. > > I also updated statistic output for CodeCache (`-XX:+PrintCodeCache -XX:+Verbose`) and corresponding test to reflect current state of code blobs. > > Tested tier1-4, stress, xcomp > > New output: > > Non-nmethod blobs: > #67 runtime = 43K (hdr 4K 10%, loc 1K 3%, code 36K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #0 upcall = 0K > #1 uncommon trap = 0K (hdr 0K 13%, loc 0K 2%, code 0K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 deoptimization = 2K (hdr 0K 3%, loc 0K 1%, code 2K 94%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 exception = 0K (hdr 0K 30%, loc 0K 3%, code 0K 63%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #3 safepoint = 4K (hdr 0K 4%, loc 0K 1%, code 4K 93%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #639 adapter = 955K (hdr 44K 4%, loc 24K 2%, code 880K 92%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 mh_adapter = 10K (hdr 0K 0%, loc 0K 0%, code 9K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 vtable = 32K (hdr 0K 0%, loc 0K 0%, code 32K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #12 buffer blob = 917K (hdr 0K 0%, loc 0K 0%, code 916K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #0 other = 0K > > 
> Output before: > > Non-nmethod blobs: > #66 runtime = 42K (hdr 4K 10%, loc 1K 3%, code 36K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 uncommon trap = 0K (hdr 0K 13%, loc 0K 2%, code 0K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 deoptimization = 2K (hdr 0K 3%, loc 0K 1%, code 2K 94%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #639 adapter = 955K (hdr 44K 4%, loc 24K 2%, code 880K 92%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #12 buffer blob = 917K (hdr 0K 0%, loc 0K 0%, code 916K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #6 other = 47K (hdr 0K 0%, loc 0K 0%, code 46K 98%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) Thank you, Dean ------------- PR Comment: https://git.openjdk.org/jdk/pull/23607#issuecomment-2655255875 From fyang at openjdk.org Thu Feb 13 03:14:10 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Feb 2025 03:14:10 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Wed, 12 Feb 2025 20:24:05 GMT, Dean Long wrote: > > FYI: `test/hotspot/jtreg/compiler/jsr292/MHDeoptTest.java` and hs-tier1 test good on linux-riscv64 with fastdebug build. > > I good sanity check is to remove the fix in deoptimization.cpp and see if the new test triggers the new asserts. Yeah! The new test triggers if I revert the fix. 
# # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/ubuntu/jdk/src/hotspot/cpu/riscv/abstractInterpreter_riscv.cpp:145), pid=95195, tid=95217 # assert(locals >= interpreter_frame->sender_sp() + max_locals - 1) failed: bad placement # # JRE version: OpenJDK Runtime Environment (25.0) (fastdebug build 25-internal-adhoc.ubuntu.jdk) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-adhoc.ubuntu.jdk, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-riscv64) # Problematic frame: # V [libjvm.so+0x2e1204] AbstractInterpreter::layout_activation(Method*, int, int, int, int, int, int, frame*, frame*, bool, bool)+0x3fa # # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/ubuntu/jdk/build/linux-riscv64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_jsr292_MHDeoptTest_java/scratch/0/core.95195) # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp # ------------- PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2655355838 From zgu at openjdk.org Thu Feb 13 03:21:11 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Thu, 13 Feb 2025 03:21:11 GMT Subject: RFR: 8349083: Factor out filename handling code from logging In-Reply-To: References: <82r1a8p4tVWdY0x7tkL1qCyFcMh2anmbZmWwfvPi1B4=.7e782b73-bcd3-4012-8fdb-f55bed69b2cb@github.com> Message-ID: On Thu, 13 Feb 2025 00:17:54 GMT, David Holmes wrote: > > I am not trying to standardize timestamp in this PR, but a simple refactor so that we don't have wildcard parsing and replacing code all over the places. > > Understood, though I was hoping for some standardizing as having %t mean different things in different kinds of log files is not good, I do see that using VM start time to standardize `%t` is probably not preferred in other scenarios.
For example, for heap dumps I would prefer to use the `current` time, so that I can align them with the GC log. Probably the same applies to JFR dumps. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23410#issuecomment-2655364288 From asmehra at openjdk.org Thu Feb 13 04:10:11 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 13 Feb 2025 04:10:11 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks In-Reply-To: References: Message-ID: <8yqgZ4ffmEyui_CyUR9lS-2MV4ONfzSaA-pMz-VvDMA=.e92da2b8-cfc5-4ef9-ab33-9d14ca02a2f8@github.com> On Wed, 5 Feb 2025 22:32:58 GMT, Calvin Cheung wrote: > This changset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. > > Passed tiers 1 - 5 testing. src/hotspot/share/cds/aotCodeSource.cpp line 762: > 760: } > 761: > 762: if (is_boot_classpath && runtime_css.has_next() && (need_to_check_app_classpath() || num_module_paths() > 0)) { I am not sure I get what this block is for. Is it for the case where runtime boot cp has more entries than the dumptime boot cp, and it is checking if the extra entries really exist or they are just empty? If so, then `check_paths_existence` should only be checking the extra entries in the boot cp, not all of them. Can you please explain this and probably add a comment as well to describe what this block is for. src/hotspot/share/cds/aotCodeSource.cpp line 894: > 892: // matched exactly. > 893: bool AOTCodeSourceConfig::need_lcp_match(AllCodeSourceStreams& all_css) const { > 894: if (!need_lcp_match_helper(boot_start(), boot_end(), all_css.boot_cp()) || Can we reverse these conditions to make it easier to read?
if (need_lcp_match_helper(boot_start(), boot_end(), all_css.boot_cp()) && need_lcp_match_helper(app_start(), app_end(), all_css.app_cp())) { return true; } else { return false; } src/hotspot/share/cds/aotCodeSource.cpp line 903: > 901: > 902: bool AOTCodeSourceConfig::need_lcp_match_helper(int start, int end, CodeSourceStream& css) const { > 903: if (app_end() == boot_start()) { I feel this block belongs to the caller `need_lcp_match`. src/hotspot/share/cds/aotCodeSource.hpp line 213: > 211: > 212: // Common accessors > 213: int boot_start() const { return 1; } Can we rename these methods to something like boot_start() -> boot_cp_start_index(). At the call site it makes it clear it is referring to the bootclasspath index, and not booting something :) src/hotspot/share/cds/aotCodeSource.hpp line 234: > 232: // Functions used only during dumptime > 233: static void dumptime_init(TRAPS); > 234: static size_t estimate_size_for_archive() { This method doesn't seem to be in use. Can this be removed? 
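The reversal suggested for `need_lcp_match()` relies on De Morgan's law: `!(A && B)` is equivalent to `!A || !B`. Below is a minimal C++ sketch of that equivalence. It assumes the original body returns `false` when either `need_lcp_match_helper()` call fails and `true` otherwise, since only the first operand of the `||` is visible above. Both functions are hypothetical stand-ins that take the two helper results as booleans; neither is the aotCodeSource.cpp code.

```cpp
// Hypothetical stand-ins for AOTCodeSourceConfig::need_lcp_match(): the two
// parameters model the results of need_lcp_match_helper() for the boot and
// app class paths.

// Assumed original shape: bail out with false if either check fails.
bool need_lcp_match_original(bool boot_ok, bool app_ok) {
  if (!boot_ok || !app_ok) {
    return false;
  }
  return true;
}

// Reversed shape from the review, reduced to a single expression:
// true only when both checks pass (De Morgan: !(a && b) == !a || !b).
bool need_lcp_match_reversed(bool boot_ok, bool app_ok) {
  return boot_ok && app_ok;
}
```

In the real code, both forms also short-circuit in the same way: if the boot class path helper returns false, the app class path helper is never called.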
src/hotspot/share/cds/filemap.cpp line 318: > 316: if (header()->has_full_module_graph() && has_extra_module_paths) { > 317: CDSConfig::stop_using_optimized_module_handling(); > 318: log_info(cds)("optimized module handling: disabled because of extra module path(s) are specified"); typo: "disabled because ~of~ extra module path(s) are specified" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1953766439 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1953732835 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1953722869 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1953489002 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1953383815 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1953768345 From dholmes at openjdk.org Thu Feb 13 05:45:13 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 13 Feb 2025 05:45:13 GMT Subject: RFR: 8349083: Factor out filename handling code from logging In-Reply-To: References: Message-ID: On Sat, 1 Feb 2025 16:53:13 GMT, Zhengyu Gu wrote: > Factor out filename substitution code from unified logging, so that it can be used elsewhere: > > 1. Make filename substitution consistent. Support following substitutions cross JVM > ``` > %p -> pid > %t -> timestamp > %hn -> hostname > > > 2. Reduce redundant code The hope was that we would have a range of time specifiers for different notions of timestamp - but given that different uses of %t already exist, that would be problematic.
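Below is a minimal sketch of the kind of shared substitution routine described above. It assumes that the caller supplies the pid, the timestamp string and the host name. Passing the timestamp in leaves the choice of what `%t` means (VM start time or current time) with each caller, which is the point being debated. The function name `substitute_filename` is hypothetical and is not the API added by the PR.

```cpp
#include <string>

// Hypothetical helper: expands %p, %t and %hn in a file name template.
// The caller decides what the timestamp string represents.
std::string substitute_filename(const std::string& pattern,
                                long pid,
                                const std::string& timestamp,
                                const std::string& hostname) {
  std::string out;
  for (std::string::size_type i = 0; i < pattern.size(); i++) {
    char c = pattern[i];
    if (c != '%' || i + 1 == pattern.size()) {
      out += c;  // ordinary character, or a trailing '%'
      continue;
    }
    char next = pattern[i + 1];
    if (next == 'p') {
      out += std::to_string(pid);
      i += 1;
    } else if (next == 't') {
      out += timestamp;
      i += 1;
    } else if (next == 'h' && i + 2 < pattern.size() && pattern[i + 2] == 'n') {
      out += hostname;
      i += 2;
    } else {
      // Unknown specifier: keep the '%' and let the loop copy what follows.
      out += c;
    }
  }
  return out;
}
```

For example, `substitute_filename("gc-%p-%t.log", 1234, "2025-02-13_05-45-13", "h1")` yields `"gc-1234-2025-02-13_05-45-13.log"`. Unrecognized sequences such as `%x` are copied unchanged.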
------------- PR Comment: https://git.openjdk.org/jdk/pull/23410#issuecomment-2655543708 From rehn at openjdk.org Thu Feb 13 06:39:18 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 13 Feb 2025 06:39:18 GMT Subject: RFR: 8349851: RISC-V: Call VM leaf can use movptr2 [v2] In-Reply-To: <9Fx0BjMvNHBiCuxeAph3FTkm0bCftDdgN4QvIImHPY0=.ff7f1e59-c251-4629-91c3-d26e19324bd6@github.com> References: <9Fx0BjMvNHBiCuxeAph3FTkm0bCftDdgN4QvIImHPY0=.ff7f1e59-c251-4629-91c3-d26e19324bd6@github.com> Message-ID: On Wed, 12 Feb 2025 06:44:50 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> There should be a small speed up to vm leafs. >> We can scratch t2 here as we just pushed it. (maybe I should have used t0) >> >> Passes t1 >> >> /Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > t0, remove ws Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23565#issuecomment-2655659962 From rehn at openjdk.org Thu Feb 13 06:39:20 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 13 Feb 2025 06:39:20 GMT Subject: Integrated: 8349851: RISC-V: Call VM leaf can use movptr2 In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 17:26:28 GMT, Robbin Ehn wrote: > Hi, please consider. > > There should be a small speed up to vm leafs. > We can scratch t2 here as we just pushed it. (maybe I should have used t0) > > Passes t1 > > /Robbin This pull request has now been integrated. 
Changeset: a637ccf2 Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/a637ccf2fead25ea6a06ad6bd65e92b8694ee11c Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8349851: RISC-V: Call VM leaf can use movptr2 Reviewed-by: fyang, mli ------------- PR: https://git.openjdk.org/jdk/pull/23565 From rrich at openjdk.org Thu Feb 13 06:52:10 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 13 Feb 2025 06:52:10 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Wed, 12 Feb 2025 21:11:36 GMT, Dean Long wrote: > I just pushed a fix for the s390 and ppc bounds check logic, but I'm still not sure if I am using the correct values for the end of the frame. Testing on ppc64 looks good so far. Will put the change through our CI testing. > The asserts should pass with the deoptimization.cpp fix. The 2nd assert should fail w/o the deoptimization.cpp fix when running the new test. The 2nd assert does not fail w/o the deoptimization.cpp fix. Might be due to alignment of caller->sp() in the interpreter. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2655682065 From shade at openjdk.org Thu Feb 13 08:32:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Feb 2025 08:32:16 GMT Subject: RFR: 8349753: Incorrect use of CodeBlob::is_buffer_blob() in few places In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 01:22:55 GMT, Vladimir Kozlov wrote: > `CodeBlob::is_buffer_blob()` method is incorrectly used in few places because BufferBlob is not "leaf" class. You need to add checks for its subclasses too. > > I also updated statistic output for CodeCache (`-XX:+PrintCodeCache -XX:+Verbose`) and corresponding test to reflect current state of code blobs.
> > Tested tier1-4, stress, xcomp > > New output: > > Non-nmethod blobs: > #67 runtime = 43K (hdr 4K 10%, loc 1K 3%, code 36K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #0 upcall = 0K > #1 uncommon trap = 0K (hdr 0K 13%, loc 0K 2%, code 0K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 deoptimization = 2K (hdr 0K 3%, loc 0K 1%, code 2K 94%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 exception = 0K (hdr 0K 30%, loc 0K 3%, code 0K 63%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #3 safepoint = 4K (hdr 0K 4%, loc 0K 1%, code 4K 93%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #639 adapter = 955K (hdr 44K 4%, loc 24K 2%, code 880K 92%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 mh_adapter = 10K (hdr 0K 0%, loc 0K 0%, code 9K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 vtable = 32K (hdr 0K 0%, loc 0K 0%, code 32K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #12 buffer blob = 917K (hdr 0K 0%, loc 0K 0%, code 916K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #0 other = 0K > > > Output before: > > Non-nmethod blobs: > #66 runtime = 42K (hdr 4K 10%, loc 1K 3%, code 36K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 uncommon trap = 0K (hdr 0K 13%, loc 0K 2%, code 0K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 deoptimization = 2K (hdr 0K 3%, loc 0K 1%, code 2K 94%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #639 adapter = 955K (hdr 44K 4%, loc 24K 2%, code 880K 92%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #12 buffer blob = 917K (hdr 0K 0%, loc 0K 0%, code 916K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #6 other = 47K (hdr 0K 0%, loc 0K 0%, code 46K 98%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 
0%, pcs 0K 0%]) Looks fine. I spot-checked other `CodeBlobKind`-s, and I don't think we are missing any other. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23607#pullrequestreview-2614215254 From nbenalla at openjdk.org Thu Feb 13 09:34:54 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Thu, 13 Feb 2025 09:34:54 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v7] In-Reply-To: References: Message-ID: > Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. > It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources. > > Before adding line 86 and excluding `os_windows.cpp`, the test failed with: > > > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: > HMODULE hModule = NULL; > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: > GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); > java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. 
> at TestNoNULL.main(TestNoNULL.java:73) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1447) Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: - rename excludeExtensions -> excludedExtensions - remove redundant import/throws ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23466/files - new: https://git.openjdk.org/jdk/pull/23466/files/2112c863..8c0d6267 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23466&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23466&range=05-06 Stats: 6 lines in 1 file changed: 0 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23466.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23466/head:pull/23466 PR: https://git.openjdk.org/jdk/pull/23466 From amitkumar at openjdk.org Thu Feb 13 11:16:23 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 13 Feb 2025 11:16:23 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v5] In-Reply-To: References: Message-ID: > s390x implementation for Class.isInstance intrinsic. > > Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. > > Benchmark results will be updated soon. 
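This note is for readers less familiar with what the intrinsic computes. `Class.isInstance(obj)` returns false for `null`; otherwise it asks whether the object's class is a subtype of the receiver class. The C++ sketch below is a deliberately simplified, hypothetical model of a HotSpot-style subtype check: a depth-indexed display of superclasses, plus a linearly scanned list of secondary supertypes such as interfaces. It omits HotSpot's `super_check_offset`, the secondary-super cache and the hashed secondary-supers table. It is not the s390x code under review.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified class metadata; names are illustrative only.
struct Klass {
  // Superclass chain indexed by depth: [0] is the root, the last entry is
  // the class itself. Assumed non-empty for classes.
  std::vector<const Klass*> primary_supers;
  // Other supertypes (for example implemented interfaces), scanned linearly.
  std::vector<const Klass*> secondary_supers;
  bool is_interface = false;
};

bool is_subtype_of(const Klass* sub, const Klass* super) {
  if (sub == super) {
    return true;  // trivial fast path
  }
  if (!super->is_interface) {
    // A class sits at the same depth in every subclass's chain,
    // so one indexed load and compare suffices.
    std::size_t depth = super->primary_supers.size() - 1;
    return depth < sub->primary_supers.size() &&
           sub->primary_supers[depth] == super;
  }
  // Interfaces have no fixed depth: fall back to a linear scan.
  for (const Klass* s : sub->secondary_supers) {
    if (s == super) {
      return true;
    }
  }
  return false;
}

// Models Class.isInstance: obj_klass == nullptr stands for a null object.
bool is_instance(const Klass* obj_klass, const Klass* super) {
  return obj_klass != nullptr && is_subtype_of(obj_klass, super);
}
```

The linear scan is the part that the secondary-supers flags used in the testing above (`-XX:+UseSecondarySupersTable`, `-XX:+StressSecondarySupers`) exercise in the real VM.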
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: wip: comments from Lutz ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23535/files - new: https://git.openjdk.org/jdk/pull/23535/files/33d5cbe3..fdd979a8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=03-04 Stats: 13 lines in 2 files changed: 2 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23535/head:pull/23535 PR: https://git.openjdk.org/jdk/pull/23535 From amitkumar at openjdk.org Thu Feb 13 11:22:11 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 13 Feb 2025 11:22:11 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v4] In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 17:51:29 GMT, Lutz Schmidt wrote: > In other words: is z13 the minimum H/W version? umm, no it's not. I will create a macroAssembler method to handle LOCHI. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1954315655 From epeter at openjdk.org Thu Feb 13 11:39:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Feb 2025 11:39:15 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> Message-ID: On Mon, 10 Feb 2025 09:26:32 GMT, Galder Zamarreño wrote: >> @eastig is helping with the results on aarch64, so I will verify the numbers in same way done below for x86_64 once he provides me with the results. >> >> Here is a summary of the benchmarking results I'm seeing on x86_64 (I will push an update that just merges the latest master shortly).
>> >> First I will go through the results of `MinMaxVector`. This benchmark computes throughput by default so the higher the number the better. >> >> # MinMaxVector AVX-512 >> >> Following are results with AVX-512 instructions: >> >> Benchmark (probability) (range) (seed) (size) Mode Cnt Baseline Patch Units >> MinMaxVector.longClippingRange N/A 90 0 1000 thrpt 4 834.127 3688.961 ops/ms >> MinMaxVector.longClippingRange N/A 100 0 1000 thrpt 4 1147.010 3687.721 ops/ms >> MinMaxVector.longLoopMax 50 N/A N/A 2048 thrpt 4 1126.718 1072.812 ops/ms >> MinMaxVector.longLoopMax 80 N/A N/A 2048 thrpt 4 1070.921 1070.538 ops/ms >> MinMaxVector.longLoopMax 100 N/A N/A 2048 thrpt 4 510.483 1073.081 ops/ms >> MinMaxVector.longLoopMin 50 N/A N/A 2048 thrpt 4 935.658 1016.910 ops/ms >> MinMaxVector.longLoopMin 80 N/A N/A 2048 thrpt 4 1007.410 933.774 ops/ms >> MinMaxVector.longLoopMin 100 N/A N/A 2048 thrpt 4 536.582 1017.337 ops/ms >> MinMaxVector.longReductionMax 50 N/A N/A 2048 thrpt 4 967.288 966.945 ops/ms >> MinMaxVector.longReductionMax 80 N/A N/A 2048 thrpt 4 967.327 967.382 ops/ms >> MinMaxVector.longReductionMax 100 N/A N/A 2048 thrpt 4 849.689 967.327 ops/ms >> MinMaxVector.longReductionMin 50 N/A N/A 2048 thrpt 4 966.323 967.275 ops/ms >> MinMaxVector.longReductionMin 80 N/A N/A 2048 thrpt 4 967.340 967.228 ops/ms >> MinMaxVector.longReductionMin 100 N/A N/A 2048 thrpt 4 880.921 967.233 ops/ms >> >> >> ### `longReduction[Min|Max]` performance improves slightly when probability is 100 >> >> Without the patch the code uses compare instructions: >> >> >> 7.83% 0x00007f4f700fb305: imulq $0xb, 0x20(%r14, %r8, 8), %rdi >> ... > >> At 100% probability baseline fails to vectorize because it observes a control flow. This control flow is not the one you see in min/max implementations, but this is one added by HotSpot as a result of the JIT profiling.
It observes that one branch is always taken so it optimizes for that, and adds a branch for the uncommon case where the branch is not taken. > > I've dug further into this to try to understand how the baseline hotspot code works, and the explanation above is not entirely correct. Let's look at the IR differences between say 100% vs 80% branch situations. > > At branch 80% you see: > > 1115 CountedLoop === 1115 598 463 [[ 1101 1115 1116 1118 451 594 ]] inner stride: 2 main of N1115 strip mined !orig=[599],[590],[307] !jvms: MinMaxVector::longLoopMax @ bci:10 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) > > 692 LoadL === 1083 1101 393 [[ 747 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; #long (does not depend only on test, unknown control) !orig=[395] !jvms: MinMaxVector::longLoopMax @ bci:26 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) > 651 LoadL === 1095 1101 355 [[ 747 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; #long (does not depend only on test, unknown control) !orig=[357] !jvms: MinMaxVector::longLoopMax @ bci:20 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) > 747 MaxL === _ 651 692 [[ 451 ]] !orig=[608],[416] !jvms: Math::max @ bci:11 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) > > 451 StoreL === 1115 1101 449 747 [[ 1116 454 911 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=9; !orig=1124 !jvms: MinMaxVector::longLoopMax @ bci:30 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) > > 594 CountedLoopEnd === 1115 593 [[ 1123 463 ]] [lt] P=0.999731, C=780799.000000 !orig=[462] !jvms: 
MinMaxVector::longLoopMax @ bci:7 (line 235) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) > > > You see the counted loop with the LoadL for array loads and MaxL consuming those. The StoreL is for array assignment (I think). > > At branch 100% you see: > > > ... @galderz Thanks for all the explanations, that's really helpful. **Discussion** - AVX512: only improvements. - Especially with probability 100, where before we used the bytecode, which would then create an `unstable_if` with uncommon trap. That meant we could not re-discover the CMove / Max later in the IR. Now that we never inline the bytecode, and just intrinsify directly, we can use `vpmax` and that is faster. - Ah, maybe that was all incorrect, though it sounded reasonable. You seem to suggest that we actually used to inline both branches, but that the issue was that `PhaseIdealLoop::conditional_move` does not like extreme probabilities, and so it did not convert 100% cases to CMove, and so it did not vectorize. Right. Getting the probability cutoff just right is a little tricky there, and the precise number can seem strange. But that's a discussion for another day. - The reduction case is only improved slightly... at least. Maybe we can further improve the throughput with [this](https://bugs.openjdk.org/browse/JDK-8345245) later on. - AVX2: mixed results - `longReductionMax/Min`: vector max / min is not implemented. We should investigate why. - It seems like the `MaxVL` and `MinVL` (e.g. `vpmaxsq`) instructions are only implemented directly for AVX512, see [this](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#ig_expand=4669,2611&text=max_epi64). - As you suggested, @galderz, we could consider implementing it via `cmove` in the backend for `AVX2` and maybe lower. Maybe we can talk with @jatin-bhateja about this. That would probably already be worth it on its own, in a separate RFE.
Because I would suspect it could give a speedup in the non 100% cases as well. Maybe this would even have to be an RFE that makes it in first, so we don't have regressions here? - But even still: just intrinsifying should not get us a regression, because there will always be cases where the auto-vectorizer fails, and so the scalar code should not be slower with your patch than on master, right? So we need to investigate this scalar issue as well. - VectorReduction2.WithSuperword on AVX-512 - `long[Min|Max]Simple` performance drops considerably. Yes, this case is not yet supposed to vectorize, I'm working on that - it is the issue with "simple" reductions, i.e. those that do no work other than reduce. Our current reduction heuristic thinks these are not profitable to vectorize - but that is wrong in almost all cases. You even filed an issue for that a while back ;) see https://bugs.openjdk.org/browse/JDK-8345044 and related issues. We could bite the bullet on this, knowing that I'm working on it and it will probably fix that issue, or we just wait a little here. Let's discuss. - VectorReduction2.NoSuperword on AVX-512 machine - Hmm, ok. So we seem to realize that the scalar case is slower with your patch in some cases, because now we have a `cmove` on the critical path, and previously we could just predict the branches, which was faster. Interesting that the number of other instructions has an effect here as well: you seem to see a speedup with the "big" benchmarks, but the "small" and "dot" benchmarks are slower. This is surprising. It would be great if we understood why it behaves this way. **Summary** Wow, things are more complicated than I would have thought. I hope you are not too discouraged. We seem to have these issues, maybe there are more: - AVX2 does not have long-vector-min/max implemented. That can be done in a separate RFE. - Simple reductions do not vectorize, known issue see https://bugs.openjdk.org/browse/JDK-8345044, I'm working on that.
- Scalar reductions are slower with your patch for extreme probabilities. Before, they were done with branches, and branch prediction was fast. Now with cmove or max instructions, the critical path is longer, and that makes things slow. Maybe this could be alleviated by reordering / reassociating the reduction path, see [JDK-8345245](https://bugs.openjdk.org/browse/JDK-8345245). Alternatively, we could convert the `cmove` back to a branch, but for that we would probably need to know the branching probability, which we now do not have any more, right? Tricky. This seems to be the real issue we need to address and discuss. @galderz What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2656328729 From epeter at openjdk.org Thu Feb 13 11:49:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Feb 2025 11:49:18 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> Message-ID: On Mon, 10 Feb 2025 09:26:32 GMT, Galder Zamarreño wrote: >> @eastig is helping with the results on aarch64, so I will verify the numbers in same way done below for x86_64 once he provides me with the results. >> >> Here is a summary of the benchmarking results I'm seeing on x86_64 (I will push an update that just merges the latest master shortly). >> >> First I will go through the results of `MinMaxVector`. This benchmark computes throughput by default so the higher the number the better.
>> >> # MinMaxVector AVX-512 >> >> Following are results with AVX-512 instructions: >> >> Benchmark (probability) (range) (seed) (size) Mode Cnt Baseline Patch Units >> MinMaxVector.longClippingRange N/A 90 0 1000 thrpt 4 834.127 3688.961 ops/ms >> MinMaxVector.longClippingRange N/A 100 0 1000 thrpt 4 1147.010 3687.721 ops/ms >> MinMaxVector.longLoopMax 50 N/A N/A 2048 thrpt 4 1126.718 1072.812 ops/ms >> MinMaxVector.longLoopMax 80 N/A N/A 2048 thrpt 4 1070.921 1070.538 ops/ms >> MinMaxVector.longLoopMax 100 N/A N/A 2048 thrpt 4 510.483 1073.081 ops/ms >> MinMaxVector.longLoopMin 50 N/A N/A 2048 thrpt 4 935.658 1016.910 ops/ms >> MinMaxVector.longLoopMin 80 N/A N/A 2048 thrpt 4 1007.410 933.774 ops/ms >> MinMaxVector.longLoopMin 100 N/A N/A 2048 thrpt 4 536.582 1017.337 ops/ms >> MinMaxVector.longReductionMax 50 N/A N/A 2048 thrpt 4 967.288 966.945 ops/ms >> MinMaxVector.longReductionMax 80 N/A N/A 2048 thrpt 4 967.327 967.382 ops/ms >> MinMaxVector.longReductionMax 100 N/A N/A 2048 thrpt 4 849.689 967.327 ops/ms >> MinMaxVector.longReductionMin 50 N/A N/A 2048 thrpt 4 966.323 967.275 ops/ms >> MinMaxVector.longReductionMin 80 N/A N/A 2048 thrpt 4 967.340 967.228 ops/ms >> MinMaxVector.longReductionMin 100 N/A N/A 2048 thrpt 4 880.921 967.233 ops/ms >> >> >> ### `longReduction[Min|Max]` performance improves slightly when probability is 100 >> >> Without the patch the code uses compare instructions: >> >> >> 7.83% 0x00007f4f700fb305: imulq $0xb, 0x20(%r14, %r8, 8), %rdi >> ... > >> At 100% probability baseline fails to vectorize because it observes a control flow. This control flow is not the one you see in min/max implementations, but this is one added by HotSpot as a result of the JIT profiling. It observes that one branch is always taken so it optimizes for that, and adds a branch for the uncommon case where the branch is not taken.
> > I've dug further into this to try to understand how the baseline hotspot code works, and the explanation above is not entirely correct. Let's look at the IR differences between say 100% vs 80% branch situations. > > At branch 80% you see: > > 1115 CountedLoop === 1115 598 463 [[ 1101 1115 1116 1118 451 594 ]] inner stride: 2 main of N1115 strip mined !orig=[599],[590],[307] !jvms: MinMaxVector::longLoopMax @ bci:10 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) > > 692 LoadL === 1083 1101 393 [[ 747 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; #long (does not depend only on test, unknown control) !orig=[395] !jvms: MinMaxVector::longLoopMax @ bci:26 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) > 651 LoadL === 1095 1101 355 [[ 747 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; #long (does not depend only on test, unknown control) !orig=[357] !jvms: MinMaxVector::longLoopMax @ bci:20 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) > 747 MaxL === _ 651 692 [[ 451 ]] !orig=[608],[416] !jvms: Math::max @ bci:11 (line 2037) MinMaxVector::longLoopMax @ bci:27 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) > > 451 StoreL === 1115 1101 449 747 [[ 1116 454 911 ]] @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; Memory: @long[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=9; !orig=1124 !jvms: MinMaxVector::longLoopMax @ bci:30 (line 236) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) > > 594 CountedLoopEnd === 1115 593 [[ 1123 463 ]] [lt] P=0.999731, C=780799.000000 !orig=[462] !jvms: MinMaxVector::longLoopMax @ bci:7 (line 235) MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub @ bci:19 (line 124) > > > You see the counted loop 
with the LoadL for array loads and MaxL consuming those. The StoreL is for array assignment (I think). > > At branch 100% you see: > > > ... @galderz How sure are you that intrinsifying directly is really the right approach? Maybe the approach via `PhaseIdealLoop::conditional_move`, where we know the branching probability, is a better one. Of course, knowing the branching probability is not a perfect heuristic for how good branch prediction is going to be, but it is at least something. So I'm wondering if there could be a different approach that sees all the wins you get here, without any of the regressions? If we are just interested in better vectorization: the current issue is that the auto-vectorizer cannot handle CFG, i.e. we do not yet do if-conversion. But if we had if-conversion, then the inlined CFG of min/max would just be converted to vector CMove (or vector min/max where available) at that point. We can take the branching probabilities into account, just like `PhaseIdealLoop::conditional_move` does - if that is necessary. Of course if-conversion is far away, and we will encounter a lot of issues with branch prediction etc., so I'm scared we might never get there - but I want to try ;) Do we see any other wins with your patch, that are not due to vectorization, but just scalar code? @galderz Maybe we can discuss this offline at some point as well :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2656350896 PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2656351785 From amitkumar at openjdk.org Thu Feb 13 12:50:34 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 13 Feb 2025 12:50:34 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v6] In-Reply-To: References: Message-ID: <5uj6o_GcYuXJFQ_LRFg_51y-n495B-epGbE2f9rAVL4=.b80da86c-76aa-4582-833f-286a6a99cf4f@github.com> > s390x implementation for Class.isInstance intrinsic.
> > Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. > > Benchmark results will be updated soon. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: fixes older hardware issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23535/files - new: https://git.openjdk.org/jdk/pull/23535/files/fdd979a8..8bf5b92a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=04-05 Stats: 31 lines in 3 files changed: 28 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23535/head:pull/23535 PR: https://git.openjdk.org/jdk/pull/23535 From amitkumar at openjdk.org Thu Feb 13 12:54:12 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 13 Feb 2025 12:54:12 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v4] In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 11:19:04 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3674: >> >>> 3672: Register r_temp2, >>> 3673: Register r_temp3) { >>> 3674: assert_different_registers(r_sub_klass, r_super_klass, r_result, r_temp1, r_temp2, r_temp3, Z_R0_scratch); >> >> Better use `LOCGHI` further down and avoid use of `Z_R0_scratch`. >> You are using `LOCHI` in `Runtime1::generate_code_for()` anyway which implies you are sure the load/store-on-condition facility 2 is installed. In other words: is z13 the minimum H/W version? >> >> Even more simplification: there is no need to set `r_linear_result` conditionally. You set it to 1 and branch to failure if array length is zero. For all other cases, repne_scan() does the right thing. > >> In other words: is z13 the minimum H/W version? > > umm, no it's not. 
I will create a macroAssembler method to handle LOCHI. I have pushed new code that: 1. does argument shuffling with frame resizing; 2. makes sure that things will not break on hardware older than z13. Can you have another look? :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1954440468 From amitkumar at openjdk.org Thu Feb 13 12:59:50 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 13 Feb 2025 12:59:50 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v7] In-Reply-To: References: Message-ID: > s390x implementation for Class.isInstance intrinsic. > > Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. > > Benchmark results will be updated soon. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: space for 3 registers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23535/files - new: https://git.openjdk.org/jdk/pull/23535/files/8bf5b92a..db593594 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23535/head:pull/23535 PR: https://git.openjdk.org/jdk/pull/23535 From zgu at openjdk.org Thu Feb 13 14:22:09 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Thu, 13 Feb 2025 14:22:09 GMT Subject: RFR: 8349083: Factor out filename handling code from logging In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 05:42:33 GMT, David Holmes wrote: > The hope was we would have a range of time specifiers for different notion of timestamp - but given different uses of %t already exist that would be problematic. I hear you!
I certainly would like to see consistency, and I would vote for using the `current` time for `%t`, because it really simplifies the file retention policy. I think this refactoring can provide a central place to enforce whatever standard we agree on :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23410#issuecomment-2656741405 From azafari at openjdk.org Thu Feb 13 15:26:10 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 13 Feb 2025 15:26:10 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v23] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. > - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. > - Adding a runtime flag for selecting the old or new version can be added later. > - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: flag/type -> tag chages are removed.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/35c11b96..611a2d4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=21-22 Stats: 53 lines in 13 files changed: 0 ins; 0 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From kvn at openjdk.org Thu Feb 13 16:01:11 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 13 Feb 2025 16:01:11 GMT Subject: RFR: 8349753: Incorrect use of CodeBlob::is_buffer_blob() in few places In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 01:22:55 GMT, Vladimir Kozlov wrote: > `CodeBlob::is_buffer_blob()` method is incorrectly used in few places because BufferBlob is not "leaf" class. You need to add checks for its subclasses too. > > I also updated statistic output for CodeCache (`-XX:+PrintCodeCache -XX:+Verbose`) and corresponding test to reflect current state of code blobs. 
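The underlying pitfall is a general one with kind tags in a class hierarchy: an equality check on the tag recognizes only objects of exactly that class, not objects of its subclasses. The sketch below is a simplified, hypothetical model; the `Kind` enum, the class names and the `is_buffer_blob_family()` helper are illustrative and do not reproduce HotSpot's actual `CodeBlob` code.

```cpp
#include <cassert>

// Hypothetical, simplified model of a tagged class hierarchy.
// In this sketch BufferBlob is NOT a leaf: AdapterBlob and VtableBlob derive from it.
enum class Kind { Runtime, Buffer, Adapter, Vtable };

class Blob {
 public:
  explicit Blob(Kind k) : _kind(k) {}
  virtual ~Blob() = default;
  Kind kind() const { return _kind; }

  // Exact-kind test: true only for objects created as a plain BufferBlob.
  bool is_buffer_blob() const { return _kind == Kind::Buffer; }

  // Family test: true for BufferBlob and for every subclass of it.
  bool is_buffer_blob_family() const {
    return _kind == Kind::Buffer || _kind == Kind::Adapter || _kind == Kind::Vtable;
  }

 private:
  Kind _kind;
};

class BufferBlob : public Blob {
 public:
  BufferBlob() : Blob(Kind::Buffer) {}
 protected:
  explicit BufferBlob(Kind k) : Blob(k) {}  // lets subclasses set their own tag
};

class AdapterBlob : public BufferBlob {
 public:
  AdapterBlob() : BufferBlob(Kind::Adapter) {}
};

class VtableBlob : public BufferBlob {
 public:
  VtableBlob() : BufferBlob(Kind::Vtable) {}
};

class RuntimeBlob : public Blob {
 public:
  RuntimeBlob() : Blob(Kind::Runtime) {}
};
```

In this model `AdapterBlob().is_buffer_blob()` is false even though `AdapterBlob` derives from `BufferBlob`, so a caller that means "any buffer blob" needs the family check. Enumerating the subclass tags explicitly is simple, but the list must be kept in sync whenever a new subclass is added; reserving a contiguous tag range for the family is one common alternative.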
> > Tested tier1-4, stress, xcomp > > New output: > > Non-nmethod blobs: > #67 runtime = 43K (hdr 4K 10%, loc 1K 3%, code 36K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #0 upcall = 0K > #1 uncommon trap = 0K (hdr 0K 13%, loc 0K 2%, code 0K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 deoptimization = 2K (hdr 0K 3%, loc 0K 1%, code 2K 94%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 exception = 0K (hdr 0K 30%, loc 0K 3%, code 0K 63%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #3 safepoint = 4K (hdr 0K 4%, loc 0K 1%, code 4K 93%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #639 adapter = 955K (hdr 44K 4%, loc 24K 2%, code 880K 92%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 mh_adapter = 10K (hdr 0K 0%, loc 0K 0%, code 9K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 vtable = 32K (hdr 0K 0%, loc 0K 0%, code 32K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #12 buffer blob = 917K (hdr 0K 0%, loc 0K 0%, code 916K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #0 other = 0K > > > Output before: > > Non-nmethod blobs: > #66 runtime = 42K (hdr 4K 10%, loc 1K 3%, code 36K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 uncommon trap = 0K (hdr 0K 13%, loc 0K 2%, code 0K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 deoptimization = 2K (hdr 0K 3%, loc 0K 1%, code 2K 94%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #639 adapter = 955K (hdr 44K 4%, loc 24K 2%, code 880K 92%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #12 buffer blob = 917K (hdr 0K 0%, loc 0K 0%, code 916K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #6 other = 47K (hdr 0K 0%, loc 0K 0%, code 46K 98%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 
0%, pcs 0K 0%]) Thank you, Aleksey, for review. I will wait until [PR for moving Relocation from CodeCache]( https://github.com/openjdk/jdk/pull/21276) is integrated to not mess up @bulasevich work. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23607#issuecomment-2657049881 From stuefe at openjdk.org Thu Feb 13 16:10:18 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 13 Feb 2025 16:10:18 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v5] In-Reply-To: References: Message-ID: <_x8wDAuB9JowWvn7B23sfD3k92Y3H_HT5N-65zkrIKk=.15eb9425-2660-42f7-83fb-2db2bca0e737@github.com> On Thu, 23 Jan 2025 06:59:09 GMT, Ioi Lam wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> fix whitespace error > > Changes requested by iklam (Reviewer). @iklam This is surprisingly complex. So, let's say I use _mapping_offset as suggested. With that solution, we would set the archive requested base to 0x800000000, the first region mapped at e.g., 0x801000000 (assuming a 16MB protzone size for a moment), and so on. The problems I ran into were caused by all the file global data for which offsets were precalculated in the header. These offsets refer to the start of the mapped archive base, and there are implicit assumptions that the mapped archive base equals the start of the dump time staging buffer. However, since the staging buffer should not contain the protzone (we don't want to save that), this assumption is incorrect. The offsets calculated at dump time must, at runtime, refer to the start of the mapping, which will include the protection zone, while at dump time, they refer to the start of the staging buffer, which does not include the protection zone. Whichever way one tries to solve this, one gets domino effects. 
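The offset mismatch, and the effect of moving the narrow-Klass encoding base to the start of the protection zone, can be seen with plain address arithmetic. This is a minimal sketch with hypothetical numbers and function names; the zone size reuses the 16 MB figure from the comment, while the shift value and the addresses are assumptions, not the values the JDK uses.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants only; not the values used by the JDK.
constexpr uint64_t kProtZoneSize = 16 * 1024 * 1024;  // 16 MB zone in front of the archive
constexpr int kKlassShift = 10;                       // assumed narrow-Klass shift

// Dump time: an offset is computed relative to the start of the staging
// buffer, which does not contain the protection zone.
uint64_t dump_time_offset(uint64_t staging_start, uint64_t addr_in_staging) {
  return addr_in_staging - staging_start;
}

// Runtime: turn an offset back into an address relative to some start.
// Using the start of the protection zone is off by kProtZoneSize;
// using the start of the first mapped region (after the zone) is correct.
uint64_t resolve(uint64_t start, uint64_t offset) { return start + offset; }

// Narrow Klass encoding/decoding relative to an encoding base.
uint32_t encode_klass(uint64_t base, uint64_t klass_addr) {
  return static_cast<uint32_t>((klass_addr - base) >> kKlassShift);
}
uint64_t decode_klass(uint64_t base, uint32_t nk) {
  return base + (static_cast<uint64_t>(nk) << kKlassShift);
}
```

With the encoding base placed at the start of the zone, every real Klass in the archive gets an nKlass of at least `kProtZoneSize >> kKlassShift`, and an accidental nKlass of 0 decodes to the first byte of the protection zone rather than to a valid Klass.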
If I try to fix `ArchiveBuilder::any_to_addr` to make the offsets match up at runtime, they no longer point to the correct position in the staging area at dump time, and I get dump-time asserts. If I let the staging buffer include the protection zone at dump time but then simply don't write that part (i.e., start writing after the end of the protection zone in the staging buffer), the file offsets won't match at runtime. And so on; I tried a couple of different ways and ultimately gave up. Therefore, I settled on an "easier", less invasive way, which is to precede the mapped archive space with the protection zone. In this proposal, the requested base equals the archive base as before, i.e. the point where the first region gets mapped and to which the global offsets refer. However, that mapping is preceded by the protection zone, and we must now make sure to initialize narrow Klass decoding with the start of the protection zone as the encoding base, not the start of the mapped archive. We also must ensure that the encoding base address is still optimized for narrow Klass encoding (correctly aligned, etc.). And at dump time we must precalculate the narrow Klass IDs with the future protection zone address, not the future mapping start, as the encoding base. Still, this was easier, since outside of precalculating narrow Klass IDs, nothing in CDS really relies on the encoding base. The annoying part now is that "SharedBaseAddress" changes its meaning. Before, it meant "start of mapping aka encoding base". Now it means "start of mapping", but the encoding base is SharedBaseAddress - protzone size. So, not sure. You can find the current approach (a), where "SharedBaseAddress points to after the protection zone", here: https://github.com/tstuefe/jdk/tree/nklass-protzone-take3 In case you are interested, you can find the first, aborted "_mapping_offset" approach (b) here: https://github.com/tstuefe/jdk/tree/nklass-protzone-take2 So, compared to my original patch, is (a) the "right way"?
Not sure. Yes, we don't pay 16KB of additional memory for the jsa file. However, one could argue that a protzone filled with zeros would just compress to nothing anyway, so I am not sure how much we really gain. In any case, I am unsure which course to follow here. I would hate it, however, if we just abandoned this work, since I have been bitten by nKlass=0 dereferences one too many times in the past. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23190#issuecomment-2657076306 From kvn at openjdk.org Thu Feb 13 16:12:19 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 13 Feb 2025 16:12:19 GMT Subject: Integrated: 8349753: Incorrect use of CodeBlob::is_buffer_blob() in few places In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 01:22:55 GMT, Vladimir Kozlov wrote: > `CodeBlob::is_buffer_blob()` method is incorrectly used in few places because BufferBlob is not "leaf" class. You need to add checks for its subclasses too. > > I also updated statistic output for CodeCache (`-XX:+PrintCodeCache -XX:+Verbose`) and corresponding test to reflect current state of code blobs. 
> > Tested tier1-4, stress, xcomp > > New output: > > Non-nmethod blobs: > #67 runtime = 43K (hdr 4K 10%, loc 1K 3%, code 36K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #0 upcall = 0K > #1 uncommon trap = 0K (hdr 0K 13%, loc 0K 2%, code 0K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 deoptimization = 2K (hdr 0K 3%, loc 0K 1%, code 2K 94%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 exception = 0K (hdr 0K 30%, loc 0K 3%, code 0K 63%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #3 safepoint = 4K (hdr 0K 4%, loc 0K 1%, code 4K 93%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #639 adapter = 955K (hdr 44K 4%, loc 24K 2%, code 880K 92%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 mh_adapter = 10K (hdr 0K 0%, loc 0K 0%, code 9K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 vtable = 32K (hdr 0K 0%, loc 0K 0%, code 32K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #12 buffer blob = 917K (hdr 0K 0%, loc 0K 0%, code 916K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #0 other = 0K > > > Output before: > > Non-nmethod blobs: > #66 runtime = 42K (hdr 4K 10%, loc 1K 3%, code 36K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 uncommon trap = 0K (hdr 0K 13%, loc 0K 2%, code 0K 84%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #1 deoptimization = 2K (hdr 0K 3%, loc 0K 1%, code 2K 94%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #639 adapter = 955K (hdr 44K 4%, loc 24K 2%, code 880K 92%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #12 buffer blob = 917K (hdr 0K 0%, loc 0K 0%, code 916K 99%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 0%, pcs 0K 0%]) > #6 other = 47K (hdr 0K 0%, loc 0K 0%, code 46K 98%, stub 0K 0%, [oops 0K 0%, metadata 0K 0%, data 0K 
0%, pcs 0K 0%]) This pull request has now been integrated. Changeset: 0b50e479 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/0b50e479a060cf745a3e858d535516444fe80fd8 Stats: 47 lines in 4 files changed: 45 ins; 0 del; 2 mod 8349753: Incorrect use of CodeBlob::is_buffer_blob() in few places Reviewed-by: dlong, shade ------------- PR: https://git.openjdk.org/jdk/pull/23607 From stuefe at openjdk.org Thu Feb 13 16:14:02 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 13 Feb 2025 16:14:02 GMT Subject: RFR: 8344009: Improve compiler memory statistics Message-ID: Greetings, This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. I wanted to track that information correctly and display it clearly in a way that is easy to understand. The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase. 
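The idea of piggybacking on a stack object can be sketched as a small RAII tracker. The code below is a hypothetical model, not the `TracePhase` or compiler memory statistic code from the patch: a single counter stands in for arena usage, each `PhaseScope` records usage on entry and exit, and a watermark that is reset on entry and propagated outward on exit gives each phase its own peak. That way a phase that allocates and then frees a large temporary buffer still reports its temporary peak, even though its net delta is small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for arena accounting.
struct ArenaCounter {
  size_t current = 0;
  size_t watermark = 0;  // highest 'current' since the innermost scope opened

  void allocate(size_t n) { current += n; watermark = std::max(watermark, current); }
  void release(size_t n) { current -= n; }
};

struct PhaseRecord {
  std::string name;
  long long delta;        // usage at exit minus usage at entry (may be negative)
  size_t temporary_peak;  // highest usage above the entry level while active
};

class PhaseScope {
 public:
  PhaseScope(ArenaCounter& c, std::vector<PhaseRecord>& log, std::string name)
      : _c(c), _log(log), _name(std::move(name)),
        _start(c.current), _saved_watermark(c.watermark) {
    _c.watermark = _c.current;  // fresh watermark for this scope
  }

  ~PhaseScope() {
    size_t peak = _c.watermark;
    _log.push_back({_name,
                    static_cast<long long>(_c.current) - static_cast<long long>(_start),
                    peak - _start});
    // The enclosing scope must still see the peak reached inside this one.
    _c.watermark = std::max(_saved_watermark, peak);
  }

  PhaseScope(const PhaseScope&) = delete;
  PhaseScope& operator=(const PhaseScope&) = delete;

 private:
  ArenaCounter& _c;
  std::vector<PhaseRecord>& _log;
  std::string _name;
  size_t _start;
  size_t _saved_watermark;
};
```

Records are appended when a scope closes, so nested phases appear before their enclosing phase; a timeline like the one in the PR would additionally log scope entry.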
The statistic gives us two new forms of output: 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: Phase Total ra node comp type index reglive regsplit cienv other none 1205512 155104 982984 33712 0 0 0 0 0 33712 parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 optimizer 916584 0 556416 0 0 0 0 0 0 360168 escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 macroEliminate 196448 0 196448 0 0 0 0 0 0 0 iterGVN 327440 0 196368 131072 0 0 0 0 0 0 incrementalInline 3992816 0 3043704 621832 0 0 0 0 261824 65456 incrementalInline_igvn 458512 0 163640 294872 0 0 0 0 0 0 incrementalInline_inline 32728 0 0 32728 0 0 0 0 0 0 337880 0 108504 229376 0 0 0 0 0 0 idealLoop 2499568 0 566696 1932872 0 0 0 0 0 0 idealLoopVerify 327600 0 0 327600 0 0 0 0 0 0 ccp 65456 0 32728 0 0 0 0 0 0 32728 macroExpand 1898544 0 1570944 327600 0 0 0 0 0 0 graphReshape 347920 0 347920 0 0 0 0 0 0 0 matcher 9480400 1817928 6417448 1245024 0 0 0 0 0 0 postselect_cleanup 163800 163800 0 0 0 0 0 0 0 0 scheduler 458192 32728 425464 0 0 0 0 0 0 0 regalloc 178072 178072 0 0 0 0 0 0 0 0 ctorChaitin 39432 39432 0 0 0 0 0 0 0 0 regAllocSplit 1865496 32728 1832768 0 0 0 0 0 0 0 chaitinCoalesce1 1277112 196608 1080504 0 0 0 0 0 0 0 peephole 32728 0 32728 0 0 0 0 0 0 0 output 17868312 17868312 0 0 0 0 0 0 0 0 shorten branches 458472 65456 32728 360288 0 0 0 0 0 0 This is pretty self-explanatory. In this example, when the compilation hit its peak of 58MB, it shows how much (column `Total`) we have allocated on behalf of each separate C2 phase that finished. Note that "none" means allocations happening outside of a TracePhase scope. The columns following `Total` show the breakup, in this particular phase, into individual arena types (resourcearea, node arena etc). 
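A breakdown "at arena usage peak" requires capturing the per-phase, per-arena-type counters at the moment the total reaches a new maximum, not at the end of the compilation. A minimal sketch, assuming a hypothetical fixed set of arena types and phase indices (not the patch's actual data structures), copies the whole matrix whenever a new peak is reached:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical dimensions: a few arena types and phases.
enum ArenaType { RA = 0, NODE = 1, COMP = 2, NUM_ARENA_TYPES = 3 };
constexpr int kNumPhases = 4;

using Matrix = std::array<std::array<long long, NUM_ARENA_TYPES>, kNumPhases>;

class PeakTracker {
 public:
  // Record an allocation (bytes > 0) or a release (bytes < 0)
  // attributed to a phase and an arena type.
  void account(int phase, ArenaType type, long long bytes) {
    _current[phase][type] += bytes;
    _total += bytes;
    if (_total > _peak) {  // new peak: remember the whole breakdown
      _peak = _total;
      _at_peak = _current;
    }
  }

  long long peak() const { return _peak; }
  long long at_peak(int phase, ArenaType type) const { return _at_peak[phase][type]; }

 private:
  Matrix _current{};  // value-initialized to zero
  Matrix _at_peak{};
  long long _total = 0;
  long long _peak = 0;
};
```

Copying on every new maximum stays cheap as long as the matrix is small and accounting happens at a coarse granularity, such as per arena chunk rather than per individual allocation.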
2) We also get a detailed timeline of phase execution and the gradual memory buildup. This also shows large spikes of memory that were confined inside the phase, and that we never got to see before: Allocation timelime by phase: Phase seq. number Bytes Nodes >0 (outside) 102120 (+102120) 3 (+3) >1 parse 11787496 (+11685376) 7151 (+7148) <0 (cont.) (outside) 11787496 (+0) 7151 (+0) >2 optimizer 11787496 (+0) 7151 (+0) >3 iterGVN 12180392 (+392896) 6313 (-838) <2 (cont.) optimizer 12180392 (+0) 6313 (+0) >4 incrementalInline 12180392 (+0) 6313 (+0) >5 incrementalInline_inline 12213120 (+32728) 6330 (+17) <4 (cont.) incrementalInline 12213120 (+0) 6330 (+0) >6 incrementalInline_pru 12213120 (+0) 6287 (-43) <4 (cont.) incrementalInline 12213120 (+0) 6287 (+0) >7 incrementalInline_igvn 12213120 (+0) 6286 (-1) <4 (cont.) incrementalInline 16631400 (+4418280) 17200 (+10914) >8 incrementalInline_pru 16631400 (+0) 10185 (-7015) <4 (cont.) incrementalInline 16631400 (+0) 10185 (+0) >9 incrementalInline_igvn 17122640 (+491240) 9374 (-811) <4 (cont.) incrementalInline 17122640 (+0) 9374 (+0) <2 (cont.) optimizer 17122640 (+0) 9396 (+22) >10 incrementalInline_pru 17122640 (+0) 9360 (-36) <2 (cont.) optimizer 17122640 (+0) 9360 (+0) >11 incrementalInline_igvn 17122640 (+0) 9353 (-7) <2 (cont.) optimizer 17745072 (+622432) 9318 (-35) >12 18082952 (+337880) 9317 (-1) <2 (cont.) optimizer 18082952 (+0) 9317 (+0) >13 idealLoop 18420712 (+337760) 9247 (-70) significant temporary peak: 19762880 (+1679928) < SNIP SNIP > <41 (cont.) regalloc 52599072 (+0) 23604 (+0) >71 chaitinSelect 52599072 (+0) 23604 (+0) <41 (cont.) regalloc 52599072 (+0) 23604 (+0) >72 postAllocCopyRemoval 52599072 (+0) 19285 (-4319) significant temporary peak: 58359200 (+5760128) <41 (cont.) regalloc 52599072 (+0) 19285 (+0) >73 mergeMultidefs 52599072 (+0) 19285 (+0) <41 (cont.) regalloc 52599072 (+0) 19285 (+0) >74 fixupSpills 52599072 (+0) 19256 (-29) <41 (cont.) 
regalloc 40458304 (-12140768) 19256 (+0) <0 (cont.) (outside) 40458304 (+0) 19256 (+0) >75 blockOrdering 40458304 (+0) 19268 (+12) <0 (cont.) (outside) 40458304 (+0) 19268 (+0) >76 peephole 40491032 (+32728) 19332 (+64) <0 (cont.) (outside) 40491032 (+0) 19332 (+0) >77 output 40491032 (+0) 19336 (+4) >78 shorten branches 40949504 (+458472) 19239 (-97) <77 (cont.) output 40949504 (+0) 19240 (+1) significant temporary peak: 58817816 (+17868312) >79 bldOopMaps 48294864 (+7345360) 19240 (+0) <77 (cont.) output 48294864 (+0) 19240 (+0) >80 fill buffer 50635256 (+2340392) 19574 (+334) <77 (cont.) output 50635256 (+0) 19574 (+0) >81 install_code 50635256 (+0) 19574 (+0) <77 (cont.) output 50634272 (-984) 19574 (+0) <0 (cont.) (outside) 2225624 (-48408648) 0 (-19574) The timeline shows the individual phases in the order in which they were executed. The left side is the tree of possibly nested TracePhase scopes. Every phase execution (since a phase can run multiple times) now has a unique phase sequence number. The tree shows those phase sequence numbers, and if a child phase ends, you can see that we are back in the outer phase again ("cont."). Behind that, we see the phase name, allocations made during that phase (Bytes), and the number of nodes allocated. If the phase caused a large temporary spike, it shows up as a "significant temporary peak". Here you can find a full printout of this information on a run of springboot petclinic with `-XX:CompileCommand=memstat,*.*` via `jcmd spring Compiler.memory verbose`: https://gist.github.com/tstuefe/9d00e7129d4cdf5d1dbc294a80cbe4ed ---- We also now print more detailed information if the JVM runs against a MemLimit and crashes (option `-XX:CompileCommand=memlimit`). Note that we run with a memlimit of 1GB by default in debug JVMs, and we already found a couple of footprint issues with this setting. 
See an example of a generated hs-err file here: https://gist.github.com/tstuefe/777ecd68b313097a8e9020ac9fea239a ---- Performance costs: Not really much to tell here; we only do a tiny bit more work now per arena chunk allocation and after the compilation ended, but nothing crazy. Nevertheless, I tested before and after this patch with SpecJBB, nothing rose above background noise. Footprint costs: The new information causes the per-compilation data to be a lot more bulky. That is why I reduced the information stored long-term: we now only store the 32 most expensive compilations. To see the cost of every compilation in its gory details, use the `-XX:CompileCommand=memstat,*.*,print` option. Of course, you can always reduce the scope of your tracking, e.g. limit it to a certain class only. --- Tests: - I tested manually on x64 Linux. GHAs ran (including the new tests). - Tests at SAP are green ------------- Commit messages: - more code grooming and comments - grooming - wip - Improved compiler memory statistics Changes: https://git.openjdk.org/jdk/pull/23530/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344009 Stats: 1753 lines in 28 files changed: 1187 ins; 237 del; 329 mod Patch: https://git.openjdk.org/jdk/pull/23530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23530/head:pull/23530 PR: https://git.openjdk.org/jdk/pull/23530 From roland at openjdk.org Thu Feb 13 16:46:16 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 13 Feb 2025 16:46:16 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> Message-ID: On Thu, 13 Feb 2025 11:46:35 GMT, Emanuel Peter wrote: > Do we see any other wins with your patch, that are not due to 
vectorization, but just scalar code? I think there are some. The current transformation from the parsed version of min/max to a conditional move, and then to a `Max`/`Min` node, depends on the conditional move transformation, which has its own set of heuristics; while it happens on simple test cases, that's not necessarily the case for all code shapes. I don't think we want to trust it too much. With the intrinsic, the type of the min or max can be narrowed down in a way that isn't possible when the code contains control flow or a conditional move. That in turn, once types have propagated, could cause some constant to appear, which could be a significant win. The `Min`/`Max` nodes are floating nodes. They can be hoisted out of loops and commoned reliably in ways that are not guaranteed otherwise. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2657176312 From gziemski at openjdk.org Thu Feb 13 17:16:11 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 13 Feb 2025 17:16:11 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v35] In-Reply-To: References: Message-ID: <7clnaUWaJmNG1crINBV4-ox-PpBelIXn7ln3zkblIEk=.a4dc0d6f-3e1c-487d-b663-7fecb2519093@github.com> > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > We create 2 static instances: one NMT_MemoryLogRecorder the other NMT_VirtualMemoryLogRecorder. > > VM interacts with these through these APIs: > > ``` > NMT_LogRecorder::initialize(NMTRecordMemoryAllocations, NMTRecordVirtualMemoryAllocations); > NMT_LogRecorder::replay(NMTBenchmarkRecordedDir, NMTBenchmarkRecordedPID); > NMT_LogRecorder::logThreadName(name); > NMT_LogRecorder::finish(); > > > For controlling their liveness and through their "log" APIs for the actual logging.
> > For memory logger those are: > > > NMT_MemoryLogRecorder::log_malloc(mem_tag, outer_size, outer_ptr, &stack); > NMT_MemoryLogRecorder::log_realloc(mem_tag, new_outer_size, new_outer_ptr, header, &stack); > NMT_MemoryLogRecorder::log_free(old_outer_ptr); > > > and for virtual memory logger, those are: > > > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_release((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_uncommit((address)addr, size); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_reserve_and_commit((address)addr, size, stack, mem_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_commit((address)addr, size, stack); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_split_reserved((address)addr, size, split, mem_tag, split_tag); > NMT_VirtualMemoryLogRecorder::log_virtual_memory_tag((address)addr, mem_tag); > > > That's the entirety of the surface area of the new code. > > The actual implementation extends one existing VM API: > > `bool Arguments::copy_expand_pid(const char* src, size_t srclen, char* buf, size_t buflen, int pid) > ` > > and adds a few APIs to permit_forbidden_function.hpp: > > > inline char *strtok(char *str, const char *sep) { return ::strtok(str, sep); } > inline long strtol(const char *str, char **endptr, int base) { return ::strtol(str, endptr, base); } > > #if defined(LINUX) > inline size_t malloc_usable_size(void *_Nullable ptr) { return ::malloc_usable_size(ptr); } > #elif defined(WINDOWS) > inline size_t _msize(void *memblock) { return ::_msize(memblock); } > #elif defined(__APPLE__) > inline size_t malloc_size(const void *ptr) { return ::malloc_size(ptr); } > #endif > > > Those are needed if we want to calculate the memory overhead. > > To use, you first need to record the pattern of operations, ex: > > `./build/macosx-aarch64-server-release/xcode/build/jdk/bin/...
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: cleanup fprintf types ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/27575a11..f27eeb16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=33-34 Stats: 27 lines in 1 file changed: 0 ins; 0 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From epeter at openjdk.org Thu Feb 13 17:16:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 13 Feb 2025 17:16:22 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> Message-ID: On Thu, 13 Feb 2025 16:43:22 GMT, Roland Westrelin wrote: > The current transformation from the parsed version of min/max to a conditional move to a Max/Min node depends on the conditional move transformation which has its own set of heuristics and while it happens on simple test cases, that's not necessarily the case on all code shapes. I don't think we want to trust it too much. Well, actually, people have tried to improve the conditional move transformation, and it is really, really difficult. It's hard not to get regressions. I'm wondering how much easier it is for min/max. Maybe we have similar limitations, especially in predicting how well branch prediction performs. You are probably right about type propagation and `Min`/`Max` being floating nodes. @rwestrel What do you think about the regressions in the scalar cases of this patch?
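The type-narrowing argument can be illustrated with interval arithmetic. If the inputs of a max are known to lie in [lo1, hi1] and [lo2, hi2], the result lies in [max(lo1, lo2), max(hi1, hi2)], and symmetrically for min. The sketch below is a simplified model of that rule under those assumptions, not C2's actual `MaxLNode`/`TypeLong` code; it shows how clamping a value whose range is already known can collapse to a constant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified stand-in for a compiler's integer range type: [lo, hi], inclusive.
struct LongRange {
  int64_t lo, hi;
  bool is_constant() const { return lo == hi; }
};

// Result range of max(a, b), given the ranges of a and b.
LongRange max_range(LongRange a, LongRange b) {
  return {std::max(a.lo, b.lo), std::max(a.hi, b.hi)};
}

// Result range of min(a, b), given the ranges of a and b.
LongRange min_range(LongRange a, LongRange b) {
  return {std::min(a.lo, b.lo), std::min(a.hi, b.hi)};
}

// Range of a clipping expression min(max(x, lo), hi), as in a clamp loop.
LongRange clip_range(LongRange x, int64_t lo, int64_t hi) {
  return min_range(max_range(x, {lo, lo}), {hi, hi});
}
```

For example, `max_range({0, 5}, {5, 5})` yields the single value 5, and clipping a value known to be in [20, 30] to [0, 10] always yields 10. With a dedicated node the rule applies directly to its two inputs, whereas with an explicit branch or a conditional move the compiler would typically have to recover the same bound by reasoning about the comparison and the merge of two paths.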
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2657253439 From gziemski at openjdk.org Thu Feb 13 17:27:04 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 13 Feb 2025 17:27:04 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v36] In-Reply-To: References: Message-ID: <3eiJPyzCqT_gp5Rsgrm5Daaqk6PcwduHWg3y1Bd13Rc=.5ad29530-2fd5-4160-b496-3db6582ff62d@github.com> > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ?????????? > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... 
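The header figures in the malloc summary above can be cross-checked by hand. This C++ sketch assumes that "malloc overhead" is allocated minus requested bytes, and that both header percentages are taken relative to the requested bytes; those assumptions reproduce the 5.02% and 7.44% shown. The `count%` and `bytes%` columns appear to be relative to the total sample count and to the allocated bytes, respectively. For example, Class is 8,598 / 117,717 ≈ 7.3%, and GC is 20,372,160 / 29,904,416 ≈ 68.1%. All names in the sketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical recomputation of the "malloc summary" header figures.
struct MallocSummary {
  uint64_t requested;    // bytes requested by callers
  uint64_t allocated;    // bytes actually handed out by the allocator
  uint64_t nmt_headers;  // bytes spent on NMT bookkeeping headers
};

// "malloc overhead" = allocated - requested
static uint64_t summary_overhead(const MallocSummary& s) {
  return s.allocated - s.requested;
}

// Share of a total, in percent.
static double percent(uint64_t part, uint64_t whole) {
  return whole == 0 ? 0.0
                    : 100.0 * static_cast<double>(part) / static_cast<double>(whole);
}

// Formats a percentage with the given number of decimals into buf.
static const char* fmt_pct(char* buf, std::size_t len, double pct, int decimals) {
  std::snprintf(buf, len, "%.*f", decimals, pct);
  return buf;
}
```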
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: cleanup fprintf types ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/f27eeb16..86173fe1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=34-35 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Thu Feb 13 17:27:05 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 13 Feb 2025 17:27:05 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v35] In-Reply-To: <7clnaUWaJmNG1crINBV4-ox-PpBelIXn7ln3zkblIEk=.a4dc0d6f-3e1c-487d-b663-7fecb2519093@github.com> References: <7clnaUWaJmNG1crINBV4-ox-PpBelIXn7ln3zkblIEk=.a4dc0d6f-3e1c-487d-b663-7fecb2519093@github.com> Message-ID: On Thu, 13 Feb 2025 17:16:11 GMT, Gerard Ziemski wrote: >> Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. >> >> Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` >> >> Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): >> >> >> malloc summary: >> >> time:8,951,473[ns] [samples:117,717] >> memory requested:28,474,918 bytes, allocated:29,904,416 bytes, >> malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] >> >> NMT type: objects: bytes: time: count%: bytes%: time%: overhead: >> ------------------------------------------------------------------------------------------------------------------------- >> Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? 
>> Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? >> Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? >> Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? >> Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? >> GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? >> GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? >> Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? >> JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? >> Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? >> Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? >> Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ?????????? >> Native Memory Tracking: 367 30,736 17... > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > cleanup fprintf types I have written the design document, which so many requested - please see the original issue: [https://bugs.openjdk.org/browse/JDK-8317453](https://bugs.openjdk.org/browse/JDK-8317453) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23115#issuecomment-2657272836 From gziemski at openjdk.org Thu Feb 13 17:40:33 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 13 Feb 2025 17:40:33 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v37] In-Reply-To: References: Message-ID: <3_ZXHs6-0K7PoprEvf41PNTf9pg5K9OPcxIZipqlEAI=.78fc3ba0-efab-4988-953e-ca04630ed641@github.com> > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. 
> > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ?????????? > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix strncpy warning? 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/86173fe1..f5e491ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=36 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=35-36 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From fmatte at openjdk.org Thu Feb 13 18:08:44 2025 From: fmatte at openjdk.org (Fairoz Matte) Date: Thu, 13 Feb 2025 18:08:44 GMT Subject: RFR: 8347833: CrashOnOutOfMemory should stop GC threads before HeapDumpOnOutOfMemoryError In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 18:07:32 GMT, Fairoz Matte wrote: > When CrashOnOutOfMemory and HeapDumpOnOutOfMemoryError are used together, we should make sure that both are performed at a single safepoint; this prevents other threads from running and throwing OOM errors while the initial error is already being logged. I am still working on fixing the test case. I will get back to this once it is done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23519#issuecomment-2657368070 From fmatte at openjdk.org Thu Feb 13 18:08:44 2025 From: fmatte at openjdk.org (Fairoz Matte) Date: Thu, 13 Feb 2025 18:08:44 GMT Subject: RFR: 8347833: CrashOnOutOfMemory should stop GC threads before HeapDumpOnOutOfMemoryError [v2] In-Reply-To: References: Message-ID: > When CrashOnOutOfMemory and HeapDumpOnOutOfMemoryError are used together, we should make sure that both are performed at a single safepoint; this prevents other threads from running and throwing OOM errors while the initial error is already being logged.
Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: Aditional work on review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23519/files - new: https://git.openjdk.org/jdk/pull/23519/files/ce22439f..30e95aaa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23519&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23519&range=00-01 Stats: 26 lines in 4 files changed: 14 ins; 3 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23519/head:pull/23519 PR: https://git.openjdk.org/jdk/pull/23519 From fmatte at openjdk.org Thu Feb 13 18:08:45 2025 From: fmatte at openjdk.org (Fairoz Matte) Date: Thu, 13 Feb 2025 18:08:45 GMT Subject: RFR: 8347833: CrashOnOutOfMemory should stop GC threads before HeapDumpOnOutOfMemoryError [v2] In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 03:51:12 GMT, David Holmes wrote: >> Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: >> >> Aditional work on review comments > > src/hotspot/share/utilities/vmError.cpp line 1952: > >> 1950: if(dumpHeap) { >> 1951: HeapDumper::dump_heap_from_oome(); >> 1952: } > > To be done at the same safepoint this code needs to be in `VM_ReportJavaOutOfMemory::doit()` - which is why the `dumpHeap` was to be passed to the `VM_ReportJavaOutOfMemory` constructor and stored in a field for `doit`. 
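The review suggestion is to capture the flag when the VM operation is constructed, so that `doit()`, which runs at the safepoint, can act on it without another safepoint being taken in between. The following is a simplified C++ model of that pattern. `ReportOOMOperation` and the string trace are hypothetical stand-ins, not HotSpot's `VM_ReportJavaOutOfMemory`, and the "crash" entry replaces the real abort so that the sequence can be observed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: every out-of-memory reporting step happens inside a
// single doit(), i.e. while the rest of the world stays stopped.
class ReportOOMOperation {
 public:
  ReportOOMOperation(bool dump_heap, bool crash, std::vector<std::string>* trace)
      : dump_heap_(dump_heap), crash_(crash), trace_(trace) {}

  // Stand-in for the body that would run in the VM thread at the safepoint.
  void doit() {
    trace_->push_back("report");
    if (dump_heap_) trace_->push_back("heap_dump");
    if (crash_) trace_->push_back("crash");  // the real code would abort here
  }

 private:
  const bool dump_heap_;  // captured at construction, as the review suggests
  const bool crash_;
  std::vector<std::string>* trace_;
};
```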
I have modified this based on my understanding; I still need to fix the test case to verify that the operation happens at a single safepoint. > test/hotspot/jtreg/runtime/ErrorHandling/TestHeapDumpOnOutOfMemoryAndCrashOnOutOfMemory.java line 43: > >> 41: try { >> 42: Object[] oa = new Object[Integer.MAX_VALUE]; >> 43: for(int i = 0; i < oa.length; i++) { > > Suggestion: > > for (int i = 0; i < oa.length; i++) { done > test/hotspot/jtreg/runtime/ErrorHandling/TestHeapDumpOnOutOfMemoryAndCrashOnOutOfMemory.java line 44: > >> 42: Object[] oa = new Object[Integer.MAX_VALUE]; >> 43: for(int i = 0; i < oa.length; i++) { >> 44: oa[i] = new Object[Integer.MAX_VALUE]; > > This will throw the "VM limit reached" OOME - does that trigger the heapdump etc processing? I copied this code from TestHeapDumpOnOutOfMemoryError.java, and I restricted the heap to -Xmx128M to force an OOM in the heap. > test/hotspot/jtreg/runtime/ErrorHandling/TestHeapDumpOnOutOfMemoryAndCrashOnOutOfMemory.java line 57: > >> 55: OutputAnalyzer output = new OutputAnalyzer(pb.start()); >> 56: int exitValue = output.getExitValue(); >> 57: if(0 != exitValue) { > > Suggestion: > > if (exitValue != 0) { done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23519#discussion_r1954989600 PR Review Comment: https://git.openjdk.org/jdk/pull/23519#discussion_r1954990360 PR Review Comment: https://git.openjdk.org/jdk/pull/23519#discussion_r1954992402 PR Review Comment: https://git.openjdk.org/jdk/pull/23519#discussion_r1954990027 From iklam at openjdk.org Thu Feb 13 18:30:13 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 13 Feb 2025 18:30:13 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v5] In-Reply-To: References: Message-ID: On Thu, 23 Jan 2025 06:15:24 GMT, Thomas Stuefe wrote: >> If we wrongly decode an nKlass of `0`, and the nKlass encoding base is not NULL (typical for most cases that run with CDS enabled), the resulting pointer points to the start of the
Klass encoding range. That area is readable. If CDS is enabled, it will be at the start of the CDS metadata archive. If CDS is off, it is at the start of the class space. >> >> Now, both CDS and class space allocate a safety buffer at the start to prevent Klass structures from being located there. However, that memory is still readable, so we can read garbage data from that area. In the case of CDS, that area is just 16 bytes, after that come real data. Since Klass is large, most accesses will read beyond the 16-byte zone. >> >> We should protect the first page in the narrow Klass encoding range to make analysis of errors like this easier. Especially in release builds where decode_not_null does not assert. We already use a similar technique in the heap, and most OSes protect the zero page for the same reason. >> >> This patch does that. Now, decoding an `0` nKlass and then using the result `Klass` - calling virtual functions or accessing members - crashes right away. >> >> Additionally, the patch provides a helpful output in the register/stack section, e.g: >> >> >> RDI=0x0000000800000000 points into nKlass protection zone >> >> >> >> Testing: >> - GHAs. >> - I tested the patch manually on x64 Linux for both CDS on, CDS off and zero-based encoding, CDS off and non-zero-based encoding. >> - I tested manually on Windows x64 >> - I also prepared an automatic gtest, but that needs some preparatory work on the gtest suite first to work (see https://bugs.openjdk.org/browse/JDK-8348029) >> >> -- Update 2024-01-22 -- >> I added a jtreg test that is more thorough than a gtest (also scans the produced hs-err file) > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > fix whitespace error It might be easier if we introduce a new "core" region called "protection" that's 16MB in size, and allocate that before the rw region in the output buffer. 
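The failure mode described above follows from how a narrow Klass pointer is decoded, which is base plus the nKlass value shifted left. With a non-zero base, decoding `0` yields the base address itself, and that address is exactly the region the patch now protects. The following is a minimal C++ model of that arithmetic. The struct, its field names, and the one-page zone size are assumptions made for illustration; they are not HotSpot's `CompressedKlassPointers` API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of compressed-class-pointer decoding:
//   Klass* = base + (nKlass << shift)
struct KlassEncoding {
  uint64_t base;
  unsigned shift;
  uint64_t protection_zone_size;  // assumed: one OS page at the range start

  uint64_t decode(uint32_t nk) const {
    return base + (static_cast<uint64_t>(nk) << shift);
  }

  // Roughly what an hs_err register annotation such as
  // "points into nKlass protection zone" would have to check.
  bool in_protection_zone(uint64_t addr) const {
    return addr >= base && addr < base + protection_zone_size;
  }
};
```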
We never map this region so it doesn't need to be stored in the archive file. Let me try this out and see if it works. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23190#issuecomment-2657414745 From iklam at openjdk.org Thu Feb 13 18:31:42 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 13 Feb 2025 18:31:42 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v4] In-Reply-To: References: Message-ID: > Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. > > With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. > > To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): > >> the format of the configuration and cache files is not specified and is subject to change without notice. 
> > **Notes for reviewers:** > > - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. > - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. > - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks for the presence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. > - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. > - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc.) when JEP 483 options are specified. > > **Misc Note** > - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccba6dcce4a3) will be integrated separ...
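Using a separate magic number lets a shared header layout still distinguish the two kinds of file. The following is a minimal C++ sketch of that check. The magic values are placeholders rather than the JDK's real constants, and `FileHeader`, `classify`, and `can_map_as_cache` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class FileKind { StaticArchive, AotConfig, Unknown };

// Placeholder values for illustration only; not the JDK's actual constants.
constexpr uint32_t kArchiveMagic   = 0xA0A0A0A0u;
constexpr uint32_t kAotConfigMagic = 0xB0B0B0B0u;

// Both formats share the same header layout; only the magic differs.
struct FileHeader {
  uint32_t magic;
  // ... remaining fields shared by both formats ...
};

static FileKind classify(const FileHeader& h) {
  switch (h.magic) {
    case kArchiveMagic:   return FileKind::StaticArchive;
    case kAotConfigMagic: return FileKind::AotConfig;
    default:              return FileKind::Unknown;
  }
}

// Mapping an AOT cache must reject a configuration file, even though the
// rest of its header would otherwise parse.
static bool can_map_as_cache(const FileHeader& h) {
  return classify(h) == FileKind::StaticArchive;
}
```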
Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself - Improve error message when AOTMode=create has an incompatible classpath ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23484/files - new: https://git.openjdk.org/jdk/pull/23484/files/74f5e29d..9f78bb90 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23484&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23484&range=02-03 Stats: 112 lines in 5 files changed: 94 ins; 15 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23484/head:pull/23484 PR: https://git.openjdk.org/jdk/pull/23484 From vlivanov at openjdk.org Thu Feb 13 19:01:21 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 13 Feb 2025 19:01:21 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Wed, 12 Feb 2025 21:09:31 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. 
I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > fix bounds checks src/hotspot/share/runtime/deoptimization.cpp line 645: > 643: methodHandle method(current, deopt_sender.interpreter_frame_method()); > 644: Bytecode_invoke cur = Bytecode_invoke_check(method, deopt_sender.interpreter_frame_bci()); > 645: if (cur.is_invokedynamic() || cur.is_invokehandle()) { Can you elaborate, please, why invokedynamic case is not needed anymore? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1955062378 From gziemski at openjdk.org Thu Feb 13 20:11:10 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 13 Feb 2025 20:11:10 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v38] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? 
> Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ?????????? > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: exit as soon as finished replay to avoid issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/f5e491ef..8d749f09 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=37 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=36-37 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From kvn at openjdk.org Thu Feb 13 20:22:20 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 13 Feb 2025 20:22:20 GMT Subject: RFR: 8349652: Rewire nmethod oop load barriers In-Reply-To: References: Message-ID: <3OkEV1h99JOwwTGAjzYwcklT1p-t8hAu_j_SBwFdPmU=.357dbc24-61ce-4c27-8578-f3dc3740ef6a@github.com> On Fri, 7 Feb 2025 09:57:15 GMT, Stefan Karlsson wrote: > When loading oops from nmethods, we currently use the Access API to inject load barriers for the GCs that require them. As part of the ZGC load barrier, we need access to the nmethod to properly perform the load barrier.
The current implementation of the Access API doesn't support passing the nmethod down through all its layers of code, so ZGC asks the code cache which nmethod the various oops belong to. There's currently an open PR for JDK-8343789 (#21276), which moves the oops out of the code cache, so the current ZGC implementation will not work after that has been integrated. > > The proposal is to find a way to explicitly pass the nmethod down to the load barriers. > > We could extend the Access API to pass the nmethod down through all its various layers. The drawback is that it adds a lot of boilerplate code and requires new overloads and/or names. Given that this isn't performance-critical code, I propose that we take the much simpler route and call straight into the BarrierSetNMethod class. > > Given that NMethodAccess and IN_NMETHOD were only introduced to support nmethod oop loads for ZGC and are not used anymore, I've also removed them from the code. > > Tested with the reproducer for the ZGC issue in JDK-8343789 and tier1-7 on Linux with ZGC tasks; tier1-3 is currently running. Looks reasonable to me. ------------- Marked as reviewed by kvn (Reviewer).
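The proposal is to thread the owning nmethod through as an explicit argument, instead of having the barrier look it up in the code cache by address. The following C++ sketch shows the shape of that change. Every type and function in it (`nmethod_t`, `oop_t`, `NMethodBarrier`, `oop_at`) is a hypothetical stand-in; this is not the real `BarrierSetNMethod` interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for HotSpot types; not the real declarations.
struct nmethod_t { int id; };
using oop_t = uintptr_t;

// Barrier hook that receives the owning method explicitly, so it never has
// to ask "which method contains this oop slot?" (a lookup that breaks once
// the oops no longer live inside the code cache).
struct NMethodBarrier {
  const nmethod_t* last_nm = nullptr;  // records what the barrier was given

  oop_t load_oop(const nmethod_t* nm, const oop_t* slot) {
    last_nm = nm;  // a real GC barrier would use nm to process the oop here
    return *slot;
  }
};

// Caller-side accessor: the nmethod is passed down as a plain parameter.
static oop_t oop_at(NMethodBarrier& bs, const nmethod_t* nm,
                    const oop_t* oops, int index) {
  return bs.load_oop(nm, &oops[index]);
}
```

Calling one barrier class directly, rather than adding nmethod-aware overloads to every layer of a generic access API, matches the trade-off argued above.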
PR Review: https://git.openjdk.org/jdk/pull/23512#pullrequestreview-2616131709 From dlong at openjdk.org Thu Feb 13 21:23:15 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 13 Feb 2025 21:23:15 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: <7xJgm0ScXMp4iRaH7Sf5QfsrTv2jOV4078kPqn3aoCs=.63303086-b4bd-47c5-9bd5-e69e28f75f4c@github.com> On Thu, 13 Feb 2025 18:57:24 GMT, Vladimir Ivanov wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> fix bounds checks > > src/hotspot/share/runtime/deoptimization.cpp line 645: > >> 643: methodHandle method(current, deopt_sender.interpreter_frame_method()); >> 644: Bytecode_invoke cur = Bytecode_invoke_check(method, deopt_sender.interpreter_frame_bci()); >> 645: if (cur.is_invokedynamic() || cur.is_invokehandle()) { > > Can you elaborate, please, why invokedynamic case is not needed anymore? As far as I can tell, it was never needed. If an invokedynamic or invokehandle adds an appendix, then it will show up in the callee, and will be reflected in the caller args size, so there is no mismatch. As far as the JVM is concerned, an invokedynamic/invokehandle looks like a call to a JVM-generated adapter. The only way for invokedynamic/invokehandle to cause an argument mismatch is if the JVM resolved the call-site to an adapter that was actually a MethodHandle linker. That is the exception I describe in the comment below. If we ever allowed the JVM to do that, then several other checks would also need to be fixed. For the record, this code used to call cur.is_method_handle_invoke(), which was also wrong, but at least it had a name closer to what we would want. 
Ideally, something like is_method_handle_linker_invoke() that checks for linkToVirtual, linkToStatic, linkToSpecial, and linkToInterface would have been better. The old comment about "arbitrary chains of calls" seems to be left over from an early JSR292 feature known as Ricochet Frames. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1955228726 From dlong at openjdk.org Thu Feb 13 21:26:12 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 13 Feb 2025 21:26:12 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: <7xJgm0ScXMp4iRaH7Sf5QfsrTv2jOV4078kPqn3aoCs=.63303086-b4bd-47c5-9bd5-e69e28f75f4c@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> <7xJgm0ScXMp4iRaH7Sf5QfsrTv2jOV4078kPqn3aoCs=.63303086-b4bd-47c5-9bd5-e69e28f75f4c@github.com> Message-ID: On Thu, 13 Feb 2025 21:20:33 GMT, Dean Long wrote: >> src/hotspot/share/runtime/deoptimization.cpp line 645: >> >>> 643: methodHandle method(current, deopt_sender.interpreter_frame_method()); >>> 644: Bytecode_invoke cur = Bytecode_invoke_check(method, deopt_sender.interpreter_frame_bci()); >>> 645: if (cur.is_invokedynamic() || cur.is_invokehandle()) { >> >> Can you elaborate, please, why invokedynamic case is not needed anymore? > > As far as I can tell, it was never needed. If an invokedynamic or invokehandle adds an appendix, then it will show up in the callee, and will be reflected in the caller args size, so there is no mismatch. As far as the JVM is concerned, an invokedynamic/invokehandle looks like a call to a JVM-generated adapter. The only way for invokedynamic/invokehandle to cause an argument mismatch is if the JVM resolved the call-site to an adapter that was actually a MethodHandle linker. That is the exception I describe in the comment below. If we ever allowed the JVM to do that, then several other checks would also need to be fixed. 
> For the record, this code used to call cur.is_method_handle_invoke(), which was also wrong, but at least it had a name closer to what we would want. Ideally, something like is_method_handle_linker_invoke() that checks for linkToVirtual, linkToStatic, linkToSpecial, and linkToInterface would have been better. > The old comment about "arbitrary chains of calls" seems to be left over from an early JSR292 feature known as Ricochet Frames. For the curious, it is still possible to create an arbitrarily long chain of linkTo calls, but only trusted code would be able to do that, so I'm not addressing this issue in this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1955232327 From dlong at openjdk.org Thu Feb 13 21:35:12 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 13 Feb 2025 21:35:12 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: <_jQq0sGaZijMT6Cr3rUdrQLlvVWNuJ8uILHg1qkCxoM=.271ce257-9e72-4d61-87fa-588ce4dbe107@github.com> On Thu, 13 Feb 2025 06:49:50 GMT, Richard Reingruber wrote: > The 2nd assert does not fail without the deoptimization.cpp fix. Might be due to alignment of caller->sp() in the interpreter. AArch64 also does alignment, and that's why the test uses two different methods, one with an extra local, to hopefully handle both cases of even/odd 2-word (16-byte) alignment. But ppc might be different enough that this isn't enough to trigger the bug. Or maybe the end-of-frame bound is slightly off?
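The reasoning about which call sites can produce a caller/callee argument mismatch can be summarized as a small predicate. Only the MethodHandle linkers drop their trailing MemberName argument, and a MemberName is a reference, so it occupies one slot. An appendix added by invokedynamic or invokehandle is counted on both sides, so those call sites cause no mismatch. The following is a C++ sketch along the lines of the suggested `is_method_handle_linker_invoke()`, covering only the four linkers named above. The function names are hypothetical, and matching by string is a simplification; real code would check method intrinsic IDs.

```cpp
#include <cassert>
#include <string>

// Hypothetical predicate: is the callee one of the MethodHandle linkers
// that drop their trailing MemberName argument?
static bool is_mh_linker(const std::string& name) {
  return name == "linkToVirtual" || name == "linkToStatic" ||
         name == "linkToSpecial" || name == "linkToInterface";
}

// Slots pushed by the caller minus parameter slots seen by the callee.
// The dropped MemberName is a reference, i.e. one slot; everything else,
// including an invokedynamic/invokehandle appendix, matches on both sides.
static int param_size_mismatch(const std::string& callee_name) {
  return is_mh_linker(callee_name) ? 1 : 0;
}
```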
------------- PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2657755434 From gziemski at openjdk.org Thu Feb 13 21:39:35 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 13 Feb 2025 21:39:35 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v39] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ?????????? > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: add missing heder, show NMT level ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/8d749f09..39886985 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=38 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=37-38 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From dholmes at openjdk.org Fri Feb 14 05:37:11 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 14 Feb 2025 05:37:11 GMT Subject: RFR: 8347833: CrashOnOutOfMemory should stop GC threads before HeapDumpOnOutOfMemoryError [v2] In-Reply-To: References: Message-ID: <2pMqSLSLN3ZvhqOFwqVQUhscx61ZN-Xm1xc0fnDjWZk=.823b5a53-80d6-4b6e-8ca1-3fb5a20dced7@github.com> On Thu, 13 Feb 2025 18:08:44 GMT, Fairoz Matte wrote: >> When CrashOnOutOfMemory and HeapDumpOnOutOfMemoryError invoked together, we should make sure, it is performed in a single safepoint, this will avoid allowing other threads to run and throw OOM errors after the initial one is already under error logging. > > Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: > > Aditional work on review comments src/hotspot/share/utilities/debug.cpp line 272: > 270: VMError::report_java_out_of_memory(message, HeapDumpOnOutOfMemoryError, CrashOnOutOfMemoryError); > 271: > 272: if (CrashOnOutOfMemoryError) { The `if (CrashOnOutOfMemoryError)` is unreachable code as `report_java_out_of_memory` already aborted. 
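In C++, the point about unreachable code can be made explicit with `[[noreturn]]`. A function declared this way documents, and lets the compiler assume, that control never comes back from the call, so a later check of the same flag on that path is dead code. The following is a minimal sketch with hypothetical names, not the actual `VMError` or `debug.cpp` code.

```cpp
#include <cassert>
#include <cstdlib>

// Fatal path: never returns to its caller.
[[noreturn]] static void fatal_oom_crash() {
  std::abort();
}

// Returns true when execution continues after reporting. On the crash
// branch, anything after fatal_oom_crash() (such as a second
// 'if (crash_on_oom)' check) can never run.
static bool report_oom(bool crash_on_oom) {
  if (crash_on_oom) {
    fatal_oom_crash();
  }
  return true;
}
```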
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23519#discussion_r1955582527 From dholmes at openjdk.org Fri Feb 14 05:40:10 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 14 Feb 2025 05:40:10 GMT Subject: RFR: 8347833: CrashOnOutOfMemory should stop GC threads before HeapDumpOnOutOfMemoryError [v2] In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 18:05:21 GMT, Fairoz Matte wrote: >> test/hotspot/jtreg/runtime/ErrorHandling/TestHeapDumpOnOutOfMemoryAndCrashOnOutOfMemory.java line 44: >> >>> 42: Object[] oa = new Object[Integer.MAX_VALUE]; >>> 43: for(int i = 0; i < oa.length; i++) { >>> 44: oa[i] = new Object[Integer.MAX_VALUE]; >> >> This will throw the "VM limit reached" OOME - does that trigger the heap dump (etc.) processing? > > I have copied this code from TestHeapDumpOnOutOfMemoryError.java, and I have restricted the heap to -Xmx128M to force an OOM in the heap. The heap size will make no difference, as you are trying to create an array that is larger than the VM allows. Line 42 will throw OOME and the for loop is never reached.

    | Welcome to JShell -- Version 23
    | For an introduction type: /help intro

    jshell> Object[] oa = new Object[Integer.MAX_VALUE];
    | Exception java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    |       at (#1:1)

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23519#discussion_r1955584612 From kbarrett at openjdk.org Fri Feb 14 06:03:23 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 14 Feb 2025 06:03:23 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v7] In-Reply-To: References: Message-ID: <4LdnqhqXJbYYYpvT4aRsrIBN7IW174ZAXAnfcQ6IqEg=.92f84123-08d5-4065-9e4d-51dc64da93c8@github.com> On Thu, 13 Feb 2025 09:34:54 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL.
>> It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources. >> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. >> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: > > - rename excludeExtensions -> excludedExtensions > - remove redundant import/throws Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23466#pullrequestreview-2616833857 From stuefe at openjdk.org Fri Feb 14 06:42:09 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 14 Feb 2025 06:42:09 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: References: Message-ID: <6IATWzJgb5zFVmIcXhH3XFoVOeU1RxinjTPIvhm4vL0=.f2d9e94d-e0f9-40ad-b843-25defa3c3b09@github.com> On Sat, 8 Feb 2025 06:56:40 GMT, Thomas Stuefe wrote: > Greetings, > > This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. 
> > Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. > > I wanted to track that information correctly and display it clearly in a way that is easy to understand. > > The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). > > The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (e.g., the inside of a loop): drop a TracePhase in there with a descriptive name, and you can see the allocations inside that phase. > > The statistic gives us two new forms of output: > > 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase:
>
>     Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
>     Phase                 Total       ra     node     comp  type index reglive regsplit    cienv    other
>     none                1205512   155104   982984    33712     0     0       0        0        0    33712
>     parse              11685376   720016  6578728  1899064     0     0       0        0  1832888   654680
>     optimizer            916584        0   556416        0     0     0       0        0        0   360168
>     escapeAnalysis      1983400        0  1276392   707008     0     0       0        0        0        0
>     connectionGraph      720016        0        0   621832     0     0       0        0    98184        0
>     macroEliminate       196448        0   196448        0     0     0       0        0        0        0
>     iterGVN              327440        0   196368   131072     0     0       0        0        0        0
>     incrementalInline   3992816        0  3043704   621832     0     0       0        0   261824...

Some additional technical information about how this statistic works:

The JVM informs the statistic about the following events:
A) When a compilation starts.
B) When a compilation ends.
C) When a new compilation phase starts. This can happen in nested form.
D) When a compilation phase ends.
E) Whenever an arena grows a new chunk (regardless of whether this was a cached chunk from the chunk pool or a newly allocated chunk).
F) When an arena sheds chunks, either by rolling back to a previous ResourceMark or because the arena itself gets deleted.

During compilation (between (A) and (B)), we keep the statistic state for this compilation in an `ArenaStatCounter` object that is attached to the current compiler thread.

When a new compilation phase starts (C), we push the phase info onto a `PhaseInfoStack`. When a phase ends, we pop that information.

When we are informed of a new chunk allocation (E), we:
- Set a stamp in the chunk header to mark it as being owned by this phase and this arena type.
- In the `ArenaStatCounter` object, adjust global counters and counters in a two-dimensional table (`ArenaCounterTable`) that keeps counters per arena tag and compilation phase.
- If total memory consumption for this compilation reaches a new peak, take a snapshot of all counters as the peak state.
- Also handle `MemLimit` violations here: if `-XX:CompileCommand=memlimit...` was enabled and the total footprint of the compilation surpasses that limit, we either end the JVM with a fatal error or bail out of the compilation, depending on the sub-option given to the command.

When informed of a chunk deletion (F), we:
- Extract the stamp from the chunk header to know which phase/arena type this deallocation is accounted to.
- Adjust the counters for that phase/arena type in the `ArenaCounterTable`.

When a compilation phase ends (D), we adjust the "footprint timeline". The footprint timeline (`FootprintTimeline`) is a one-dimensional buffer of (phase info, counter) tuples. It represents the "flattened out" form of the phase invocation tree: an invocation of a child phase nested in a parent phase "interrupts" the parent phase, and when the child phase ends, the parent phase is "restarted" as a new entry in the timeline.
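The bookkeeping described above fits in a few dozen lines when modelled in Java. This is a simplified, hypothetical sketch, not the HotSpot C++ code. The chunk "stamp" is a small record, the `ArenaCounterTable` is modelled as a list of per-phase `long[]` arrays indexed by arena tag, the peak snapshot is a deep copy, and the footprint timeline closes a segment whenever a child phase starts or any phase ends. MemLimit handling is left out, and all names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of the per-compilation arena bookkeeping; not HotSpot code.
public final class ArenaStatSketch {
    // The "stamp": the phase and arena tag a chunk is accounted to.
    record Chunk(int phase, int tag, long size) {}

    // One footprint-timeline entry: phase name and total footprint when the segment ended.
    record TimelineEntry(String phase, long footprint) {}

    private final int numTags;
    private final List<String> phaseNames = new ArrayList<>();
    private final List<long[]> counters = new ArrayList<>(); // counters.get(phase)[tag]
    private final Deque<Integer> phaseStack = new ArrayDeque<>();
    private final List<TimelineEntry> timeline = new ArrayList<>();
    private long total;
    private long peak;
    private List<long[]> peakSnapshot = List.of();

    ArenaStatSketch(int numTags) {
        this.numTags = numTags;
        phaseStack.push(phaseId("none")); // allocations outside any named phase
    }

    private int phaseId(String name) {
        int id = phaseNames.indexOf(name);
        if (id < 0) {
            phaseNames.add(name);
            counters.add(new long[numTags]);
            id = phaseNames.size() - 1;
        }
        return id;
    }

    // Ends the current timeline segment, owned by the phase on top of the stack.
    private void closeSegment() {
        timeline.add(new TimelineEntry(phaseNames.get(phaseStack.peek()), total));
    }

    void startPhase(String name) { // event (C): the parent phase is "interrupted"
        closeSegment();
        phaseStack.push(phaseId(name));
    }

    void endPhase() {              // event (D): the parent phase "restarts" afterwards
        closeSegment();
        phaseStack.pop();
    }

    Chunk allocateChunk(int tag, long size) { // event (E): stamp and count
        Chunk c = new Chunk(phaseStack.peek(), tag, size);
        counters.get(c.phase())[tag] += size;
        total += size;
        if (total > peak) { // new peak: snapshot every counter
            peak = total;
            peakSnapshot = counters.stream().map(long[]::clone).toList();
        }
        return c;
    }

    void freeChunk(Chunk c) {      // event (F): charge back to the stamped phase and tag
        counters.get(c.phase())[c.tag()] -= c.size();
        total -= c.size();
    }

    List<TimelineEntry> timeline() { return timeline; }
    List<long[]> peakSnapshot() { return peakSnapshot; }
    long total() { return total; }
    long peak() { return peak; }
}
```

Replaying the optimizer/iterGVN/incrementalInline scenario in KB (1024 in optimizer, 8 in iterGVN, then 1024 of resource area back in optimizer) produces the entries none 0, optimizer 1024, iterGVN 1032, optimizer 2056, incrementalInline 2056, optimizer 2056, which has the same shape as the flattened timeline in the example.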
For example, let's say we execute the phase "optimizer", and inside that, call the phases "iterGVN" and then "incrementalInline". Between these two phases, we allocate from the resource area. The invocation tree looks like this:

    > optimizer                  1024 KB
      > iterGVN                  1032 KB
    < optimizer (cont.)          1032 KB + 1MB resource arena
      > incrementalInline        1032 KB + 1MB resource arena
    < optimizer (cont.)          1032 KB + 1MB resource arena

The flattened-out footprint timeline will look somewhat like this:

    Phase Sequence Number | Phase Name        | Footprint
    5                     | optimizer         | 1024 KB
    6                     | iterGVN           | 1032 KB
    5                     | optimizer         | 1032 KB + 1MB
    7                     | incrementalInline | 1032 KB + 1MB
    5                     | optimizer         | 1032 KB + 1MB

Finally, when the compilation ends, we print out the statistic for it (if the suboption `print` was given with `-XX:CompileCommand=memstat`). We also save a copy of the counters to a global table that contains the N most expensive compilations. That table will be printed when one uses `jcmd Compiler.memory`. We also print it into the hs-err file. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2658400920 From rcastanedalo at openjdk.org Fri Feb 14 08:58:09 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 14 Feb 2025 08:58:09 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: References: Message-ID: On Sat, 8 Feb 2025 06:56:40 GMT, Thomas Stuefe wrote: > Greetings, > > This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. > > Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase.
Finally, it can have high temporary peaks that vanish before the phase ends. > > I wanted to track that information correctly and display it clearly in a way that is easy to understand. > > The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). > > The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase. > > The statistic gives us two new forms of output: > > 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: > > > Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: > Phase Total ra node comp type index reglive regsplit cienv other > none 1205512 155104 982984 33712 0 0 0 0 0 33712 > parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 > optimizer 916584 0 556416 0 0 0 0 0 0 360168 > escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 > connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 > macroEliminate 196448 0 196448 0 0 0 0 0 0 0 > iterGVN 327440 0 196368 131072 0 0 0 0 0 0 > incrementalInline 3992816 0 3043704 621832 0 0 0 0 261824... Hi Thomas, this looks very useful, thanks! I will run some Oracle-internal functional and performance testing and come back with the results next week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2658651169 From rcastanedalo at openjdk.org Fri Feb 14 09:13:16 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 14 Feb 2025 09:13:16 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: References: Message-ID: On Sat, 8 Feb 2025 06:56:40 GMT, Thomas Stuefe wrote: > Greetings, > > This is a rewrite of the Compiler Memory Statistic.
The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. > > Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. > > I wanted to track that information correctly and display it clearly in a way that is easy to understand. > > The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). > > The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase. > > The statistic gives us two new forms of output: > > 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: > > > Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: > Phase Total ra node comp type index reglive regsplit cienv other > none 1205512 155104 982984 33712 0 0 0 0 0 33712 > parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 > optimizer 916584 0 556416 0 0 0 0 0 0 360168 > escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 > connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 > macroEliminate 196448 0 196448 0 0 0 0 0 0 0 > iterGVN 327440 0 196368 131072 0 0 0 0 0 0 > incrementalInline 3992816 0 3043704 621832 0 0 0 0 261824... src/hotspot/share/runtime/handles.hpp line 2: > 1: /* > 2: * Copyright (c) 1997, 2024, Oracle and/or its affiliates. All rights reserved. Nit: unnecessary copyright header change. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1955794509 From stuefe at openjdk.org Fri Feb 14 09:34:18 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 14 Feb 2025 09:34:18 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v2] In-Reply-To: References: Message-ID: > Greetings, > > This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. > > Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. > > I wanted to track that information correctly and display it clearly in a way that is easy to understand. > > The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). > > The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase. 
> > The statistic gives us two new forms of output: > > 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: > > > Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: > Phase Total ra node comp type index reglive regsplit cienv other > none 1205512 155104 982984 33712 0 0 0 0 0 33712 > parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 > optimizer 916584 0 556416 0 0 0 0 0 0 360168 > escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 > connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 > macroEliminate 196448 0 196448 0 0 0 0 0 0 0 > iterGVN 327440 0 196368 131072 0 0 0 0 0 0 > incrementalInline 3992816 0 3043704 621832 0 0 0 0 261824... Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: revert unnecessary copyright change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23530/files - new: https://git.openjdk.org/jdk/pull/23530/files/89fce6a6..4f426160 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23530/head:pull/23530 PR: https://git.openjdk.org/jdk/pull/23530 From aph at openjdk.org Fri Feb 14 10:31:14 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 14 Feb 2025 10:31:14 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v7] In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 12:59:50 GMT, Amit Kumar wrote: >> s390x implementation for Class.isInstance intrinsic. >> >> Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. >> >> Benchmark results will be updated soon. 
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > space for 3 registers src/hotspot/cpu/s390/c1_Runtime1_s390.cpp line 643: > 641: __ z_stg(temp3 /*Z_R11*/, 1*BytesPerWord + frame::z_abi_160_size, Z_SP); > 642: assert(2*BytesPerWord + frame::z_abi_160_size == frame_size, "check"); > 643: I think you may be able temporarily to save R10 and R11 in the floating-point registers. You have plenty of call-clobbered FP registers, I think. This might work better than creating a stack frame. I guess it's possible to copy from an integer register to a floating-point register without altering any of the bits. There might be no performance advantage, but I think that it's worth a try. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1955915756 From nbenalla at openjdk.org Fri Feb 14 12:27:25 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Fri, 14 Feb 2025 12:27:25 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v7] In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 09:34:54 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources. >> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. 
>> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: > > - rename excludeExtensions -> excludedExtensions > - remove redundant import/throws Thanks for the reviews and discussion, here goes! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23466#issuecomment-2659204484 From nbenalla at openjdk.org Fri Feb 14 12:27:26 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Fri, 14 Feb 2025 12:27:26 GMT Subject: Integrated: 8343802: Prevent NULL usage backsliding In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 15:39:32 GMT, Nizar Benalla wrote: > Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. > It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources. > > Before adding line 86 and excluding `os_windows.cpp`, the test failed with: > > > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: > HMODULE hModule = NULL; > Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: > GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); > java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above. 
> at TestNoNULL.main(TestNoNULL.java:73) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) > at java.base/java.lang.reflect.Method.invoke(Method.java:565) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1447) This pull request has now been integrated. Changeset: fa1bd234 Author: Nizar Benalla URL: https://git.openjdk.org/jdk/commit/fa1bd2344e60163bf247c668b94f98c50c72855a Stats: 145 lines in 2 files changed: 144 ins; 0 del; 1 mod 8343802: Prevent NULL usage backsliding Reviewed-by: kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/23466 From jsjolen at openjdk.org Fri Feb 14 12:57:18 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 14 Feb 2025 12:57:18 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v7] In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 09:34:54 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources. >> >> Before adding line 86 and excluding `os_windows.cpp`, the test failed with: >> >> >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436: >> HMODULE hModule = NULL; >> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437: >> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule); >> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above.
>> at TestNoNULL.main(TestNoNULL.java:73) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) >> at java.base/java.lang.reflect.Method.invoke(Method.java:565) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1447) > > Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision: > > - rename excludeExtensions -> excludedExtensions > - remove redundant import/throws Hi Nizar, Hotspot requires 2 approvals before integration, unless the PR is considered trivial by the reviewer. I've looked through the code and it looks good to me, so consider this a post-integration approval of the PR. All the best, Johan ------------- PR Review: https://git.openjdk.org/jdk/pull/23466#pullrequestreview-2617724468 From fyang at openjdk.org Fri Feb 14 13:29:48 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Feb 2025 13:29:48 GMT Subject: RFR: 8350093: RISC-V: java/math/BigInteger/LargeValueExceptions.java timeout with COH Message-ID: <20dDUIvvN45lILuiZ1hdaOXnRDTNl1V2nKI4X1S1lPE=.f615a14e-0518-4177-ac47-d9b8fd222d2b@github.com> Hi, please review this change resolving a timeout issue in `LargeValueExceptions.squareDefiniteOverflow()`. This issue only happens on platforms with slow unaligned memory accesses, such as the HiFive Unmatched or Premier P550 SBCs. Async profiler shows that most of the time was spent in the multiplyToLen stub. When AvoidUnalignedAccesses is enabled, there is a simple alignment check, which assumes that base_offset of int arrays is 8-byte aligned. This is not the case with COH (compact object headers): base_offset is 12 bytes instead of 16 bytes for int arrays. The patch simply makes the requirement on base_offset explicit. Sanity tested on Premier P550.
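The broken assumption comes down to simple address arithmetic. This sketch is illustrative only: the names are hypothetical and it is not the stub's code. It assumes that the array object itself starts on an 8-byte boundary (HotSpot's default object alignment) and uses the two `int[]` base offsets quoted above, 16 bytes without COH and 12 bytes with COH, as fixed inputs rather than querying a running JVM.

```java
// Illustrative arithmetic only: which int[] elements start on an 8-byte boundary?
public final class IntArrayAlignment {
    static final int BYTES_PER_INT = 4;

    // Byte offset of element i from the start of the array object.
    static long elementOffset(long baseOffset, int index) {
        return baseOffset + (long) index * BYTES_PER_INT;
    }

    // True if a 64-bit access starting at element i is 8-byte aligned,
    // assuming the array object itself starts on an 8-byte boundary.
    static boolean isLongAligned(long baseOffset, int index) {
        return elementOffset(baseOffset, index) % 8 == 0;
    }

    // Smallest index from which 8-byte accesses are aligned;
    // returns -1 if the base offset is not a multiple of 4.
    static int firstLongAlignedIndex(long baseOffset) {
        for (int i = 0; i < 2; i++) {
            if (isLongAligned(baseOffset, i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (long base : new long[] {16, 12}) {
            System.out.println("base_offset=" + base
                    + ": element 0 long-aligned? " + isLongAligned(base, 0)
                    + ", first aligned index=" + firstLongAlignedIndex(base));
        }
    }
}
```

With a 16-byte base offset, element 0 is already 8-byte aligned. With 12 bytes it sits 4 bytes past an 8-byte boundary, so any alignment logic that treats the start of the int data as 8-byte aligned will end up issuing misaligned 64-bit accesses.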
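No obvious change witnessed on JMH after this change:

-----------------------------------------------------------------------------------------------

Without COH:

    Benchmark                               (maxNumbits)  Mode  Cnt        Score       Error  Units
    BigIntegers.SmallShifts.testLeftShift             32  avgt   15      138.939 ±     2.246  ns/op
    BigIntegers.SmallShifts.testLeftShift            128  avgt   15       88.391 ±     1.210  ns/op
    BigIntegers.SmallShifts.testLeftShift            256  avgt   15      117.590 ±     1.398  ns/op
    BigIntegers.SmallShifts.testRightShift            32  avgt   15      150.338 ±     1.961  ns/op
    BigIntegers.SmallShifts.testRightShift           128  avgt   15      104.540 ±     5.636  ns/op
    BigIntegers.SmallShifts.testRightShift           256  avgt   15      126.082 ±     1.756  ns/op
    BigIntegers.testAdd                              N/A  avgt   15       97.513 ±    40.746  ns/op
    BigIntegers.testGcd                              N/A  avgt   15  5409222.706 ±  5934.667  ns/op
    BigIntegers.testHugeLargeDivide                  N/A  avgt   15      246.904 ±     1.552  ns/op
    BigIntegers.testHugeSmallDivide                  N/A  avgt   15      248.997 ±     1.374  ns/op
    BigIntegers.testHugeToString                     N/A  avgt   15     2421.432 ±    62.208  ns/op
    BigIntegers.testLargeSmallDivide                 N/A  avgt   15      216.859 ±     1.760  ns/op
    BigIntegers.testLargeToString                    N/A  avgt   15      425.653 ±    13.305  ns/op
    BigIntegers.testLeftShift                        N/A  avgt   15     2265.137 ±    24.319  ns/op
    BigIntegers.testMultiply                         N/A  avgt   15    15862.412 ±   417.880  ns/op  <========
    BigIntegers.testRightShift                       N/A  avgt   15      936.071 ±    15.247  ns/op
    BigIntegers.testSmallToString                    N/A  avgt   15      322.350 ±    16.075  ns/op

-----------------------------------------------------------------------------------------------

With COH:

    Benchmark                               (maxNumbits)  Mode  Cnt        Score       Error  Units
    BigIntegers.SmallShifts.testLeftShift             32  avgt   15      117.991 ±     1.306  ns/op
    BigIntegers.SmallShifts.testLeftShift            128  avgt   15      150.202 ±     0.922  ns/op
    BigIntegers.SmallShifts.testLeftShift            256  avgt   15      105.895 ±     0.779  ns/op
    BigIntegers.SmallShifts.testRightShift            32  avgt   15      127.582 ±     1.765  ns/op
    BigIntegers.SmallShifts.testRightShift           128  avgt   15      171.976 ±     0.611  ns/op
    BigIntegers.SmallShifts.testRightShift           256  avgt   15      118.938 ±     2.882  ns/op
    BigIntegers.testAdd                              N/A  avgt   15       73.390 ±     1.368  ns/op
    BigIntegers.testGcd                              N/A  avgt   15  5409885.951 ± 11493.243  ns/op
    BigIntegers.testHugeLargeDivide                  N/A  avgt   15      135.854 ±    38.272  ns/op
    BigIntegers.testHugeSmallDivide                  N/A  avgt   15      130.308 ±    24.959  ns/op
    BigIntegers.testHugeToString                     N/A  avgt   15     2525.327 ±     8.116  ns/op
    BigIntegers.testLargeSmallDivide                 N/A  avgt   15      158.357 ±     0.676  ns/op
    BigIntegers.testLargeToString                    N/A  avgt   15      333.591 ±   102.890  ns/op
    BigIntegers.testLeftShift                        N/A  avgt   15     2283.509 ±    22.843  ns/op
    BigIntegers.testMultiply                         N/A  avgt   15    15635.504 ±   414.156  ns/op  <========
    BigIntegers.testRightShift                       N/A  avgt   15      927.502 ±    26.892  ns/op
    BigIntegers.testSmallToString                    N/A  avgt   15      313.170 ±     6.514  ns/op

Finished running test 'micro:java.math.BigIntegers' ------------- Commit messages: - 8350093: RISC-V: java/math/BigInteger/LargeValueExceptions.java timeout with COH Changes: https://git.openjdk.org/jdk/pull/23631/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23631&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350093 Stats: 17 lines in 1 file changed: 11 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23631.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23631/head:pull/23631 PR: https://git.openjdk.org/jdk/pull/23631 From stefank at openjdk.org Fri Feb 14 13:54:12 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 14 Feb 2025 13:54:12 GMT Subject: RFR: 8349652: Rewire nmethod oop load barriers In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 09:57:15 GMT, Stefan Karlsson wrote: > When loading oops from nmethods we currently use the Access API to inject load barriers for the GCs that require them. As part of the ZGC load barrier we need access to the nmethod to properly perform the load barrier. The current implementation of the Access API doesn't support passing down the nmethod through all its layers of code, so ZGC asks the code cache which nmethod the various oops belong to. There's currently an open PR for JDK-8343789 (#21276), which moves the oops out of the code cache, so the current ZGC implementation will not work once that has been integrated. > > The proposal is to figure out a way to explicitly pass down the nmethod to the load barriers. > > We could extend the Access API to pass down the nmethod through all its various layers. The drawback of that is that it adds a lot of boilerplate code and requires new overloads and/or names. Given that this isn't performance-critical code, I propose that we take the much simpler route and call straight to the BarrierSetNMethod class. > > Given that NMethodAccess and IN_NMETHOD were only introduced to support nmethod oop loads for ZGC and are not used anymore, I've also removed them from the code. > > Tested with the reproducer for the ZGC issue in JDK-8343789 and tier1-7 Linux with ZGC tasks; tier1-3 is currently running. @fisk @xmas92 could one of you review this patch? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23512#issuecomment-2659391915 From rrich at openjdk.org Fri Feb 14 14:25:12 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 14 Feb 2025 14:25:12 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: <_jQq0sGaZijMT6Cr3rUdrQLlvVWNuJ8uILHg1qkCxoM=.271ce257-9e72-4d61-87fa-588ce4dbe107@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> <_jQq0sGaZijMT6Cr3rUdrQLlvVWNuJ8uILHg1qkCxoM=.271ce257-9e72-4d61-87fa-588ce4dbe107@github.com> Message-ID: On Thu, 13 Feb 2025 21:32:54 GMT, Dean Long wrote: > > The 2nd assert does not fail w/o the deoptimization.cpp fix. Might be due to alignment of caller->sp() in the interpreter. > > Aarch64 also does alignment, and that's why the test uses two different methods, one with an extra local, to hopefully handle both cases of even/odd 2-word (16 byte) alignment.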
There's currently an open PR for JDK-8343789 (#21276), which moves the oops out of the code cache, so the current way ZGC implementation will not work after that has been integrated. > > The proposal is to figure out a way to explicitly pass down the nmethod to the load barriers. > > We could extend the Access API to pass down the nmethod through all its various layers. The drawback of that is that it adds a lot of boiler plate code and requires new over loads and/or names. Given that this isn't performance critical code I propose that we take the much simpler route and call straight to the BarrierSetNMethod class. > > Given that MMethodAccess and IN_NMETHOD were only introduced to support nmethod oop loads for ZGC and are note used anymore I've also removed them from the code. > > Tested with reproducer for the ZGC issue in JDK-8343789, tier1-7 Linux with ZGC tasks, currently running tier1-3. @fisk @xmas92 could one of you review this patch? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23512#issuecomment-2659391915 From rrich at openjdk.org Fri Feb 14 14:25:12 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 14 Feb 2025 14:25:12 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: <_jQq0sGaZijMT6Cr3rUdrQLlvVWNuJ8uILHg1qkCxoM=.271ce257-9e72-4d61-87fa-588ce4dbe107@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> <_jQq0sGaZijMT6Cr3rUdrQLlvVWNuJ8uILHg1qkCxoM=.271ce257-9e72-4d61-87fa-588ce4dbe107@github.com> Message-ID: On Thu, 13 Feb 2025 21:32:54 GMT, Dean Long wrote: > > The 2nd assert does not fail w/o the deoptimization.cpp fix. Might be due to alignement of caller->sp() in the interpreter. > > Aarch64 also does alignment, and that's why the test uses two different methods, one with an extra local, to hopefully handle both cases of even/odd 2-word (16 byte) alignment. 
But ppc might be different enough that this isn't enough to trigger the bug. Or maybe the end of frame bound is slightly off? I think you can make the assertion a little stricter like this https://github.com/reinrich/jdk/commit/9c3c8a33a29b9ae6c4c703992b306dc0cbbcd2f0. The test still doesn't fail on ppc64 w/o the fix. This is because the deoptee's caller is alwys enlarged [here](https://github.com/openjdk/jdk/blob/57f4c30fb6be1da57c8fcc742b5c36d842eef397/src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp#L2840) although it's only necessary if it is the entry frame or compiled. (Reasoning for the stricter assertion: interpreter frames on top of stack have a `frame::top_ijava_frame_abi` just above sp needed for VM calls. When a call is received by the interpreter, it trimms the abi of the caller back to `frame::parent_ijava_frame_abi`. An i2c adapter does not do this.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2659465735 From shade at openjdk.org Fri Feb 14 14:52:41 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 14 Feb 2025 14:52:41 GMT Subject: RFR: 8350086: Inline hot Method accessors for faster task selection Message-ID: <2S92WMb5nqAG6LoBfpEmYf-0UubJpCAZ3XDUg2bKRos=.27a8beae-78b1-4934-84fd-f13cbad105f4@github.com> These methods show up prominently on Leyden profiles, as compilation policy asks these properties for methods very often during compile task selection: - `Method::invocation_count` - `Method::backedge_count` - `Method::highest_comp_level` We can move the definitions for these methods to method.inline.hpp to make them eligible for better inlining. `interpreter_invocation_count()` method is a bit weird, looks like a leftover from [JDK-8251462](https://bugs.openjdk.org/browse/JDK-8251462). Removing it would prompt more cleanups and renamings in `ciMethod`, so I would leave it for future enhancement. 
Additional testing:
- [x] Spot-checked Leyden profiles; methods are now fully inlined into hot `CompileBroker` methods
- [x] Ad-hoc Leyden benchmarks show minor improvements (< 1%) for time spent in compiler threads

------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/23634/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23634&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350086 Stats: 76 lines in 3 files changed: 39 ins; 33 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23634/head:pull/23634 PR: https://git.openjdk.org/jdk/pull/23634 From mli at openjdk.org Fri Feb 14 14:53:13 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 14 Feb 2025 14:53:13 GMT Subject: RFR: 8350093: RISC-V: java/math/BigInteger/LargeValueExceptions.java timeout with COH In-Reply-To: <20dDUIvvN45lILuiZ1hdaOXnRDTNl1V2nKI4X1S1lPE=.f615a14e-0518-4177-ac47-d9b8fd222d2b@github.com> References: <20dDUIvvN45lILuiZ1hdaOXnRDTNl1V2nKI4X1S1lPE=.f615a14e-0518-4177-ac47-d9b8fd222d2b@github.com> Message-ID: On Fri, 14 Feb 2025 13:21:58 GMT, Fei Yang wrote: > Hi, please review this change resolving a timeout issue in `LargeValueExceptions.squareDefiniteOverflow()`. > > This issue only happens on platforms with slow unaligned memory accesses like Unmatched or Premier-P550 SBCs. > Async profiler shows major time was spent in multiplyToLen stub code. When AvoidUnalignedAccesses is enabled, > there is a simple alignment check, which assumes 8-byte alignment for base_offset of int arrays. But this is > not the case with COH: base_offset is 12 bytes instead of 16 bytes for int arrays. > > Patch simply makes it explicit about the requirement of base_offset. Sanity tested on Premier P550.
> No obvious change witnessed on JMH after this change: > > ----------------------------------------------------------------------------------------------- > > Without COH: > >
> Benchmark                               (maxNumbits)  Mode  Cnt        Score      Error  Units
> BigIntegers.SmallShifts.testLeftShift             32  avgt   15      138.939 ±    2.246  ns/op
> BigIntegers.SmallShifts.testLeftShift            128  avgt   15       88.391 ±    1.210  ns/op
> BigIntegers.SmallShifts.testLeftShift            256  avgt   15      117.590 ±    1.398  ns/op
> BigIntegers.SmallShifts.testRightShift            32  avgt   15      150.338 ±    1.961  ns/op
> BigIntegers.SmallShifts.testRightShift           128  avgt   15      104.540 ±    5.636  ns/op
> BigIntegers.SmallShifts.testRightShift           256  avgt   15      126.082 ±    1.756  ns/op
> BigIntegers.testAdd                              N/A  avgt   15       97.513 ±   40.746  ns/op
> BigIntegers.testGcd                              N/A  avgt   15  5409222.706 ± 5934.667  ns/op
> BigIntegers.testHugeLargeDivide                  N/A  avgt   15      246.904 ±    1.552  ns/op
> BigIntegers.testHugeSmallDivide                  N/A  avgt   15      248.997 ±    1.374  ns/op
> BigIntegers.testHugeToString                     N/A  avgt   15     2421.432 ±   62.208  ns/op
> BigIntegers.testLargeSmallDivide                 N/A  avgt   15      216.859 ±    1.760  ns/op
> BigIntegers.testLargeToString                    N/A  avgt   15      425.653 ±   13.305  ns/op
> BigIntegers.testLeftShift                        N/A  avgt   15     2265.137 ±   24.319  ns/op
> BigIntegers.testMultiply                         N/A  avgt   15    15862.412 ±  417.880  ns/op  <========
> BigIntegers.testRightShift                       N/A  avgt   15      936.071 ±   15.247  ns/op
> BigIntegers.testSmallToString                    N/A  avgt   15      322.350 ±   16.075...
Nice catch. Have one question. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 5486: > 5484: const Register jdx = tmp1; > 5485: > 5486: if (AvoidUnalignedAccesses) { If `AvoidUnalignedAccesses == false`, it will go through all alignment code? But seems original code will not go through this alignment when `AvoidUnalignedAccesses == false`.
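A small sketch of the alignment arithmetic behind this bug, assuming (as usual for Java objects) that the array object itself starts on an 8-byte boundary. It is plain C++ arithmetic, not the RISC-V stub, and the function names are made up for illustration. With a 16-byte base offset the even-indexed `int` elements are 8-byte aligned, while with the 12-byte COH base offset the odd-indexed ones are, so an alignment decision that bakes in the first layout (for example, one based only on the element index) goes wrong under the second:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset of int element `index` from the (8-byte aligned) array start.
constexpr std::size_t int_element_offset(std::size_t base_offset, std::size_t index) {
  return base_offset + index * sizeof(std::int32_t);
}

// True if element `index` starts on an 8-byte boundary, assuming the
// array object itself is 8-byte aligned.
constexpr bool int_element_is_8_aligned(std::size_t base_offset, std::size_t index) {
  return int_element_offset(base_offset, index) % 8 == 0;
}

// Without COH (base offset 16): even indices are 8-byte aligned.
static_assert(int_element_is_8_aligned(16, 0) && !int_element_is_8_aligned(16, 1), "");
// With COH (base offset 12): the parity flips.
static_assert(!int_element_is_8_aligned(12, 0) && int_element_is_8_aligned(12, 1), "");
```

Deriving alignment from the actual element address (base offset included), or stating the required base offset explicitly as the patch description says, avoids depending on one particular header layout.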
------------- PR Review: https://git.openjdk.org/jdk/pull/23631#pullrequestreview-2618011364 PR Review Comment: https://git.openjdk.org/jdk/pull/23631#discussion_r1956270917 From amitkumar at openjdk.org Fri Feb 14 14:57:25 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 14 Feb 2025 14:57:25 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v8] In-Reply-To: References: Message-ID: > s390x implementation for Class.isInstance intrinsic. > > Tier1 tests on release and fastdebug VMs are clean with the flags: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. > > Benchmark results will be updated soon. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: remove frame requirement ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23535/files - new: https://git.openjdk.org/jdk/pull/23535/files/db593594..180e9f33 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=06-07 Stats: 13 lines in 1 file changed: 0 ins; 8 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23535/head:pull/23535 PR: https://git.openjdk.org/jdk/pull/23535 From amitkumar at openjdk.org Fri Feb 14 14:57:26 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 14 Feb 2025 14:57:26 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v7] In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 12:59:50 GMT, Amit Kumar wrote: >> s390x implementation for Class.isInstance intrinsic. >> >> Tier1 tests on release and fastdebug VMs are clean with the flags: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. >> >> Benchmark results will be updated soon.
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > space for 3 registers New benchmark result:
Benchmark                             Mode  Cnt   Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   4.271 ± 0.034  ns/op
SecondarySupersLookup.testNegative01  avgt   15   4.270 ± 0.048  ns/op
SecondarySupersLookup.testNegative02  avgt   15   4.263 ± 0.019  ns/op
SecondarySupersLookup.testNegative03  avgt   15   4.266 ± 0.023  ns/op
SecondarySupersLookup.testNegative04  avgt   15   4.274 ± 0.030  ns/op
SecondarySupersLookup.testNegative05  avgt   15   4.268 ± 0.019  ns/op
SecondarySupersLookup.testNegative06  avgt   15   4.269 ± 0.022  ns/op
SecondarySupersLookup.testNegative07  avgt   15   4.280 ± 0.027  ns/op
SecondarySupersLookup.testNegative08  avgt   15   4.274 ± 0.030  ns/op
SecondarySupersLookup.testNegative09  avgt   15   4.258 ± 0.012  ns/op
SecondarySupersLookup.testNegative10  avgt   15   4.266 ± 0.023  ns/op
SecondarySupersLookup.testNegative16  avgt   15   4.257 ± 0.010  ns/op
SecondarySupersLookup.testNegative20  avgt   15   4.258 ± 0.011  ns/op
SecondarySupersLookup.testNegative30  avgt   15   4.260 ± 0.019  ns/op
SecondarySupersLookup.testNegative32  avgt   15   4.263 ± 0.024  ns/op
SecondarySupersLookup.testNegative40  avgt   15   4.260 ± 0.013  ns/op
SecondarySupersLookup.testNegative50  avgt   15   4.266 ± 0.024  ns/op
SecondarySupersLookup.testNegative55  avgt   15  28.628 ± 2.120  ns/op
SecondarySupersLookup.testNegative56  avgt   15  28.561 ± 0.477  ns/op
SecondarySupersLookup.testNegative57  avgt   15  30.626 ± 3.137  ns/op
SecondarySupersLookup.testNegative58  avgt   15  29.328 ± 0.528  ns/op
SecondarySupersLookup.testNegative59  avgt   15  32.580 ± 4.115  ns/op
SecondarySupersLookup.testNegative60  avgt   15  32.745 ± 3.782  ns/op
SecondarySupersLookup.testNegative61  avgt   15  33.227 ± 3.922  ns/op
SecondarySupersLookup.testNegative62  avgt   15  33.354 ± 3.655  ns/op
SecondarySupersLookup.testNegative63  avgt   15  35.595 ± 3.865  ns/op
SecondarySupersLookup.testNegative64  avgt   15  34.268 ± 3.374  ns/op
SecondarySupersLookup.testPositive01  avgt   15   4.800 ± 0.010  ns/op
SecondarySupersLookup.testPositive02  avgt   15   4.803 ± 0.017  ns/op
SecondarySupersLookup.testPositive03  avgt   15   4.799 ± 0.012  ns/op
SecondarySupersLookup.testPositive04  avgt   15   4.799 ± 0.012  ns/op
SecondarySupersLookup.testPositive05  avgt   15   4.797 ± 0.007  ns/op
SecondarySupersLookup.testPositive06  avgt   15   4.798 ± 0.013  ns/op
SecondarySupersLookup.testPositive07  avgt   15   4.803 ± 0.015  ns/op
SecondarySupersLookup.testPositive08  avgt   15   5.483 ± 1.516  ns/op
SecondarySupersLookup.testPositive09  avgt   15   4.797 ± 0.007  ns/op
SecondarySupersLookup.testPositive10  avgt   15   4.798 ± 0.009  ns/op
SecondarySupersLookup.testPositive16  avgt   15   4.798 ± 0.008  ns/op
SecondarySupersLookup.testPositive20  avgt   15   4.800 ± 0.015  ns/op
SecondarySupersLookup.testPositive30  avgt   15   4.798 ± 0.009  ns/op
SecondarySupersLookup.testPositive32  avgt   15   4.799 ± 0.012  ns/op
SecondarySupersLookup.testPositive40  avgt   15  15.446 ± 0.125  ns/op
SecondarySupersLookup.testPositive50  avgt   15   4.797 ± 0.009  ns/op
SecondarySupersLookup.testPositive60  avgt   15  28.643 ± 3.308  ns/op
SecondarySupersLookup.testPositive63  avgt   15  27.370 ± 2.537  ns/op
SecondarySupersLookup.testPositive64  avgt   15  33.219 ± 3.552  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'
------------- PR Comment: https://git.openjdk.org/jdk/pull/23535#issuecomment-2659536641 From amitkumar at openjdk.org Fri Feb 14 14:57:26 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 14 Feb 2025 14:57:26 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v7] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 10:28:34 GMT, Andrew Haley wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> space for 3 registers > > src/hotspot/cpu/s390/c1_Runtime1_s390.cpp line 643: > >> 641: __ z_stg(temp3 /*Z_R11*/, 1*BytesPerWord + frame::z_abi_160_size, Z_SP); >> 642: assert(2*BytesPerWord + frame::z_abi_160_size == frame_size, "check"); >> 643: > > I think you may be able temporarily to save R10 and R11 in the floating-point registers. You have plenty of call-clobbered FP registers, I think. This might work better than creating a stack frame. I guess it's possible to copy from an integer register to a floating-point register without altering any of the bits. There might be no performance advantage, but I think that it's worth a try. I see some further improvements. I will post the results below.
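Andrew's suggestion relies on the integer-to-FP move being a raw bit transfer rather than a numeric conversion. The C++ below models only that distinction; it is not s390x code. (z/Architecture has dedicated GPR/FPR transfer instructions, LDGR and LGDR, for this kind of move, but that detail is an addition here, not something stated in the thread.) `std::memcpy` copies the 64 bits verbatim, so any pointer or integer survives the round trip, whereas a value conversion does not:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "needs a 64-bit double");

// Model of "save a GPR in an FPR": copy the 64 bits verbatim.
double stash_in_fpr(std::uint64_t gpr) {
  double fpr;
  std::memcpy(&fpr, &gpr, sizeof fpr);  // bit copy, not a conversion
  return fpr;
}

// Model of "restore the GPR from the FPR".
std::uint64_t restore_from_fpr(double fpr) {
  std::uint64_t gpr;
  std::memcpy(&gpr, &fpr, sizeof gpr);
  return gpr;
}

// Counterexample: a numeric conversion rounds large integers.
std::uint64_t convert_round_trip(std::uint64_t gpr) {
  return static_cast<std::uint64_t>(static_cast<double>(gpr));
}
```

The third function is there only to show what must be avoided: integers above 2^53 cannot be represented exactly as a `double`, so a converting move would silently corrupt pointer-sized values.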
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1956275664 From shade at openjdk.org Fri Feb 14 15:05:46 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 14 Feb 2025 15:05:46 GMT Subject: RFR: 8350086: Inline hot Method accessors for faster task selection [v2] In-Reply-To: <2S92WMb5nqAG6LoBfpEmYf-0UubJpCAZ3XDUg2bKRos=.27a8beae-78b1-4934-84fd-f13cbad105f4@github.com> References: <2S92WMb5nqAG6LoBfpEmYf-0UubJpCAZ3XDUg2bKRos=.27a8beae-78b1-4934-84fd-f13cbad105f4@github.com> Message-ID: > These methods show up prominently on Leyden profiles, as compilation policy asks these properties for methods very often during compile task selection: > - `Method::invocation_count` > - `Method::backedge_count` > - `Method::highest_comp_level` > > We can move the definitions for these methods to method.inline.hpp to make them eligible for better inlining. > > `interpreter_invocation_count()` method is a bit weird, looks like a leftover from [JDK-8251462](https://bugs.openjdk.org/browse/JDK-8251462). Removing it would prompt more cleanups and renamings in `ciMethod`, so I would leave it for future enhancement. 
> > Additional testing: > - [x] Spot-checked Leyden profiles, methods are now fully inlined into hot `CompileBroker` methods > - [x] Ad-hoc Leyden benchmarks show minor improvements (< 1%) for time spent in compiler threads Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Adjust includes to match the move ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23634/files - new: https://git.openjdk.org/jdk/pull/23634/files/845dc7d5..8f6d4fab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23634&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23634&range=00-01 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23634/head:pull/23634 PR: https://git.openjdk.org/jdk/pull/23634 From shade at openjdk.org Fri Feb 14 15:12:50 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 14 Feb 2025 15:12:50 GMT Subject: RFR: 8350086: Inline hot Method accessors for faster task selection [v3] In-Reply-To: <2S92WMb5nqAG6LoBfpEmYf-0UubJpCAZ3XDUg2bKRos=.27a8beae-78b1-4934-84fd-f13cbad105f4@github.com> References: <2S92WMb5nqAG6LoBfpEmYf-0UubJpCAZ3XDUg2bKRos=.27a8beae-78b1-4934-84fd-f13cbad105f4@github.com> Message-ID: > These methods show up prominently on Leyden profiles, as compilation policy asks these properties for methods very often during compile task selection: > - `Method::invocation_count` > - `Method::backedge_count` > - `Method::highest_comp_level` > > We can move the definitions for these methods to method.inline.hpp to make them eligible for better inlining. > > `interpreter_invocation_count()` method is a bit weird, looks like a leftover from [JDK-8251462](https://bugs.openjdk.org/browse/JDK-8251462). Removing it would prompt more cleanups and renamings in `ciMethod`, so I would leave it for future enhancement.
> > Additional testing: > - [x] Spot-checked Leyden profiles, methods are now fully inlined into hot `CompileBroker` methods > - [x] Ad-hoc Leyden benchmarks show minor improvements (< 1%) for time spent in compiler threads Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - One more include for Minimal VM - Turn declarations proper inline ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23634/files - new: https://git.openjdk.org/jdk/pull/23634/files/8f6d4fab..24be8e68 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23634&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23634&range=01-02 Stats: 5 lines in 2 files changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23634/head:pull/23634 PR: https://git.openjdk.org/jdk/pull/23634 From gziemski at openjdk.org Fri Feb 14 15:14:36 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 14 Feb 2025 15:14:36 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v40] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ??????????
> Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ?????????? > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: cleanup fprintf formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/39886985..40f91642 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=39 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=38-39 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Fri Feb 14 15:59:09 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 14 Feb 2025 15:59:09 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v41] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. 
> > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ?????????? > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... 
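The overhead figures in the summary header follow directly from the raw counts, which makes them a handy sanity check on the benchmark output. The sketch below recomputes them. The per-block NMT cost of 18 bytes is inferred from the sample itself (2,118,906 bytes / 117,717 samples = 18 exactly); that it is a fixed header-plus-footer cost per malloc'd block is an assumption, and the struct and function names are illustrative only:

```cpp
#include <cassert>
#include <cstdint>

// Raw counts as printed in the "malloc summary" header of the sample output.
struct MallocSummary {
  std::uint64_t samples;              // number of malloc calls measured
  std::uint64_t requested;            // bytes asked for by callers
  std::uint64_t allocated;            // bytes the allocator actually handed out
  std::uint64_t nmt_bytes_per_block;  // assumption: fixed NMT cost per block
};

std::uint64_t malloc_overhead(const MallocSummary& s) {
  return s.allocated - s.requested;
}

std::uint64_t nmt_overhead(const MallocSummary& s) {
  return s.samples * s.nmt_bytes_per_block;
}

// part/whole as a percentage in hundredths, rounded (502 means 5.02%).
std::uint64_t percent_x100(std::uint64_t part, std::uint64_t whole) {
  assert(whole != 0);
  return (part * 10000 + whole / 2) / whole;
}
```

With the sample's numbers this reproduces the printed 1,429,498 bytes (5.02%) of malloc overhead and 2,118,906 bytes (7.44%) of NMT overhead.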
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: cleanup type mismatches ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/40f91642..afe49659 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=40 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=39-40 Stats: 36 lines in 1 file changed: 3 ins; 0 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From fyang at openjdk.org Fri Feb 14 16:03:10 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Feb 2025 16:03:10 GMT Subject: RFR: 8350093: RISC-V: java/math/BigInteger/LargeValueExceptions.java timeout with COH In-Reply-To: References: <20dDUIvvN45lILuiZ1hdaOXnRDTNl1V2nKI4X1S1lPE=.f615a14e-0518-4177-ac47-d9b8fd222d2b@github.com> Message-ID: On Fri, 14 Feb 2025 14:50:14 GMT, Hamlin Li wrote: >> Hi, please review this change resolving a timeout issue in `LargeValueExceptions.squareDefiniteOverflow()`. >> >> This issue only happens on platforms with slow unaligned memory accesses like Unmatched or Premier-P550 SBCs. >> Async profiler shows major time was spent in multiplyToLen stub code. When AvoidUnalignedAccesses is enabled, >> there is a simple alignment check, which assumes 8-byte alignment for base_offset of int arrays. But this is >> not the case with COH: base_offset is 12 bytes instead of 16 bytes for int arrays. >> >> Patch simply makes it explicit about the requirement of base_offset. Sanity tested on Premier P550. >> No obvious change witnessed on JMH after this change: >> >> ----------------------------------------------------------------------------------------------- >> >> Without COH: >> >>
>> Benchmark                               (maxNumbits)  Mode  Cnt        Score      Error  Units
>> BigIntegers.SmallShifts.testLeftShift             32  avgt   15      138.939 ±    2.246  ns/op
>> BigIntegers.SmallShifts.testLeftShift            128  avgt   15       88.391 ±    1.210  ns/op
>> BigIntegers.SmallShifts.testLeftShift            256  avgt   15      117.590 ±    1.398  ns/op
>> BigIntegers.SmallShifts.testRightShift            32  avgt   15      150.338 ±    1.961  ns/op
>> BigIntegers.SmallShifts.testRightShift           128  avgt   15      104.540 ±    5.636  ns/op
>> BigIntegers.SmallShifts.testRightShift           256  avgt   15      126.082 ±    1.756  ns/op
>> BigIntegers.testAdd                              N/A  avgt   15       97.513 ±   40.746  ns/op
>> BigIntegers.testGcd                              N/A  avgt   15  5409222.706 ± 5934.667  ns/op
>> BigIntegers.testHugeLargeDivide                  N/A  avgt   15      246.904 ±    1.552  ns/op
>> BigIntegers.testHugeSmallDivide                  N/A  avgt   15      248.997 ±    1.374  ns/op
>> BigIntegers.testHugeToString                     N/A  avgt   15     2421.432 ±   62.208  ns/op
>> BigIntegers.testLargeSmallDivide                 N/A  avgt   15      216.859 ±    1.760  ns/op
>> BigIntegers.testLargeToString                    N/A  avgt   15      425.653 ±   13.305  ns/op
>> BigIntegers.testLeftShift                        N/A  avgt   15     2265.137 ±   24.319  ns/op
>> BigIntegers.testMultiply                         N/A  avgt   15    15862.412 ±  417.880  ns/op  <========
>> BigIntegers.testRightShift                       N/A  avgt   15      936.071 ±   15.247  ns/op
>> BigIntegers.testSmallTo...
> > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 5486: > >> 5484: const Register jdx = tmp1; >> 5485: >> 5486: if (AvoidUnalignedAccesses) { > > If `AvoidUnalignedAccesses == false`, it will go through all alignment code? But seems original code will not go through this alignment when `AvoidUnalignedAccesses == false`. Hi, not sure if I understand the question correctly. This only affects platforms where `AvoidUnalignedAccesses` is true. It does not make a difference on platforms with fast misaligned accesses (which means `AvoidUnalignedAccesses == false`).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23631#discussion_r1956379043 From gziemski at openjdk.org Fri Feb 14 16:50:59 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 14 Feb 2025 16:50:59 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v42] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ?????????? > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: suppress Win build errors ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/afe49659..4cac7c88 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=41 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=40-41 Stats: 32 lines in 2 files changed: 21 ins; 2 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From roland at openjdk.org Fri Feb 14 16:55:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 14 Feb 2025 16:55:14 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> Message-ID: On Thu, 13 Feb 2025 16:43:22 GMT, Roland Westrelin wrote: >> @galderz How sure are you that intrinsifying directly is really the right approach? >> >> Maybe the approach via `PhaseIdealLoop::conditional_move` where we know the branching probability is a better one. Though of course knowing the branching probability is no perfect heuristic for how good branch prediction is going to be, but it is at least something. >> >> So I'm wondering if there could be a different approach that sees all the wins you get here, without any of the regressions? >> >> If we are just interested in better vectorization: the current issue is that the auto-vectorizer cannot handle CFG, i.e. we do not yet do if-conversion. But if we had if-conversion, then the inlined CFG of min/max would just be converted to vector CMove (or vector min/max where available) at that point.
We can take the branching probabilities into account, just like `PhaseIdealLoop::conditional_move` does - if that is necessary. Of course if-conversion is far away, and we will encounter a lot of issues with branch prediction etc., so I'm scared we might never get there - but I want to try ;) >> >> Do we see any other wins with your patch, that are not due to vectorization, but just scalar code? > >> Do we see any other wins with your patch, that are not due to vectorization, but just scalar code? > > I think there are some. > > The current transformation from the parsed version of min/max to a conditional move to a `Max`/`Min` node depends on the conditional move transformation which has its own set of heuristics and while it happens on simple test cases, that's not necessarily the case on all code shapes. I don't think we want to trust it too much. > > With the intrinsic, the type of the min or max can be narrowed down in a way it can't be whether the code includes control flow or a conditional move. That in turn, once types have propagated, could cause some constant to appear and could be a significant win. > > The `Min`/`Max` nodes are floating nodes. They can be hoisted out of loops and commoned reliably in ways that are not guaranteed otherwise. > @rwestrel What do you think about the regressions in the scalar cases of this patch? Shouldn't int `min`/`max` be affected the same way? I suppose extracting the branch probability from the `MethodData` and attaching it to the `Min`/`Max` nodes is not impossible. I did something like that in the `ScopedValue` PR that you reviewed (and was put on hold). Now, that would be quite a bit of extra complexity for what feels like a corner case. Another possibility would be to implement `CMove` with branches (https://bugs.openjdk.org/browse/JDK-8340206) or to move the implementation of `MinL`/`MaxL` into the AD files and experiment with branches there.
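The type-narrowing argument quoted above can be illustrated with a deliberately simplified interval model; this is not C2's `TypeLong`, just the arithmetic behind it. If each input of `max` or `min` is known to lie in a range, the range of the result follows directly, and it can collapse to a single constant:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified stand-in for a compiler's type of a long value: every value
// is known to lie in the closed interval [lo, hi].
struct LongRange {
  std::int64_t lo;
  std::int64_t hi;
};

// max(a, b) is at least both lower bounds and at most the larger upper bound.
LongRange max_range(LongRange a, LongRange b) {
  return LongRange{std::max(a.lo, b.lo), std::max(a.hi, b.hi)};
}

// Symmetric rule for min(a, b).
LongRange min_range(LongRange a, LongRange b) {
  return LongRange{std::min(a.lo, b.lo), std::min(a.hi, b.hi)};
}

bool is_constant(LongRange r) { return r.lo == r.hi; }
```

For example, the `max` of a value in [-10, 5] and the constant 5 is always 5, so a compiler that types the node this way can fold it to a constant; reaching the same conclusion through control flow or a conditional move first requires recognizing the min/max pattern.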
Overall, it seems we likely win more than we lose with this intrinsic, so I would integrate this change as it is and file a bug to keep track of remaining issues. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2659821025 From gziemski at openjdk.org Fri Feb 14 17:08:00 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 14 Feb 2025 17:08:00 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v43] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ??????????
> Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix Win build error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/4cac7c88..6e067a1a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=42 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=41-42 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From kvn at openjdk.org Fri Feb 14 17:56:11 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 14 Feb 2025 17:56:11 GMT Subject: RFR: 8350086: Inline hot Method accessors for faster task selection [v3] In-Reply-To: References: <2S92WMb5nqAG6LoBfpEmYf-0UubJpCAZ3XDUg2bKRos=.27a8beae-78b1-4934-84fd-f13cbad105f4@github.com> Message-ID: On Fri, 14 Feb 2025 15:12:50 GMT, Aleksey Shipilev wrote: >> These methods show up prominently on Leyden profiles, as compilation policy asks these properties for methods very often during compile task selection: >> - `Method::invocation_count` >> - `Method::backedge_count` >> - `Method::highest_comp_level` >> >> We can move the definitions for these methods to method.inline.hpp to make them eligible for better inlining. >> >> `interpreter_invocation_count()` method is a bit weird, looks like a leftover from [JDK-8251462](https://bugs.openjdk.org/browse/JDK-8251462). Removing it would prompt more cleanups and renamings in `ciMethod`, so I would leave it for future enhancement. 
>> >> Additional testing: >> - [x] Spot-checked Leyden profiles, methods are now fully inlined into hot `CompileBroker` methods >> - [x] Ad-hoc Leyden benchmarks show minor improvements (< 1%) for time spent in compiler threads > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - One more include for Minimal VM > - Turn declarations proper inline Okay. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23634#pullrequestreview-2618466919 From aph-open at littlepinkcloud.com Fri Feb 14 18:01:23 2025 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Fri, 14 Feb 2025 18:01:23 +0000 Subject: RFD: Subsampled profile counters in HotSpot Message-ID: <0bbafb6e-c9f9-4c16-a278-068b5082c3e2@littlepinkcloud.com> This is JDK-8134940: TieredCompilation profiling can exhibit poor scalability. (Thanks to Igor Veresov for the inspiration and advice!) The poster child for awful profile scaling is SOR (https://cr.openjdk.org/~redestad/scratch/sor-jmh/). Here is the TieredStopAtLevel=3 performance with just one thread:

Benchmark   Mode  Cnt     Score   Error  Units
JGSOR.test  avgt    3     9.144 ± 1.044  ms/op

and with 16 hardware threads:

JGSOR.test  avgt    3  1177.982 ± 5.108  ms/op

So it's a more than 100-fold performance drop. This is a real problem I've seen in production deployments. I've been looking at the idea of incrementing profile counters less frequently, recording events in a pseudo-random, subsampled way. For example, you could set ProfileCaptureRatio=16 and then, at random, only 1/16th of counter updates would be recorded. In theory, those 1/16th of updates should be representative, but theory is not necessarily the same as practice. Here's where statistics comes to our rescue, though. I am not a statistician, but I think the fundamental theorem of statistics (the Glivenko-Cantelli theorem) applies.
It "describes the asymptotic behaviour of the empirical distribution function as the number of independent and identically distributed observations grows. Specifically, the empirical distribution function converges uniformly to the true distribution function almost surely." (It doesn't say how long that'll take, though.) So, as long as the random-number generator we use is completely uncorrelated with the process we're profiling, even a substantially undersampled set of profile counters should converge to the same ratios we'd have without undersampling. I have done a very rough proof-of-concept implementation of subsampling. It's at https://github.com/openjdk/jdk/pull/23643/files and it's not fit for much except demonstrating the feasibility of this approach. While the code isn't great, I think it fairly represents the performance we could expect if we decided to go with this approach. These are the JMH results for SOR with 16 threads and various subsampling ratios, controlled by -XX:ProfileCaptureRatio=n:

   n  Benchmark   Mode  Cnt     Score     Error  Units
   1  JGSOR.test  avgt    3  1177.982 ±   5.108  ms/op
   2  JGSOR.test  avgt    3   622.435 ± 101.466  ms/op
   4  JGSOR.test  avgt    3   310.496 ±  17.681  ms/op
   8  JGSOR.test  avgt    3   170.867 ±   0.911  ms/op
  16  JGSOR.test  avgt    3    98.210 ±   9.236  ms/op
  32  JGSOR.test  avgt    3    58.137 ±   3.501  ms/op
  64  JGSOR.test  avgt    3    35.384 ±   0.922  ms/op
 128  JGSOR.test  avgt    3    22.076 ±   0.197  ms/op
 256  JGSOR.test  avgt    3    15.459 ±   2.312  ms/op
1024  JGSOR.test  avgt    3    10.180 ±   0.426  ms/op

With n=1, there is no undersampling at all, and we see the catastrophic slowdown that is the subject of this bug report.
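The subsampling idea can be modelled in a few lines of C++. This is a hypothetical sketch, not the code in the POC pull request: the LCG multiplier and increment are the ones visible in the generated code quoted further down (0x41c64e6d = 1103515245, 0x3039 = 12345), while the class name, the threshold test, and the choice to add `ratio` for each recorded event (so the counter remains an estimate of the true count) are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a subsampled profile counter; not HotSpot code.
class SubsampledCounter {
 public:
  SubsampledCounter(std::uint32_t ratio, std::uint32_t seed)
      : _ratio(ratio), _threshold(threshold_for(ratio)), _state(seed), _count(0) {}

  // Called every time the profiled event happens.
  void event() {
    // Step a 32-bit LCG (wraps modulo 2^32) with the constants from the C1 code.
    _state = _state * 1103515245u + 12345u;
    // Comparing against 2^32/ratio tests the high-order bits, which are the
    // better-distributed bits of an LCG; about 1 in `ratio` steps pass.
    if (_state < _threshold) {
      _count += _ratio;  // scale up so _count estimates the true event count
    }
  }

  std::uint64_t count() const { return _count; }

 private:
  static std::uint64_t threshold_for(std::uint32_t ratio) {
    assert(ratio > 0);
    return (std::uint64_t{1} << 32) / ratio;
  }

  std::uint32_t _ratio;
  std::uint64_t _threshold;
  std::uint32_t _state;
  std::uint64_t _count;
};
```

These constants give a full-period generator modulo 2^32 (the increment is odd and the multiplier is congruent to 1 mod 4), so every 32-bit state occurs exactly once per period; for a power-of-two `ratio`, exactly one state in `ratio` passes the threshold test over a full period.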
The performance improves rapidly, but not quite linearly, with increasing subsampling ratios, as you'd expect. The command line (shown here for n=16) was:

/build/linux-x86_64-server-release/jdk/bin/java -jar ./build/linux-x86_64-server-release/images/test/micro/benchmarks.jar SOR -t 16 -f 1 -wi 3 -i 3 -r 1 -w 1 -jvmArgs ' -XX:TieredStopAtLevel=3 -XX:ProfileCaptureRatio=16'

Surprisingly, the overhead of randomized subsampling isn't so great. Here's the speed of the same JGSOR.test with only 1 thread, with various subsampling ratios:

 n  Benchmark   Mode  Cnt   Score   Error  Units
 1  JGSOR.test  avgt    3   9.087 ± 0.041  ms/op  (not undersampled)
 2  JGSOR.test  avgt    3  22.431 ± 0.079  ms/op
 4  JGSOR.test  avgt    3  14.291 ± 0.048  ms/op
 8  JGSOR.test  avgt    3  10.316 ± 0.021  ms/op
16  JGSOR.test  avgt    3   9.360 ± 0.022  ms/op
32  JGSOR.test  avgt    3   9.196 ± 0.042  ms/op

We can see that if we undersample 16-fold, the single-threaded overhead for profile counting is close to what it is with no undersampling at all (9.360 vs. 9.087 ms/op). We could, at least in theory, ship HotSpot with 32-fold undersampling as a default, and no one would ever notice, except that the poor scaling behaviour would be very much reduced.

However, there is a cost. C1 code size does increase because we have to step the random-number generator at every profiling site. It's not too bad, because the added code is just something like

    mov  ebx,0x41c64e6d
    imul r14d,ebx
    add  r14d,0x3039
    cmp  r14d,0x1000000    // ProfileCaptureRatio
    jae  0x00007fffe099c197
    ... profiling code

but it's definitely bigger.

The POC is C1 only, x86 only, and I haven't done anything about profiling the interpreter. I'm sure it has bugs. It'll probably crash if you push it too hard. But it is, I think, very representative of how well a finished implementation would perform.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From gziemski at openjdk.org Fri Feb 14 18:05:55 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 14 Feb 2025 18:05:55 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v44] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7...
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: cleanup Win build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/6e067a1a..fb176588 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=43 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=42-43 Stats: 32 lines in 1 file changed: 9 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From kvn at openjdk.org Fri Feb 14 18:06:12 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 14 Feb 2025 18:06:12 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v4] In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 18:31:42 GMT, Ioi Lam wrote: >> Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. >> >> With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. >> >> To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). 
This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): >> >>> the format of the configuration and cache files is not specified and is subject to change without notice. >> >> **Notes for reviewers:** >> >> - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. >> - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. >> - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. >> - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. >> - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. >> >> **Misc Note** >> - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccb... > > Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: > > - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself > - Improve error message when AOTMode=create has an incompatible classpath Looks good to me. You need a second review (from @calvinccheung ?) ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23484#pullrequestreview-2618485003 From coleenp at openjdk.org Fri Feb 14 18:38:18 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 14 Feb 2025 18:38:18 GMT Subject: RFR: 8350086: Inline hot Method accessors for faster task selection [v3] In-Reply-To: References: <2S92WMb5nqAG6LoBfpEmYf-0UubJpCAZ3XDUg2bKRos=.27a8beae-78b1-4934-84fd-f13cbad105f4@github.com> Message-ID: On Fri, 14 Feb 2025 15:12:50 GMT, Aleksey Shipilev wrote: >> These methods show up prominently on Leyden profiles, as compilation policy asks these properties for methods very often during compile task selection: >> - `Method::invocation_count` >> - `Method::backedge_count` >> - `Method::highest_comp_level` >> >> We can move the definitions for these methods to method.inline.hpp to make them eligible for better inlining. >> >> `interpreter_invocation_count()` method is a bit weird, looks like a leftover from [JDK-8251462](https://bugs.openjdk.org/browse/JDK-8251462). Removing it would prompt more cleanups and renamings in `ciMethod`, so I would leave it for future enhancement. >> >> Additional testing: >> - [x] Spot-checked Leyden profiles, methods are now fully inlined into hot `CompilerBroker` methods >> - [x] Ad-hoc Leyden benchmarks show minor improvements (< 1%) for time spent in compiler threads > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - One more include for Minimal VM > - Turn declarations proper inline This seems fine but at one point we were talking about moving what looks like the duplicated InvocationCounters from MethodCounters and use MDO instead. I think this looks like it could be something to clean up. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23634#pullrequestreview-2618549865 From aph at openjdk.org Fri Feb 14 18:46:16 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 14 Feb 2025 18:46:16 GMT Subject: RFR: 8350086: Inline hot Method accessors for faster task selection [v3] In-Reply-To: References: <2S92WMb5nqAG6LoBfpEmYf-0UubJpCAZ3XDUg2bKRos=.27a8beae-78b1-4934-84fd-f13cbad105f4@github.com> Message-ID: On Fri, 14 Feb 2025 15:12:50 GMT, Aleksey Shipilev wrote: >> These methods show up prominently on Leyden profiles, as compilation policy asks these properties for methods very often during compile task selection: >> - `Method::invocation_count` >> - `Method::backedge_count` >> - `Method::highest_comp_level` >> >> We can move the definitions for these methods to method.inline.hpp to make them eligible for better inlining. >> >> `interpreter_invocation_count()` method is a bit weird, looks like a leftover from [JDK-8251462](https://bugs.openjdk.org/browse/JDK-8251462). Removing it would prompt more cleanups and renamings in `ciMethod`, so I would leave it for future enhancement. >> >> Additional testing: >> - [x] Spot-checked Leyden profiles, methods are now fully inlined into hot `CompilerBroker` methods >> - [x] Ad-hoc Leyden benchmarks show minor improvements (< 1%) for time spent in compiler threads > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - One more include for Minimal VM > - Turn declarations proper inline That's nice. It looks to me like those methods would only be a few instructions long, so the overhead of a subroutine would be disproportionate, and the cost of inlining small. Win-win. ------------- Marked as reviewed by aph (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23634#pullrequestreview-2618563348 From vlivanov at openjdk.org Fri Feb 14 19:06:16 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 14 Feb 2025 19:06:16 GMT Subject: RFR: 8350086: Inline hot Method accessors for faster task selection [v3] In-Reply-To: References: <2S92WMb5nqAG6LoBfpEmYf-0UubJpCAZ3XDUg2bKRos=.27a8beae-78b1-4934-84fd-f13cbad105f4@github.com> Message-ID: On Fri, 14 Feb 2025 15:12:50 GMT, Aleksey Shipilev wrote: >> These methods show up prominently on Leyden profiles, as compilation policy asks these properties for methods very often during compile task selection: >> - `Method::invocation_count` >> - `Method::backedge_count` >> - `Method::highest_comp_level` >> >> We can move the definitions for these methods to method.inline.hpp to make them eligible for better inlining. >> >> `interpreter_invocation_count()` method is a bit weird, looks like a leftover from [JDK-8251462](https://bugs.openjdk.org/browse/JDK-8251462). Removing it would prompt more cleanups and renamings in `ciMethod`, so I would leave it for future enhancement. >> >> Additional testing: >> - [x] Spot-checked Leyden profiles, methods are now fully inlined into hot `CompilerBroker` methods >> - [x] Ad-hoc Leyden benchmarks show minor improvements (< 1%) for time spent in compiler threads > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - One more include for Minimal VM > - Turn declarations proper inline Marked as reviewed by vlivanov (Reviewer). The patch looks well-justified to me, but it feels like the focus is on a symptom and not the root cause. > These methods show up prominently on Leyden profiles, as compilation policy asks these properties for methods very often during compile task selection That's the consequence of poor scaling properties `CompilationPolicy::select_task()` demonstrates. 
Each `CompileQueue::get()` call involves a linear pass over the whole compile queue (implemented as a linked list), recomputing the event rate each time. The longer the queue, the more time it takes to select the next task to compile. And Leyden greatly exacerbates the problem by aggressively submitting compilation tasks based on training data. FTR heavy lock contention on `MethodCompileQueue_lock` was another symptom of the very same problem. The proper fix would be to reimplement compilation task prioritization. ------------- PR Review: https://git.openjdk.org/jdk/pull/23634#pullrequestreview-2618599459 PR Comment: https://git.openjdk.org/jdk/pull/23634#issuecomment-2660065664 From vlivanov at openjdk.org Fri Feb 14 19:16:11 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 14 Feb 2025 19:16:11 GMT Subject: RFR: 8350086: Inline hot Method accessors for faster task selection [v3] In-Reply-To: References: <2S92WMb5nqAG6LoBfpEmYf-0UubJpCAZ3XDUg2bKRos=.27a8beae-78b1-4934-84fd-f13cbad105f4@github.com> Message-ID: On Fri, 14 Feb 2025 18:35:12 GMT, Coleen Phillimore wrote: > This seems fine but at one point we were talking about moving what looks like the duplicated InvocationCounters from MethodCounters and use MDO instead. I think this looks like it could be something to clean up. There's definitely some duplication between MethodCounter and MDO, but those two serve different purposes at runtime when it comes to profiling (facilitate different profiling modes). There are ways to merge them, but it may have far-reaching consequences for the implementation (e.g., fast MDO presence check is used to guard profiling logic in interpreter). Not clear to me whether it'll be worth the effort.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23634#issuecomment-2660083999 From ccheung at openjdk.org Fri Feb 14 19:21:32 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 14 Feb 2025 19:21:32 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v2] In-Reply-To: References: Message-ID: > This changset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. > > Passed tiers 1 - 5 testing. Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: @iklam and @ashu-mehra comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23476/files - new: https://git.openjdk.org/jdk/pull/23476/files/816ae7ea..01238742 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23476&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23476&range=00-01 Stats: 62 lines in 5 files changed: 12 ins; 17 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/23476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23476/head:pull/23476 PR: https://git.openjdk.org/jdk/pull/23476 From ccheung at openjdk.org Fri Feb 14 19:21:32 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 14 Feb 2025 19:21:32 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v2] In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 05:25:30 GMT, Ioi Lam wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> @iklam and @ashu-mehra comment > > src/hotspot/share/cds/aotCodeSource.cpp line 133: > >> 131: >> 132: // AllCodeSourceStreams is used to iterate over all the 
code sources that >> 133: // are available to the application from -Xbootclasspath, -classpath and --module-path > > Consider adding this comment: > > // When creating an AOT cache, we store the contents from AllCodeSourceStreams > // into an array of AOTCodeSources. See AOTCodeSourceConfig::dumptime_init_helper(). > // When loading the AOT cache in a production run, we compare the contents of the > // stored AOTCodeSources against the current AllCodeSourceStreams to determine whether > // the AOT cache is compatible with the current JVM. See AOTCodeSourceConfig::validate(). Added the comment. > src/hotspot/share/cds/aotCodeSource.hpp line 126: > >> 124: // Non-existent entries are recored during AOTCache creation. Those non-existent entries >> 125: // must not exist during runtime. >> 126: // > > Typos: > - "subjected to AOTCodeSourceConfig::validate()" -- the function has two parameters, but we can omit them in this comment > - "validation is performed on *the* AOTCodeSources" > - "during AOTCache creation *are* the same" > - "on-existent entries are *recorded*" Fixed. 
PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1956613731 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1956613683 From ccheung at openjdk.org Fri Feb 14 19:21:33 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 14 Feb 2025 19:21:33 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v2] In-Reply-To: <8yqgZ4ffmEyui_CyUR9lS-2MV4ONfzSaA-pMz-VvDMA=.e92da2b8-cfc5-4ef9-ab33-9d14ca02a2f8@github.com> References: <8yqgZ4ffmEyui_CyUR9lS-2MV4ONfzSaA-pMz-VvDMA=.e92da2b8-cfc5-4ef9-ab33-9d14ca02a2f8@github.com> Message-ID: On Thu, 13 Feb 2025 03:55:50 GMT, Ashutosh Mehra wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> @iklam and @ashu-mehra comment > > src/hotspot/share/cds/aotCodeSource.cpp line 762: > >> 760: } >> 761: >> 762: if (is_boot_classpath && runtime_css.has_next() && (need_to_check_app_classpath() || num_module_paths() > 0)) { > > I am not sure I get what this block is for. Is it for the case where runtime boot cp has more entries than the dumptime boot cp, and it is checking if the extra entries really exist or they are just empty? If so, then `check_paths_existence` should only be checking the extra entries in the boot cp, not all of them. > > Can you please explain this and probably add a comment as well to describe what this block is for.

I added a comment:

    // Check if the runtime boot classpath has more entries than the one stored in the archive and if the app classpath
    // or the module path requires validation.
    if (is_boot_classpath && runtime_css.has_next() && (need_to_check_app_classpath() || num_module_paths() > 0)) {
      // the check passes if all the extra runtime boot classpath entries are non-existent
      if (check_paths_existence(runtime_css)) {
        log_warning(cds)("boot classpath is longer than expected");
        return false;
      }
    }

Also fixed the `check_paths_existence()` method so it only checks the extra entries.

> src/hotspot/share/cds/aotCodeSource.cpp line 894:
>
>> 892: // matched exactly.
>> 893: bool AOTCodeSourceConfig::need_lcp_match(AllCodeSourceStreams& all_css) const {
>> 894: if (!need_lcp_match_helper(boot_start(), boot_end(), all_css.boot_cp()) ||
>
> Can we reverse these conditions to make it easier to read?
>
>
> if (need_lcp_match_helper(boot_start(), boot_end(), all_css.boot_cp()) &&
>     need_lcp_match_helper(app_start(), app_end(), all_css.app_cp())) {
>   return true;
> } else {
>   return false;
> }

Done.

> src/hotspot/share/cds/aotCodeSource.cpp line 903:
>
>> 901:
>> 902: bool AOTCodeSourceConfig::need_lcp_match_helper(int start, int end, CodeSourceStream& css) const {
>> 903: if (app_end() == boot_start()) {
>
> I feel this block belongs to the caller `need_lcp_match`.

Fixed.

> src/hotspot/share/cds/aotCodeSource.hpp line 213:
>
>> 211:
>> 212: // Common accessors
>> 213: int boot_start() const { return 1; }
>
> Can we rename these methods to something like boot_start() -> boot_cp_start_index().
> At the call site it makes it clear it is referring to the bootclasspath index, and not booting something :)

I renamed them as follows:

    // Common accessors
    int boot_cp_start_index()     const { return 1; }
    int boot_cp_end_index()       const { return _boot_classpath_end; }
    int app_cp_start_index()      const { return boot_cp_end_index(); }
    int app_cp_end_index()        const { return _app_classpath_end; }
    int module_path_start_index() const { return app_cp_end_index(); }
    int module_path_end_index()   const { return _module_end; }

> src/hotspot/share/cds/aotCodeSource.hpp line 234:
>
>> 232: // Functions used only during dumptime
>> 233: static void dumptime_init(TRAPS);
>> 234: static size_t estimate_size_for_archive() {
>
> This method doesn't seem to be in use. Can this be removed?

Removed. I also removed the `estimate_size_for_archive_helper()` method.

> src/hotspot/share/cds/filemap.cpp line 318:
>
>> 316: if (header()->has_full_module_graph() && has_extra_module_paths) {
>> 317: CDSConfig::stop_using_optimized_module_handling();
>> 318: log_info(cds)("optimized module handling: disabled because of extra module path(s) are specified");
>
> typo: "disabled because ~of~ extra module path(s) are specified"

Fixed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1956614255 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1956614149 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1956614064 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1956613894 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1956613789 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1956614357 From vlivanov at openjdk.org Fri Feb 14 19:30:17 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 14 Feb 2025 19:30:17 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Wed, 12 Feb 2025 21:09:31 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > fix bounds checks Looks good. 
------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23557#pullrequestreview-2618641095 From vlivanov at openjdk.org Fri Feb 14 19:30:18 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 14 Feb 2025 19:30:18 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> <7xJgm0ScXMp4iRaH7Sf5QfsrTv2jOV4078kPqn3aoCs=.63303086-b4bd-47c5-9bd5-e69e28f75f4c@github.com> Message-ID: On Thu, 13 Feb 2025 21:23:55 GMT, Dean Long wrote: >> As far as I can tell, it was never needed. If an invokedynamic or invokehandle adds an appendix, then it will show up in the callee, and will be reflected in the caller args size, so there is no mismatch. As far as the JVM is concerned, an invokedynamic/invokehandle looks like a call to a JVM-generated adapter. The only way for invokedynamic/invokehandle to cause an argument mismatch is if the JVM resolved the call-site to an adapter that was actually a MethodHandle linker. That is the exception I describe in the comment below. If we ever allowed the JVM to do that, then several other checks would also need to be fixed. >> For the record, this code used to call cur.is_method_handle_invoke(), which was also wrong, but at least it had a name closer to what we would want. Ideally, something like is_method_handle_linker_invoke() that checks for linkToVirtual, linkToStatic, linkToSpecial, and linkToInterface would have been better. >> The old comment about "arbitrary chains of calls" seems to be left over from an early JSR292 feature known as Ricochet Frames. > > For the curious, it is still possible to create an arbitrarily long chain of linkTo calls, but only trusted code would be able to do that, so I'm not addressing this issue in this PR. Thanks for the clarifications!
PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1956626058 From gziemski at openjdk.org Fri Feb 14 19:57:48 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 14 Feb 2025 19:57:48 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v45] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7...
Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: - cleanup Win build - cleanup Win build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/fb176588..422a655a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=44 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=43-44 Stats: 32 lines in 1 file changed: 13 ins; 13 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Fri Feb 14 20:11:53 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 14 Feb 2025 20:11:53 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v46] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9%
> GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: white spaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/422a655a..a20b4c0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=45 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=44-45 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From nbenalla at openjdk.org Fri Feb 14 20:35:19 2025 From: nbenalla at openjdk.org (Nizar Benalla) Date: Fri, 14 Feb 2025 20:35:19 GMT Subject: RFR: 8343802: Prevent NULL usage backsliding [v7] In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 09:34:54 GMT, Nizar Benalla wrote: >> Please review this patch to add a test that checks the hotspot sources and test files for usages of NULL. >> It scans files in those directories, filtering out certain files as well as all `.c`, `.java`, `.class`, `.jar` and `.zip` files in test sources.
>>
>> Before adding line 86 and excluding `os_windows.cpp`, the test failed with:
>>
>>
>> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4436:
>> HMODULE hModule = NULL;
>> Error: 'NULL' found in /w/jdk/src/hotspot/os/windows/os_windows.cpp at line 4437:
>> GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, NULL, &hModule);
>> java.lang.RuntimeException: Found usage of 'NULL' in source files. See errors above.
>> at TestNoNULL.main(TestNoNULL.java:73)
>> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
>> at java.base/java.lang.reflect.Method.invoke(Method.java:565)
>> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
>> at java.base/java.lang.Thread.run(Thread.java:1447)
>
> Nizar Benalla has updated the pull request incrementally with one additional commit since the last revision:
>
> - rename excludeExtensions -> excludedExtensions
> - remove redundant import/throws

Thank you Johan, will keep this in mind. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23466#issuecomment-2660213074 From gziemski at openjdk.org Fri Feb 14 21:42:54 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 14 Feb 2025 21:42:54 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v47] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism.
>
> Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf`
>
> Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!):
>
> malloc summary:
>
> time:8,951,473[ns] [samples:117,717]
> memory requested:28,474,918 bytes, allocated:29,904,416 bytes,
> malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%]
>
> NMT type:               objects:      bytes:      time: count%: bytes%:  time%: overhead:
> -------------------------------------------------------------------------------------------------------------------------
> Java Heap:                     0           0          0    0.0%    0.0%    0.0%      0.0%
> Class:                     8,598     727,856    607,047    7.3%    2.4%    6.8%     18.2%
> Thread:                      196      68,256     64,875    0.2%    0.2%    0.7%      7.0%
> Thread Stack:                  0           0          0    0.0%    0.0%    0.0%      0.0%
> Code:                     10,094   2,036,528    916,348    8.6%    6.8%   10.2%      9.9%
> GC:                        1,813  20,372,160  1,214,642    1.5%   68.1%   13.6%      3.7%
> GCCardSet:                   299      28,736     13,174    0.3%    0.1%    0.1%     11.6%
> Compiler:                     55      13,728    171,364    0.0%    0.0%    1.9%      6.9%
> JVMCI:                         0           0          0    0.0%    0.0%    0.0%      0.0%
> Internal:                  5,066     339,184  1,418,578    4.3%    1.1%   15.8%     18.0%
> Other:                         6     244,736     21,303    0.0%    0.8%    0.2%     37.9%
> Symbol:                    9,844   1,493,280    752,665    8.4%    5.0%    8.4%     14.1%
> Native Memory Tracking:      367      30,736     17,654    0.3%    0.1%    0.2%      7...
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: win build fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/a20b4c0c..3b595d8e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=46 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=45-46 Stats: 6 lines in 1 file changed: 3 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From dlong at openjdk.org Fri Feb 14 22:41:13 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 14 Feb 2025 22:41:13 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Wed, 12 Feb 2025 21:09:31 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. 
> > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > fix bounds checks > I think you can make the assertion a little stricter like this [reinrich at 9c3c8a3](https://github.com/reinrich/jdk/commit/9c3c8a33a29b9ae6c4c703992b306dc0cbbcd2f0). Regarding this stricter version, why are you using is_bottom_frame instead of is_top_frame? The deoptimization code seems to name the most recent leaf frame "top". That sounds like what frame::top_ijava_frame_abi_size is for too. Thanks for the review, Vladimir. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2660395298 PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2660398727 From dlong at openjdk.org Fri Feb 14 23:47:12 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 14 Feb 2025 23:47:12 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: <8JUfZWRWpAhYCG9qO7Jxfj5k6d1iUNpRdawRn-veiBQ=.4b70e450-14e5-429a-aa95-08599673afba@github.com> On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. 
>> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 385: > 383: > 384: HeapWord* ParallelScavengeHeap::mem_allocate_old_gen(size_t size) { > 385: if (!should_alloc_in_eden(size) || GCLocker::is_active()) { I don't understand why we are checking is_active() here. The value is not reliable if we aren't at a safepoint, and iterating over all threads seems expensive. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1956881801 From fmatte at openjdk.org Sat Feb 15 01:23:19 2025 From: fmatte at openjdk.org (Fairoz Matte) Date: Sat, 15 Feb 2025 01:23:19 GMT Subject: RFR: 8347833: CrashOnOutOfMemory should stop GC threads before HeapDumpOnOutOfMemoryError [v2] In-Reply-To: <2pMqSLSLN3ZvhqOFwqVQUhscx61ZN-Xm1xc0fnDjWZk=.823b5a53-80d6-4b6e-8ca1-3fb5a20dced7@github.com> References: <2pMqSLSLN3ZvhqOFwqVQUhscx61ZN-Xm1xc0fnDjWZk=.823b5a53-80d6-4b6e-8ca1-3fb5a20dced7@github.com> Message-ID: <-j75YVdwEtMi18sgHEv_8ReZw37OiUy6dj5vcVSiH8E=.84098289-bbdf-4018-9558-9fb16aa2f1ce@github.com> On Fri, 14 Feb 2025 05:34:34 GMT, David Holmes wrote: >> Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: >> >> Aditional work on review comments > > src/hotspot/share/utilities/debug.cpp line 272: > >> 270: VMError::report_java_out_of_memory(message, HeapDumpOnOutOfMemoryError, CrashOnOutOfMemoryError); >> 271: >> 272: if (CrashOnOutOfMemoryError) { > > The `if (CrashOnOutOfMemoryError)` is unreachable code as `report_java_out_of_memory` already aborted. yes, it can be removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23519#discussion_r1956972198 From fmatte at openjdk.org Sat Feb 15 01:23:19 2025 From: fmatte at openjdk.org (Fairoz Matte) Date: Sat, 15 Feb 2025 01:23:19 GMT Subject: RFR: 8347833: CrashOnOutOfMemory should stop GC threads before HeapDumpOnOutOfMemoryError [v2] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 05:37:51 GMT, David Holmes wrote: >> I have copied this code from TestHeapDumpOnOutOfMemoryError.java, and I have restricted heap to -Xmx128M to force OOM in heap. > > The heap size will make no difference as you are trying to create an array that is larger than the VM allows. Line 42 will throw OOME and the for loop is never reached. 
> > | Welcome to JShell -- Version 23 > | For an introduction type: /help intro > > jshell> Object[] oa = new Object[Integer.MAX_VALUE]; > | Exception java.lang.OutOfMemoryError: Requested array size exceeds VM limit > | at (#1:1) I also observed that JShell reports `Requested array size exceeds VM limit`, but during program execution it fails with: `STDERR: stdout: [java.lang.OutOfMemoryError: Java heap space Dumping heap to java_pid238354.hprof ... Heap dump file created [187789641 bytes in 0.285 secs] Aborting due to java.lang.OutOfMemoryError: Java heap space` The tests below were written under the same assumption: https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/runtime/ErrorHandling/TestHeapDumpOnOutOfMemoryError.java https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/runtime/ErrorHandling/TestHeapDumpPath.java https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/runtime/ErrorHandling/TestGZippedHeapDumpOnOutOfMemoryError.java ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23519#discussion_r1956972090 From shade at openjdk.org Sat Feb 15 08:26:13 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Sat, 15 Feb 2025 08:26:13 GMT Subject: RFR: 8350086: Inline hot Method accessors for faster task selection [v3] In-Reply-To: References: <2S92WMb5nqAG6LoBfpEmYf-0UubJpCAZ3XDUg2bKRos=.27a8beae-78b1-4934-84fd-f13cbad105f4@github.com> Message-ID: On Fri, 14 Feb 2025 15:12:50 GMT, Aleksey Shipilev wrote: >> These methods show up prominently on Leyden profiles, as compilation policy asks these properties for methods very often during compile task selection: >> - `Method::invocation_count` >> - `Method::backedge_count` >> - `Method::highest_comp_level` >> >> We can move the definitions for these methods to method.inline.hpp to make them eligible for better inlining.
>> >> The `interpreter_invocation_count()` method is a bit weird; it looks like a leftover from [JDK-8251462](https://bugs.openjdk.org/browse/JDK-8251462). Removing it would prompt more cleanups and renamings in `ciMethod`, so I would leave it for future enhancement. >> >> Additional testing: >> - [x] Spot-checked Leyden profiles, methods are now fully inlined into hot `CompileBroker` methods >> - [x] Ad-hoc Leyden benchmarks show minor improvements (< 1%) for time spent in compiler threads > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - One more include for Minimal VM > - Turn declarations proper inline Thank you for the reviews! Yes, the core of the problem is potentially quadratic behavior in task selection, like Vladimir describes. We had this problem in Leyden for SCC tasks: https://github.com/openjdk/leyden/pull/17 -- so it does not apply to SC loaded methods anymore. I agree it would be great to resolve the task selection problem at its core; unfortunately, my crude attempts at doing so failed, because tiered policy is quite fiddly. It does not mean we would not try again, it just means it would take a bit more time. Meanwhile, we can address little inefficiencies without solving the core issue. To that extent, I would spin this more positively: now that Leyden is able to shift the bulk of C2 compilations away, the little inefficiencies in normal compilation paths show up in those runs. The inefficiency is also in mainline, but it would be obscured by the heavy compilations that would follow the task selection. So I think these kinds of inlining improvements stand well on their own, and are still worth doing.
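The `.inline.hpp` convention discussed here separates a declaration from a definition that every hot caller can see. Below is a minimal single-file C++ sketch of that split. It uses hypothetical names (`MethodSketch`, `select_hottest`) and a toy scoring rule, which is not the tiered policy's actual rate computation. The counters are plain fields for simplicity. In HotSpot, the declaration would live in `method.hpp` and the definitions in `method.inline.hpp`:

```cpp
#include <cassert>
#include <vector>

// --- would live in a "method.hpp"-style header ---
class MethodSketch {
 public:
  MethodSketch(int invocations, int backedges)
      : _invocation_count(invocations), _backedge_count(backedges) {}
  // Declared inline; the bodies follow in the ".inline.hpp" part below.
  inline int invocation_count() const;
  inline int backedge_count() const;

 private:
  int _invocation_count;
  int _backedge_count;
};

// --- would live in a "method.inline.hpp"-style header ---
// Every file that includes this sees the bodies, so the compiler can inline
// them at the call site instead of emitting an out-of-line call.
inline int MethodSketch::invocation_count() const { return _invocation_count; }
inline int MethodSketch::backedge_count() const { return _backedge_count; }

// Toy task selection: queries both accessors for every queued method, so the
// per-call cost is paid once per candidate on each selection.
const MethodSketch* select_hottest(const std::vector<const MethodSketch*>& queue) {
  const MethodSketch* best = nullptr;
  long best_score = -1;
  for (const MethodSketch* m : queue) {
    long score = static_cast<long>(m->invocation_count()) + m->backedge_count();
    if (score > best_score) {
      best_score = score;
      best = m;
    }
  }
  return best;
}
```

Because a selection loop like this touches the accessors once per queued method, call overhead scales with queue length. Inlining removes that overhead without addressing the potentially quadratic selection itself.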
------------- PR Comment: https://git.openjdk.org/jdk/pull/23634#issuecomment-2660812463 From ayang at openjdk.org Sat Feb 15 11:44:44 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sat, 15 Feb 2025 11:44:44 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v3] In-Reply-To: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: > Here is an attempt to simplify GCLocker implementation for Serial and Parallel. > > GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. > > The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. > > Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. 
> > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into gclocker - review - Merge branch 'master' into gclocker - review - Merge branch 'master' into gclocker - gclocker ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23367/files - new: https://git.openjdk.org/jdk/pull/23367/files/1b6f908b..005087e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23367&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23367&range=01-02 Stats: 18668 lines in 693 files changed: 10993 ins; 4307 del; 3368 mod Patch: https://git.openjdk.org/jdk/pull/23367.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23367/head:pull/23367 PR: https://git.openjdk.org/jdk/pull/23367 From ayang at openjdk.org Sat Feb 15 11:49:13 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sat, 15 Feb 2025 11:49:13 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: <8JUfZWRWpAhYCG9qO7Jxfj5k6d1iUNpRdawRn-veiBQ=.4b70e450-14e5-429a-aa95-08599673afba@github.com> References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> <8JUfZWRWpAhYCG9qO7Jxfj5k6d1iUNpRdawRn-veiBQ=.4b70e450-14e5-429a-aa95-08599673afba@github.com> Message-ID: On Fri, 14 Feb 2025 23:44:25 GMT, Dean Long wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into gclocker >> - review >> - Merge branch 'master' into gclocker >> - gclocker > > src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 385: > >> 383: >> 384: HeapWord* ParallelScavengeHeap::mem_allocate_old_gen(size_t size) { >> 385: if (!should_alloc_in_eden(size) || GCLocker::is_active()) { > > I don't understand why we are checking is_active() here. The value is not reliable if we aren't at a safepoint, and iterating over all threads seems expensive. The intention is to avoid blocking Java threads if possible, but there is no fundamental reason why it has to be this way. I have removed it for simpler (or less magical) code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1957098815 From ayang at openjdk.org Sat Feb 15 11:52:14 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sat, 15 Feb 2025 11:52:14 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Fri, 7 Feb 2025 06:43:25 GMT, David Holmes wrote: > But in any case adding the atomic load to in_critical() is basically a no-op (loads are atomic) so no need to add a new API just to do that. I have removed the new API, and switched to use the original `in_critical()`. > I think that to get the correct "dekker duality" in this code you do need to have full fences between the stores and loads, not just a storeload barrier. I have changed to `fence` for simpler reasoning. (In our codebase, the two have the same implementation, so perf should be the same.)
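The "dekker duality" discussed above is the classic handshake in which each side stores its own flag, executes a full fence, and then loads the other side's flag. Below is a minimal sketch in portable C++ atomics. It assumes hypothetical names (`ThreadState`, `gc_requested`, `try_enter_critical`, `request_gc_and_check`). HotSpot code would use its own `OrderAccess` barriers rather than `std::atomic_thread_fence`, and the real slow paths block and wait rather than returning `false`:

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical per-thread flag: non-zero while the thread is in a critical region.
struct ThreadState {
  std::atomic<int> in_critical{0};
};

// Hypothetical global flag: set by the GC side before it inspects threads.
std::atomic<bool> gc_requested{false};

// Mutator side: store, full fence, load.
// Returns false if a GC is pending and the caller must take a slow path.
bool try_enter_critical(ThreadState& t) {
  t.in_critical.store(1, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  if (gc_requested.load(std::memory_order_relaxed)) {
    t.in_critical.store(0, std::memory_order_relaxed);  // back out
    return false;
  }
  return true;
}

void exit_critical(ThreadState& t) {
  t.in_critical.store(0, std::memory_order_release);
}

// GC side: store, full fence, load (for every thread).
// Returns true only if no thread is currently inside a critical region.
bool request_gc_and_check(const std::vector<ThreadState*>& threads) {
  gc_requested.store(true, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  for (const ThreadState* t : threads) {
    if (t->in_critical.load(std::memory_order_relaxed) != 0) {
      return false;
    }
  }
  return true;
}
```

Because the two sequentially consistent fences are totally ordered, at least one side must observe the other's store. Either the mutator sees the pending GC and backs out, or the GC side sees the mutator inside its critical region. What can never happen is both sides missing each other.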
------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2660886740 From duke at openjdk.org Sat Feb 15 12:43:19 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Sat, 15 Feb 2025 12:43:19 GMT Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic In-Reply-To: References: Message-ID: On Wed, 7 Feb 2024 14:35:55 GMT, Yuri Gaevsky wrote: > Hello All, > > Please review these changes to enable the _vectorizedMismatch intrinsic on RISC-V platforms with RVV instructions supported. > > Thank you, > -Yuri Gaevsky > > **Correctness checks:** > hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4. . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2660904414 From lmesnik at openjdk.org Sat Feb 15 19:48:45 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 15 Feb 2025 19:48:45 GMT Subject: RFR: 8350151: Support requires property to filer tests incompatible with --enable-preview Message-ID: It might be useful to be able to run tests with --enable-preview during feature development. Tests that are incompatible with this mode must then be filtered out. I chose the name 'java.enablePreview' because it is more of a Java property than a VM or JDK one, and used 'enablePreview' to match the jtreg tag. Tested by running all test suites and verifying that tests are selected correctly. There are more tests incompatible with --enable-preview; I will mark them in a follow-up bug.
------------- Commit messages: - Update VPProps Changes: https://git.openjdk.org/jdk/pull/23653/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23653&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350151 Stats: 27 lines in 5 files changed: 19 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23653.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23653/head:pull/23653 PR: https://git.openjdk.org/jdk/pull/23653 From stuefe at openjdk.org Sun Feb 16 05:52:12 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 16 Feb 2025 05:52:12 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v5] In-Reply-To: References: Message-ID: On Thu, 23 Jan 2025 06:15:24 GMT, Thomas Stuefe wrote: >> If we wrongly decode an nKlass of `0`, and the nKlass encoding base is not NULL (typical for most cases that run with CDS enabled), the resulting pointer points to the start of the Klass encoding range. That area is readable. If CDS is enabled, it will be at the start of the CDS metadata archive. If CDS is off, it is at the start of the class space. >> >> Now, both CDS and class space allocate a safety buffer at the start to prevent Klass structures from being located there. However, that memory is still readable, so we can read garbage data from that area. In the case of CDS, that area is just 16 bytes, after which real data follows. Since Klass is large, most accesses will read beyond the 16-byte zone. >> >> We should protect the first page in the narrow Klass encoding range to make errors like this easier to analyze, especially in release builds where decode_not_null does not assert. We already use a similar technique in the heap, and most OSes protect the zero page for the same reason. >> >> This patch does that. Now, decoding a `0` nKlass and then using the resulting `Klass` - calling virtual functions or accessing members - crashes right away.
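The effect of protecting the first page of the encoding range can be reproduced with plain POSIX calls. Below is a minimal sketch, assuming a Linux-like `mmap`/`mprotect` and hypothetical names (`KlassRange`, `decode_klass`, `in_protection_zone`). It is not the patch's code: the real patch sets up the zone as part of the class space and CDS layout.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of a narrow-Klass encoding range.
struct KlassRange {
  char* base = nullptr;   // encoding base: nKlass 0 decodes to exactly this address
  size_t size = 0;        // total reserved bytes
  size_t protzone = 0;    // bytes at the start that are mapped PROT_NONE
  int shift = 0;          // encoding shift
};

// Reserves `size` bytes and makes the first page inaccessible.
bool reserve_with_protection_zone(KlassRange& r, size_t size, int shift) {
  size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return false;
  // mmap returns page-aligned memory, so the first page can be protected directly.
  if (mprotect(p, page, PROT_NONE) != 0) {
    munmap(p, size);
    return false;
  }
  r.base = static_cast<char*>(p);
  r.size = size;
  r.protzone = page;
  r.shift = shift;
  return true;
}

void release_range(KlassRange& r) {
  if (r.base != nullptr) munmap(r.base, r.size);
  r = KlassRange();
}

// Decoding only computes an address; a later dereference is what faults.
char* decode_klass(const KlassRange& r, uint32_t nk) {
  return r.base + (static_cast<uintptr_t>(nk) << r.shift);
}

// The kind of check an error reporter could use to annotate a register value.
bool in_protection_zone(const KlassRange& r, const void* addr) {
  uintptr_t a = reinterpret_cast<uintptr_t>(addr);
  uintptr_t b = reinterpret_cast<uintptr_t>(r.base);
  return r.base != nullptr && a >= b && a < b + r.protzone;
}
```

Decoding nKlass 0 yields exactly the base address, which now lies in the `PROT_NONE` page. The first access through that pointer therefore faults instead of silently reading stale bytes.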
>> >> Additionally, the patch provides a helpful output in the register/stack section, e.g: >> >> >> RDI=0x0000000800000000 points into nKlass protection zone >> >> >> >> Testing: >> - GHAs. >> - I tested the patch manually on x64 Linux for both CDS on, CDS off and zero-based encoding, CDS off and non-zero-based encoding. >> - I tested manually on Windows x64 >> - I also prepared an automatic gtest, but that needs some preparatory work on the gtest suite first to work (see https://bugs.openjdk.org/browse/JDK-8348029) >> >> -- Update 2024-01-22 -- >> I added a jtreg test that is more thorough than a gtest (also scans the produced hs-err file) > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > fix whitespace error > > It might be easier if we introduce a new "core" region called "protection" that's 16MB in size, and allocate that before the rw region in the output buffer. We never map this region so it doesn't need to be stored in the archive file. > > Let me try this out and see if it works. > > Hi Thomas, please try this out: > > [master...iklam:jdk:8330174-protection-zone-ioi-contributions](https://github.com/openjdk/jdk/compare/master...iklam:jdk:8330174-protection-zone-ioi-contributions) > > It passes all CDS tests. 
You can see the gap: > > ``` > $ java -Xlog:cds -XX:ArchiveRelocationMode=0 --version | egrep '(Mapped)|(_rs)' > [0.017s][info][cds] Reserved archive_space_rs [0x0000000800000000 - 0x0000000801000000] (16777216) bytes > [0.017s][info][cds] Reserved class_space_rs [0x0000000801000000 - 0x0000000841000000] (1073741824) bytes > [0.017s][info][cds] Mapped static region #0 at base 0x0000000800001000 top 0x0000000800557000 (ReadWrite) > [0.017s][info][cds] Mapped static region #1 at base 0x0000000800557000 top 0x0000000800dee000 (ReadOnly) > [0.017s][info][cds] Mapped static region #2 at base 0x000079ff9c021000 top 0x000079ff9c056000 (Bitmap) > ``` > > You'd need to add code to disable all RWX access in 0x800000000 ~ 0x800001000. I like this, @iklam, thanks. I am on vacation currently; will try this when I'm back. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23190#issuecomment-2661262145 From stuefe at openjdk.org Sun Feb 16 07:46:10 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 16 Feb 2025 07:46:10 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v9] In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 13:39:32 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. >> >> Two key changes enable this feature: >> 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. >> 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values.
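To make the intrusive idea concrete: the user embeds the node in its own structure, links it into the tree without any allocation, and recovers the containing structure from a node pointer. Below is a minimal C++ sketch under those assumptions, with hypothetical names. It is a plain unbalanced binary search tree, not HotSpot's red-black tree. It also includes a `closest_ge` lookup of the kind requested later in this thread:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical intrusive node: it lives inside the user's structure,
// so linking it into the tree never allocates memory.
struct IntrusiveNode {
  int key = 0;
  IntrusiveNode* left = nullptr;
  IntrusiveNode* right = nullptr;
};

// Hypothetical user structure; the node is deliberately not the first member.
struct Block {
  std::size_t size;
  IntrusiveNode node;
};

// Recovers the enclosing Block from a pointer to its embedded node
// (the "container_of" idiom).
Block* block_of(IntrusiveNode* n) {
  return reinterpret_cast<Block*>(reinterpret_cast<char*>(n) - offsetof(Block, node));
}

// Plain, unbalanced binary search tree over caller-owned nodes.
struct IntrusiveTree {
  IntrusiveNode* root = nullptr;

  // Links a caller-provided node; equal keys go to the right subtree.
  void insert(IntrusiveNode* n) {
    IntrusiveNode** link = &root;
    while (*link != nullptr) {
      link = (n->key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    n->left = nullptr;
    n->right = nullptr;
    *link = n;
  }

  // Returns the node with the smallest key >= k, or nullptr if none exists.
  IntrusiveNode* closest_ge(int k) const {
    IntrusiveNode* best = nullptr;
    IntrusiveNode* cur = root;
    while (cur != nullptr) {
      if (cur->key >= k) {
        best = cur;        // candidate; a smaller one may exist on the left
        cur = cur->left;
      } else {
        cur = cur->right;  // this node and its left subtree are all too small
      }
    }
    return best;
  }
};
```

Using `offsetof` to recover the container works wherever the node sits in the structure. By contrast, a plain pointer cast such as the `cast_to_self` in the example below only works when the node is the first member.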
>> >> >> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. >> >> An example of how you could use the intrusive tree is found below: >> >> ```c++ >> struct MyIntrusiveStructure { >> Node node; // The tree node is part of an external structure >> int data; >> >> MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} >> Node* get_node() { return &node; } >> static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } >> }; >> >> Tree my_intrusive_tree; >> >> Cursor insert_cursor = my_intrusive_tree.cursor_find(0); >> Node insert_node = Node(0); >> >> // Custom allocation here is just malloc >> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); >> new (place) MyIntrusiveStructure(0, insert_node); >> >> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); >> >> Cursor find_cursor = my_intrusive_tree.cursor_find(0); >> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; >> >> >> >> Please let me know any feedback or concerns! > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > renamed non-value upsert to insert Hi @caspernorrbin, Sorry for the delay. I tried to use the tree in a tiny small allocator (basically what I plan to do in Metaspace and in other places), and though it is now possible in principle I think we can simplify things and make them more user-friendly. Here my findings: - It would be good to have RBNode simplified and defined outside of the tree. Possibly even in a different header. I can see RBNode being used in places where I don't know exactly what tree it goes into. 
It could even go into multiple trees at the same time or at different times (so the data structure would have either multiple RBNodes inlined or a single one that gets repurposed for different trees). - Along those lines, it would be good for the key in RBNode to be mutable. Having it const means I am forced to write constructors for containing structures. That is cumbersome. - To me, it seems the code could be a lot simpler if you were to just use standard subclassing (AbstractRBTree->(NonIntrusiveRBTree|IntrusiveRBTree)) etc. All these `std::is_same` checks are a bit much. There would also be no need for the `RBTreeNoopAllocator`. The vtable tax you'd pay won't matter much in reality since I don't foresee many cases where the specific tree type is not known. - I found I had little need for cursors at all. Mostly, they just got in the way. Cursors exist to modify the tree structure, but why would I ever do that manually? It is different with simple structures like linked lists, but here the tree balances itself, so it has the last say about its structure anyway. I would be perfectly happy with just the simple ability to add/remove nodes manually, use nodes to find nearby nodes (as in, nodes of nearby keys), iterate nodes with a functor etc. - In the few cases where I needed a cursor, it was because the API forced me to (e.g. when removing a node from the tree). With insertion, it got very weird. So I have an RBNode*, want to insert it into the tree, now I need an empty Cursor to do that? So, I create an empty cursor with that key, then use that with insert_at_cursor? Why? - Let's say I already have a nearby node (result of closest_gt, for instance), but it does not satisfy me, so I add a new one. For that I need to call normal insert, so the search is done all over again (see my remark above). It would be good if we could have an insertion with a node as an insertion hint. - I found that I miss a closest_ge (greater or equal). Please add that.
It's really needed for memory management trees. - In closest_gt() etc, please extend the comments to say what the behavior is when the node is not found. I assume null is returned? I'll continue the work later (have vacation next week). ------------- PR Review: https://git.openjdk.org/jdk/pull/23416#pullrequestreview-2592252537 From stuefe at openjdk.org Sun Feb 16 07:46:11 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 16 Feb 2025 07:46:11 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v9] In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 19:39:36 GMT, Johan Sj?len wrote: >> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: >> >> renamed non-value upsert to insert > > src/hotspot/share/utilities/rbTree.hpp line 71: > >> 69: const K& key() const { return _key; } >> 70: V& val() { return _value; } >> 71: V& val() const { return _value; } > > Hmm, this doesn't seem quite right. Why can't we have a `const` method returning a `const` value anymore? Yes. Please give us const and non-const access for node access. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23416#discussion_r1940987593 From dholmes at openjdk.org Mon Feb 17 01:50:49 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 17 Feb 2025 01:50:49 GMT Subject: RFR: 8350162: ProblemList compiler/tiered/Level2RecompilationTest.java Message-ID: <_QgjqK6G4XpD58VBiKmVFcKytpA_rMrrE9rjHYmRVFY=.0f52641a-2a2e-41fe-93c9-23031772721c@github.com> Simple problem listing for a test that is causing a lot of noise in our CI. 
Thanks ------------- Commit messages: - 8350162: ProblemList compiler/tiered/Level2RecompilationTest.java Changes: https://git.openjdk.org/jdk/pull/23657/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23657&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350162 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23657/head:pull/23657 PR: https://git.openjdk.org/jdk/pull/23657 From jpai at openjdk.org Mon Feb 17 02:14:18 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Mon, 17 Feb 2025 02:14:18 GMT Subject: RFR: 8350162: ProblemList compiler/tiered/Level2RecompilationTest.java In-Reply-To: <_QgjqK6G4XpD58VBiKmVFcKytpA_rMrrE9rjHYmRVFY=.0f52641a-2a2e-41fe-93c9-23031772721c@github.com> References: <_QgjqK6G4XpD58VBiKmVFcKytpA_rMrrE9rjHYmRVFY=.0f52641a-2a2e-41fe-93c9-23031772721c@github.com> Message-ID: On Mon, 17 Feb 2025 01:36:49 GMT, David Holmes wrote: > Simple problem listing for a test that is causing a lot of noise in our CI. > > Thanks Looks good and trivial to me. ------------- Marked as reviewed by jpai (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23657#pullrequestreview-2619832543 From dholmes at openjdk.org Mon Feb 17 02:14:18 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 17 Feb 2025 02:14:18 GMT Subject: RFR: 8350162: ProblemList compiler/tiered/Level2RecompilationTest.java In-Reply-To: References: <_QgjqK6G4XpD58VBiKmVFcKytpA_rMrrE9rjHYmRVFY=.0f52641a-2a2e-41fe-93c9-23031772721c@github.com> Message-ID: On Mon, 17 Feb 2025 02:07:27 GMT, Jaikiran Pai wrote: >> Simple problem listing for a test that is causing a lot of noise in our CI. >> >> Thanks > > Looks good and trivial to me. 
Thanks @jaikiran ------------- PR Comment: https://git.openjdk.org/jdk/pull/23657#issuecomment-2661797975 From dholmes at openjdk.org Mon Feb 17 02:14:18 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 17 Feb 2025 02:14:18 GMT Subject: Integrated: 8350162: ProblemList compiler/tiered/Level2RecompilationTest.java In-Reply-To: <_QgjqK6G4XpD58VBiKmVFcKytpA_rMrrE9rjHYmRVFY=.0f52641a-2a2e-41fe-93c9-23031772721c@github.com> References: <_QgjqK6G4XpD58VBiKmVFcKytpA_rMrrE9rjHYmRVFY=.0f52641a-2a2e-41fe-93c9-23031772721c@github.com> Message-ID: On Mon, 17 Feb 2025 01:36:49 GMT, David Holmes wrote: > Simple problem listing for a test that is causing a lot of noise in our CI. > > Thanks This pull request has now been integrated. Changeset: 21927237 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/2192723734e4edd2d2136637a46e9256c1b15703 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8350162: ProblemList compiler/tiered/Level2RecompilationTest.java Reviewed-by: jpai ------------- PR: https://git.openjdk.org/jdk/pull/23657 From aboldtch at openjdk.org Mon Feb 17 06:16:12 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 17 Feb 2025 06:16:12 GMT Subject: RFR: 8349652: Rewire nmethod oop load barriers In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 09:57:15 GMT, Stefan Karlsson wrote: > When loading oops from nmethods we currently use the Access API to inject load barriers for the GCs that require them. As part of the ZGC load barrier we need access to the nmethod to properly perform the load barrier. The current implementation of the Access API doesn't support passing down the nmethod through all its layers of code, so ZGC asks the code cache what nmethod the various oops belong to. There's currently an open PR for JDK-8343789 (#21276), which moves the oops out of the code cache, so the current ZGC implementation will not work after that has been integrated.
> > The proposal is to figure out a way to explicitly pass down the nmethod to the load barriers. > > We could extend the Access API to pass down the nmethod through all its various layers. The drawback of that is that it adds a lot of boilerplate code and requires new overloads and/or names. Given that this isn't performance-critical code, I propose that we take the much simpler route and call straight to the BarrierSetNMethod class. > > Given that MMethodAccess and IN_NMETHOD were only introduced to support nmethod oop loads for ZGC and are not used anymore, I've also removed them from the code. > > Tested with the reproducer for the ZGC issue in JDK-8343789, tier1-7 Linux with ZGC tasks, currently running tier1-3. lgtm. Nice to get it out of the Access API. Looking at this change, it seems like a lot of plumbing for two distinct loads. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23512#pullrequestreview-2620114059 From jwaters at openjdk.org Mon Feb 17 08:27:56 2025 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 17 Feb 2025 08:27:56 GMT Subject: RFR: 8342769: HotSpot Windows/gcc port is broken [v16] In-Reply-To: References: Message-ID: > Several areas in HotSpot are broken in the gcc port. These, with the exception of 1 rather big oversight within SharedRuntime::frem and SharedRuntime::drem, are all minor correctness issues within the code. These mostly can be fixed with simple changes to the code. Note that I am not sure whether the SharedRuntime::frem and SharedRuntime::drem fix is correct. It may be that they can be removed entirely. Julian Waters has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 29 commits: - CAST_FROM_FN_PTR in os_windows.cpp - Merge branch 'master' into hotspot - Merge branch 'openjdk:master' into hotspot - _WINDOWS && AARCH64 in sharedRuntime.hpp - AARCH64 in sharedRuntimeRem.cpp - Refactor sharedRuntime.cpp - CAST_FROM_FN_PTR in os_windows.cpp - Merge branch 'openjdk:master' into hotspot - fmod_winarm64 in sharedRuntime.cpp - fmod_winarm64 in sharedRuntimeRem.cpp - ... and 19 more: https://git.openjdk.org/jdk/compare/5e9d72e2...3f9ca206 ------------- Changes: https://git.openjdk.org/jdk/pull/21627/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21627&range=15 Stats: 54 lines in 7 files changed: 23 ins; 7 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/21627.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21627/head:pull/21627 PR: https://git.openjdk.org/jdk/pull/21627 From jwaters at openjdk.org Mon Feb 17 08:27:57 2025 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 17 Feb 2025 08:27:57 GMT Subject: RFR: 8342769: HotSpot Windows/gcc port is broken [v15] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 06:10:28 GMT, Julian Waters wrote: >> Several areas in HotSpot are broken in the gcc port. These, with the exception of 1 rather big oversight within SharedRuntime::frem and SharedRuntime::drem, are all minor correctness issues within the code. These mostly can be fixed with simple changes to the code. Note that I am not sure whether the SharedRuntime::frem and SharedRuntime::drem fix is correct. It may be that they can be removed entirely > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 27 commits: > > - Merge branch 'openjdk:master' into hotspot > - _WINDOWS && AARCH64 in sharedRuntime.hpp > - AARCH64 in sharedRuntimeRem.cpp > - Refactor sharedRuntime.cpp > - CAST_FROM_FN_PTR in os_windows.cpp > - Merge branch 'openjdk:master' into hotspot > - fmod_winarm64 in sharedRuntime.cpp > - fmod_winarm64 in sharedRuntimeRem.cpp > - fmod_winarm64 in sharedRuntime.hpp > - Typo in sharedRuntimeRem.cpp > - ... and 17 more: https://git.openjdk.org/jdk/compare/a606836a...ff1c4664 Re-review required: There was a merge conflict that Skara did not detect. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21627#issuecomment-2662372306 From alanb at openjdk.org Mon Feb 17 08:31:15 2025 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 17 Feb 2025 08:31:15 GMT Subject: RFR: 8350151: Support requires property to filer tests incompatible with --enable-preview In-Reply-To: References: Message-ID: <1iY92LjhRPbtuENrxBQlsCOKx2EHI6leLAfbkorEGzE=.e964726d-cf2c-4715-91fc-c76fc3e6668d@github.com> On Sat, 15 Feb 2025 19:43:39 GMT, Leonid Mesnik wrote: > It might be useful to be able to run testing with --enable-preview for feature development. The tests incompatible with this mode must be filtered out. > > I chose the name 'java.enablePreview' because it is more of a Java property than a VM or JDK one, and 'enablePreview' to be similar to the jtreg tag. > > Tested by running all test suites, and verifying that tests are correctly selected. > There are more tests incompatible with --enable-preview; I will mark them in a follow-up bug. test/jdk/java/lang/System/SecurityManagerWarnings.java line 28: > 26: * @bug 8266459 8268349 8269543 8270380 > 27: * @summary check various warnings > 28: * @requires !java.enablePreview What is the reason that this test fails with --enable-preview?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23653#discussion_r1957804120 From epeter at openjdk.org Mon Feb 17 08:40:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Feb 2025 08:40:17 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> Message-ID: On Fri, 14 Feb 2025 16:52:17 GMT, Roland Westrelin wrote: > I suppose extracting the branch probability from the MethodData and attaching it to the Min/Max nodes is not impossible. That is basically what `PhaseIdealLoop::conditional_move` already does, right? It detects the diamond and converts it to `CMove`. We could special case for `min / max`, and then we'd have the probability for the branch, which we could store at the node. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2662409450 From roland at openjdk.org Mon Feb 17 08:47:22 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 17 Feb 2025 08:47:22 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> Message-ID: On Mon, 17 Feb 2025 08:37:56 GMT, Emanuel Peter wrote: > > I suppose extracting the branch probability from the MethodData and attaching it to the Min/Max nodes is not impossible. > > That is basically what `PhaseIdealLoop::conditional_move` already does, right? It detects the diamond and converts it to `CMove`. We could special case for `min / max`, and then we'd have the probability for the branch, which we could store at the node. Possibly. 
We could also create the intrinsic the way it's done in the patch and extract the frequency from the `MethodData` for the min or max methods. The shape of the bytecodes for these methods should be simple enough that it should be feasible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2662424292 From fgao at openjdk.org Mon Feb 17 08:55:09 2025 From: fgao at openjdk.org (Fei Gao) Date: Mon, 17 Feb 2025 08:55:09 GMT Subject: RFR: 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands [v2] In-Reply-To: <8t_Z2acW3fMegjh1OmqeEEEbZ9inBFkjyRKJvgpMewY=.5cf85cdd-8675-4ed0-b32c-4c65a685240f@github.com> References: <8t_Z2acW3fMegjh1OmqeEEEbZ9inBFkjyRKJvgpMewY=.5cf85cdd-8675-4ed0-b32c-4c65a685240f@github.com> Message-ID: <-WKEO4ufNtx6kggGxIwTuwXeFNl_yWvLpHuO6GfjT_Q=.f8a5a2cf-f5da-46a5-9cf2-6ff5fe1bccf2@github.com> On Tue, 11 Feb 2025 13:55:31 GMT, Andrew Haley wrote: > Is this still alive? Hi @theRealAph, sorry for the late reply. Yes, it's still alive. I'm just back from my holiday. I'll resolve the conflict when the new commit is ready to be pushed. Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/22862#issuecomment-2662440724 From shade at openjdk.org Mon Feb 17 09:24:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Feb 2025 09:24:30 GMT Subject: Integrated: 8350086: Inline hot Method accessors for faster task selection In-Reply-To: <2S92WMb5nqAG6LoBfpEmYf-0UubJpCAZ3XDUg2bKRos=.27a8beae-78b1-4934-84fd-f13cbad105f4@github.com> References: <2S92WMb5nqAG6LoBfpEmYf-0UubJpCAZ3XDUg2bKRos=.27a8beae-78b1-4934-84fd-f13cbad105f4@github.com> Message-ID: On Fri, 14 Feb 2025 14:47:23 GMT, Aleksey Shipilev wrote: > These methods show up prominently on Leyden profiles, as the compilation policy asks for these properties of methods very often during compile task selection: > - `Method::invocation_count` > - `Method::backedge_count` > - `Method::highest_comp_level` > > We can move the definitions for these methods to method.inline.hpp to make them eligible for better inlining. > > The `interpreter_invocation_count()` method is a bit weird; it looks like a leftover from [JDK-8251462](https://bugs.openjdk.org/browse/JDK-8251462). Removing it would prompt more cleanups and renamings in `ciMethod`, so I would leave it for a future enhancement. > > Additional testing: > - [x] Spot-checked Leyden profiles, methods are now fully inlined into hot `CompilerBroker` methods > - [x] Ad-hoc Leyden benchmarks show minor improvements (< 1%) for time spent in compiler threads This pull request has now been integrated.
Changeset: b1b48286 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/b1b48286a6cbee8a9f96d739ab437915c573022c Stats: 79 lines in 6 files changed: 41 ins; 33 del; 5 mod 8350086: Inline hot Method accessors for faster task selection Reviewed-by: kvn, coleenp, aph, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/23634 From epeter at openjdk.org Mon Feb 17 10:39:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Feb 2025 10:39:15 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> Message-ID: <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> On Mon, 17 Feb 2025 08:44:46 GMT, Roland Westrelin wrote: >>> I suppose extracting the branch probability from the MethodData and attaching it to the Min/Max nodes is not impossible. >> >> That is basically what `PhaseIdealLoop::conditional_move` already does, right? It detects the diamond and converts it to `CMove`. We could special case for `min / max`, and then we'd have the probability for the branch, which we could store at the node. > >> > I suppose extracting the branch probability from the MethodData and attaching it to the Min/Max nodes is not impossible. >> >> That is basically what `PhaseIdealLoop::conditional_move` already does, right? It detects the diamond and converts it to `CMove`. We could special case for `min / max`, and then we'd have the probability for the branch, which we could store at the node. > > Possibly. We could also create the intrinsic the way it's done in the patch and extract the frequency from the `MethodData` for the min or max methods. The shape of the bytecodes for these methods should be simple enough that it should be feasible.
@rwestrel @galderz > It seems overall, we likely win more than we lose with this intrinsic, so I would integrate this change as it is and file a bug to keep track of remaining issues. I'm a little scared to just accept the regressions, especially for this "most average looking case": Imagine you have an array with random numbers, or at least numbers in a random order. If we take the max, then we expect the first number to be the running max with probability 1, the second with probability 1/2, the third 1/3, and the i'th 1/i. So the average branch probability is `(sum_i 1/i) / n`, which gets closer and closer to zero the larger the array. This means that the "average" case has an extreme probability. And so if we do not vectorize, then this gets us a regression with the current patch. And vectorization is a little fragile: it takes very little for vectorization not to kick in. > The Min/Max nodes are floating nodes. They can hoist out of loop and common reliably in ways that are not guaranteed otherwise. I suppose we could write an optimization that hoists loop-independent if-diamonds out of a loop. If the condition and all phi inputs are loop invariant, you could just cut the diamond out of the loop and paste it before the loop entry. > Shouldn't int min/max be affected the same way? I think we should be able to see the same issue here, actually. Yes.
Here is a quick benchmark:

    java -XX:CompileCommand=compileonly,TestIntMax::test* -XX:CompileCommand=printcompilation,TestIntMax::test* -XX:+TraceNewVectors TestIntMax.java
    CompileCommand: compileonly TestIntMax.test* bool compileonly = true
    CompileCommand: PrintCompilation TestIntMax.test* bool PrintCompilation = true
    Warmup
    5225 93 % 3 TestIntMax::test1 @ 5 (27 bytes)
    5226 94 3 TestIntMax::test1 (27 bytes)
    5226 95 % 4 TestIntMax::test1 @ 5 (27 bytes)
    5238 96 4 TestIntMax::test1 (27 bytes)
    Run
    Time: 542056319
    Warmup
    6320 101 % 3 TestIntMax::test2 @ 5 (34 bytes)
    6322 102 % 4 TestIntMax::test2 @ 5 (34 bytes)
    6329 103 4 TestIntMax::test2 (34 bytes)
    Run
    Time: 166815209

That's a more than 3x regression on random input data (test1 uses Math.max, test2 the equivalent ternary)! With:

    import java.util.Random;

    public class TestIntMax {
        private static Random RANDOM = new Random();

        public static void main(String[] args) {
            int[] a = new int[64 * 1024];
            for (int i = 0; i < a.length; i++) {
                a[i] = RANDOM.nextInt();
            }

            {
                System.out.println("Warmup");
                for (int i = 0; i < 10_000; i++) {
                    test1(a);
                }
                System.out.println("Run");
                long t0 = System.nanoTime();
                for (int i = 0; i < 10_000; i++) {
                    test1(a);
                }
                long t1 = System.nanoTime();
                System.out.println("Time: " + (t1 - t0));
            }

            {
                System.out.println("Warmup");
                for (int i = 0; i < 10_000; i++) {
                    test2(a);
                }
                System.out.println("Run");
                long t0 = System.nanoTime();
                for (int i = 0; i < 10_000; i++) {
                    test2(a);
                }
                long t1 = System.nanoTime();
                System.out.println("Time: " + (t1 - t0));
            }
        }

        public static int test1(int[] a) {
            int x = Integer.MIN_VALUE;
            for (int i = 0; i < a.length; i++) {
                x = Math.max(x, a[i]);
            }
            return x;
        }

        public static int test2(int[] a) {
            int x = Integer.MIN_VALUE;
            for (int i = 0; i < a.length; i++) {
                x = (x >= a[i]) ? x : a[i];
            }
            return x;
        }
    }

------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2662706564 From roland at openjdk.org Mon Feb 17 10:50:22 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 17 Feb 2025 10:50:22 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> Message-ID: On Mon, 17 Feb 2025 10:36:52 GMT, Emanuel Peter wrote: > I suppose we could write an optimization that hoists loop-independent if-diamonds out of a loop. If the condition and all phi inputs are loop invariant, you could just cut the diamond out of the loop and paste it before the loop entry. Right, but it would likely not optimize as well. The new optimization would possibly need heuristics to limit complexity, so its reach could be limited. The diamond could be transformed to something else by some other optimization before it gets a chance to get hoisted. There are likely other optimizations that apply to floating nodes that would still not apply: for instance, `MinL`/`MaxL` can be split through phi even if the `min` call is not right after the merge point. With branches that's not true. Also, with more complexity come more bugs.
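The random-input branch-probability argument in the 8307513 thread can be checked numerically. The sketch below is illustrative only: the class name `RunningMaxUpdates`, the fixed seed and the trial count are assumptions, not part of the PR. It counts how often the "take `a[i]`" side of `x = (x >= a[i]) ? x : a[i]` is selected on random arrays and compares the per-element rate with H_n / n, where H_n = sum_{i=1..n} 1/i is the expected number of running-max updates for n distinct values in random order.

```java
import java.util.Random;

public class RunningMaxUpdates {

    // Number of times the running maximum changes while scanning a,
    // i.e. how often x = (x >= v) ? x : v selects v instead of the old x.
    static int countUpdates(int[] a) {
        int updates = 0;
        int x = Integer.MIN_VALUE;
        for (int v : a) {
            if (v > x) {
                x = v;
                updates++;
            }
        }
        return updates;
    }

    // Harmonic number H_n = sum_{i=1..n} 1/i.
    static double harmonic(int n) {
        double h = 0.0;
        for (int i = 1; i <= n; i++) {
            h += 1.0 / i;
        }
        return h;
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so runs are repeatable
        int n = 64 * 1024;              // same array size as TestIntMax
        int trials = 100;
        long total = 0;
        for (int t = 0; t < trials; t++) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) {
                a[i] = random.nextInt();
            }
            total += countUpdates(a);
        }
        double measured = (double) total / trials / n;
        double predicted = harmonic(n) / n;
        System.out.printf("measured update rate per element: %.6f%n", measured);
        System.out.printf("predicted (H_n / n):              %.6f%n", predicted);
    }
}
```

For n = 65536, H_n is roughly ln n + 0.577, about 11.7, so both printed rates should be close to 1.8e-4: on random input the update branch is almost never taken, which is the extreme-probability case the thread is worried about.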
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2662733218 From amitkumar at openjdk.org Mon Feb 17 11:08:24 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 17 Feb 2025 11:08:24 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames In-Reply-To: References: Message-ID: On Mon, 17 Feb 2025 09:53:37 GMT, Amit Kumar wrote: > Port for [JDK-8299795](https://bugs.openjdk.org/browse/JDK-8299795) Relativize Z_locals in interpreter frame for s390x. > > Tier1 tests with a fastdebug VM are clean. src/hotspot/cpu/s390/interp_masm_s390.cpp line 117:

    > 115: z_agr(Z_R1_scratch, Z_fp);
    > 116:
    > 117: z_cgr(Z_locals, Z_R1_scratch);

By default, the assertion was failing because the stored value is fp-relativized while Z_locals holds a pointer, so I have updated it. I have done some manual stepping through and found that it should be okay to keep, unless we don't want to kill another register here.

    diff --git a/src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp b/src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp
    index c40be5edec7..fc5b9f10af1 100644
    --- a/src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp
    +++ b/src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp
    @@ -1134,8 +1134,16 @@ void TemplateInterpreterGenerator::generate_fixed_frame(bool native_call) {
       __ z_agr(Z_locals, Z_esp);
       // z_ijava_state->locals - i*BytesPerWord points to i-th Java local (i starts at 0)
       // z_ijava_state->locals = Z_esp + parameter_count bytes
    +  __ z_sgrk(Z_locals, Z_locals, fp); // Z_R1 = Z_locals - fp();
    +  __ z_srlg(Z_locals, Z_locals, Interpreter::logStackElementSize);
    +  // Store relativized Z_locals, see frame::interpreter_frame_locals().
       __ z_stg(Z_locals, _z_ijava_state_neg(locals), fp);
       // z_ijava_state->oop_temp = nullptr;
       __ store_const(Address(fp, oop_tmp_offset), 0);

With this change, `Z_locals` holds the correct (fp-relativized) value, which is stored at the state offset:

    (gdb) i r r12
    r12 0x1 1
    (gdb) x/2gx $r9 - 88
    0x3fffb37be98: 0x0000000000000001 0x000003fffb37be80
    (gdb)

Similarly, with the change I have pushed:

    instr:
    0x3fff8192b5a: lg %r1,-88(%r9)
    0x3fff8192b60: sllg %r1,%r1,3
    0x3fff8192b66: agr %r1,%r9

    result:
    (gdb) i r r12
    r12 0x3fffb37bef8 4397966278392
    (gdb) i r r1
    r1 0x3fffb37bef8 4397966278392
    (gdb)

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23660#discussion_r1957988781 From amitkumar at openjdk.org Mon Feb 17 11:08:23 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 17 Feb 2025 11:08:23 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames Message-ID: Port for [JDK-8299795](https://bugs.openjdk.org/browse/JDK-8299795) Relativize Z_locals in interpreter frame for s390x. Tier1 tests with a fastdebug VM are clean. ------------- Commit messages: - locals relativization on s390x Changes: https://git.openjdk.org/jdk/pull/23660/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23660&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350182 Stats: 31 lines in 5 files changed: 24 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23660.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23660/head:pull/23660 PR: https://git.openjdk.org/jdk/pull/23660 From dholmes at openjdk.org Mon Feb 17 11:29:16 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 17 Feb 2025 11:29:16 GMT Subject: RFR: 8342769: HotSpot Windows/gcc port is broken [v16] In-Reply-To: References: Message-ID: On Mon, 17 Feb 2025 08:27:56 GMT, Julian Waters wrote: >> Several areas in HotSpot are broken in the gcc port.
These, with the exception of 1 rather big oversight within SharedRuntime::frem and SharedRuntime::drem, are all minor correctness issues within the code. These mostly can be fixed with simple changes to the code. Note that I am not sure whether the SharedRuntime::frem and SharedRuntime::drem fix is correct. It may be that they can be removed entirely > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: > > - CAST_FROM_FN_PTR in os_windows.cpp > - Merge branch 'master' into hotspot > - Merge branch 'openjdk:master' into hotspot > - _WINDOWS && AARCH64 in sharedRuntime.hpp > - AARCH64 in sharedRuntimeRem.cpp > - Refactor sharedRuntime.cpp > - CAST_FROM_FN_PTR in os_windows.cpp > - Merge branch 'openjdk:master' into hotspot > - fmod_winarm64 in sharedRuntime.cpp > - fmod_winarm64 in sharedRuntimeRem.cpp > - ... and 19 more: https://git.openjdk.org/jdk/compare/5e9d72e2...3f9ca206 Latest changes seem fine too. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21627#pullrequestreview-2620782920 From rrich at openjdk.org Mon Feb 17 11:30:13 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 17 Feb 2025 11:30:13 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Fri, 14 Feb 2025 22:38:23 GMT, Dean Long wrote: > > I think you can make the assertion a little stricter like this [reinrich at 9c3c8a3](https://github.com/reinrich/jdk/commit/9c3c8a33a29b9ae6c4c703992b306dc0cbbcd2f0). > > Regarding this stricter version, why are you using is_bottom_frame instead of is_top_frame? The deoptimization code seems to name the most recent leaf frame "top". That sounds like what frame::top_ijava_frame_abi_size is for too. 
Correct, the top frame has a frame::top_ijava_frame_abi, but the assertion is about the abi section in the current frame's caller, and the bottom frame's caller also has a top_ijava_frame_abi because i2c doesn't modify it. Continue reading if you're interested in more details... As said, the i2c adapter does *not* trim the caller frame as the interpreter would, replacing its large `top_ijava_frame_abi` with a smaller `parent_ijava_frame_abi`. Example: compiled frame DEOPTEE is replaced with 3 interpreted frames.

Stack before deoptimization

    |                        |
    | Interpreted CALLER     |
    | of DEOPTEE frame       |
    |                        |
    +------------------------+
    |                        |
    | top_ijava_frame_abi    |
    |                        |
    +========================+
    |                        |
    | Compiled               |
    | DEOPTEE                |
    |                        |
    +------------------------+
    | java_abi               |
    +========================+

Stack when assertion is checked (i.e. after DEOPTEE was replaced by corresponding interpreted frames)

    |                        |
    | Interpreted CALLER     |
    | of DEOPTEE frame       |
    |                        |
    +------------------------+
    |                        |
    | top_ijava_frame_abi    |  <- i2c keeps large abi
    |                        |
    +========================+
    |                        |  <- bottom frame
    | Interpreted Frame 0    |
    | corresp. to DEOPTEE    |
    |                        |
    +------------------------+
    | parent_ijava_frame_abi |
    +========================+
    |                        |
    | Interpreted Frame 1    |
    | (inlined by DEOPTEE)   |
    |                        |
    +------------------------+
    | parent_ijava_frame_abi |
    +========================+
    |                        |  <- top frame
    | Interpreted Frame 2    |
    | (inlined by DEOPTEE)   |
    |                        |
    +------------------------+
    |                        |
    | top_ijava_frame_abi    |
    |                        |
    +========================+

Notes (referring to the frame sections rather than the C++ types):
- top_ijava_frame_abi complies with the native abi (modelled by frame::native_abi_reg_args). This is needed for VM calls.
- parent_ijava_frame_abi is equal to frame::java_abi.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2662835374 From mli at openjdk.org Mon Feb 17 13:56:13 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Feb 2025 13:56:13 GMT Subject: RFR: 8350093: RISC-V: java/math/BigInteger/LargeValueExceptions.java timeout with COH In-Reply-To: <20dDUIvvN45lILuiZ1hdaOXnRDTNl1V2nKI4X1S1lPE=.f615a14e-0518-4177-ac47-d9b8fd222d2b@github.com> References: <20dDUIvvN45lILuiZ1hdaOXnRDTNl1V2nKI4X1S1lPE=.f615a14e-0518-4177-ac47-d9b8fd222d2b@github.com> Message-ID: On Fri, 14 Feb 2025 13:21:58 GMT, Fei Yang wrote: > Hi, please review this change resolving a timeout issue in `LargeValueExceptions.squareDefiniteOverflow()`. > > This issue only happens on platforms with slow unaligned memory accesses like Unmatched or Premier-P550 SBCs. > Async profiler shows major time was spent in multiplyToLen stub code. When AvoidUnalignedAccesses is enabled, > there is a simple alignment check, which assumes 8-byte alignment for base_offset of int arrays. But this is > not the case with COH: base_offset is 12 bytes instead of 16 bytes for int arrays. > > Patch simply makes it explicit about the requirement of base_offset. Sanity tested on Premier P550. > No obvious change witnessed on JMH after this change: > > ----------------------------------------------------------------------------------------------- > > Without COH: > >
> Benchmark (maxNumbits) Mode Cnt Score Error Units
> BigIntegers.SmallShifts.testLeftShift 32 avgt 15 138.939 ± 2.246 ns/op
> BigIntegers.SmallShifts.testLeftShift 128 avgt 15 88.391 ± 1.210 ns/op
> BigIntegers.SmallShifts.testLeftShift 256 avgt 15 117.590 ± 1.398 ns/op
> BigIntegers.SmallShifts.testRightShift 32 avgt 15 150.338 ± 1.961 ns/op
> BigIntegers.SmallShifts.testRightShift 128 avgt 15 104.540 ± 5.636 ns/op
> BigIntegers.SmallShifts.testRightShift 256 avgt 15 126.082 ± 1.756 ns/op
> BigIntegers.testAdd N/A avgt 15 97.513 ± 40.746 ns/op
> BigIntegers.testGcd N/A avgt 15 5409222.706 ± 5934.667 ns/op
> BigIntegers.testHugeLargeDivide N/A avgt 15 246.904 ± 1.552 ns/op
> BigIntegers.testHugeSmallDivide N/A avgt 15 248.997 ± 1.374 ns/op
> BigIntegers.testHugeToString N/A avgt 15 2421.432 ± 62.208 ns/op
> BigIntegers.testLargeSmallDivide N/A avgt 15 216.859 ± 1.760 ns/op
> BigIntegers.testLargeToString N/A avgt 15 425.653 ± 13.305 ns/op
> BigIntegers.testLeftShift N/A avgt 15 2265.137 ± 24.319 ns/op
> BigIntegers.testMultiply N/A avgt 15 15862.412 ± 417.880 ns/op <========
> BigIntegers.testRightShift N/A avgt 15 936.071 ± 15.247 ns/op
> BigIntegers.testSmallToString N/A avgt 15 322.350 ± 16.075...

Marked as reviewed by mli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23631#pullrequestreview-2621141870 From mli at openjdk.org Mon Feb 17 13:56:14 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Feb 2025 13:56:14 GMT Subject: RFR: 8350093: RISC-V: java/math/BigInteger/LargeValueExceptions.java timeout with COH In-Reply-To: References: <20dDUIvvN45lILuiZ1hdaOXnRDTNl1V2nKI4X1S1lPE=.f615a14e-0518-4177-ac47-d9b8fd222d2b@github.com> Message-ID: On Fri, 14 Feb 2025 16:00:40 GMT, Fei Yang wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 5486: >> >>> 5484: const Register jdx = tmp1; >>> 5485: >>> 5486: if (AvoidUnalignedAccesses) { >> >> If `AvoidUnalignedAccesses == false`, it will go through all alignment code? But it seems the original code will not go through this alignment when `AvoidUnalignedAccesses == false`. > > Hi, not sure if I understand the question correctly. This only affects platforms where `AvoidUnalignedAccesses` is true. > It does not make a difference on platforms with fast misaligned accesses (which means `AvoidUnalignedAccesses == false`). Ah, I think I read the code wrong, it's good. I thought you closed the curly brace earlier than before.
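The alignment problem behind 8350093 comes down to simple address arithmetic. The Java sketch below is illustrative only and does not reproduce the stub's actual check. It assumes that array objects start on an 8-byte boundary (the default `ObjectAlignmentInBytes`), and it uses the `int[]` base offsets quoted in the PR: 16 bytes without compact object headers and 12 bytes with them. The class and helper names are hypothetical.

```java
public class IntArrayAlignment {

    static final int INT_BYTES = 4;

    // Byte offset of int element `index`, relative to an 8-byte-aligned object start.
    static long elementOffset(long baseOffset, int index) {
        return baseOffset + (long) index * INT_BYTES;
    }

    static boolean isAligned8(long offset) {
        return (offset & 7) == 0;
    }

    public static void main(String[] args) {
        // 16: int[] base offset without COH; 12: with COH (values taken from the PR text).
        for (long base : new long[] {16, 12}) {
            System.out.printf("base_offset=%d: a[0] 8-byte aligned=%b, a[1] 8-byte aligned=%b%n",
                    base,
                    isAligned8(elementOffset(base, 0)),
                    isAligned8(elementOffset(base, 1)));
        }
    }
}
```

With a 16-byte base offset, `a[0]` is 8-byte aligned. With 12, it is `a[1]` that lands on the 8-byte boundary, so code that assumes element 0 is aligned issues misaligned 8-byte accesses. That fits the slowdown reported on cores with slow misaligned accesses.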
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23631#discussion_r1958279785 From jsjolen at openjdk.org Mon Feb 17 14:04:12 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 17 Feb 2025 14:04:12 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: <6IATWzJgb5zFVmIcXhH3XFoVOeU1RxinjTPIvhm4vL0=.f2d9e94d-e0f9-40ad-b843-25defa3c3b09@github.com> References: <6IATWzJgb5zFVmIcXhH3XFoVOeU1RxinjTPIvhm4vL0=.f2d9e94d-e0f9-40ad-b843-25defa3c3b09@github.com> Message-ID: On Fri, 14 Feb 2025 06:37:55 GMT, Thomas Stuefe wrote: > We also save a copy of the counters to a global table that contains the N most expensive compilations. That table will be printed when one uses jcmd Compiler.memory. We also print it into the hs-err file. This is a new tool for me, but I'd appreciate it if there were an equivalent of `PrintNMTStatistics`, so that the table produced by the jcmd is also printed on shutdown. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2663222964 From sroy at openjdk.org Mon Feb 17 14:05:12 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Mon, 17 Feb 2025 14:05:12 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v23] In-Reply-To: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: <5xeRqXJYlOXFs4jAAXJaf_i0Vn7phluw1j-rNPvZakc=.5c60cc39-28b4-4808-9a1c-8a4e318cd5ed@github.com> > JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) > > Currently acceleration code for GHASH is missing for PPC64. > > The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result.
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: Single load inside loop ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20235/files - new: https://git.openjdk.org/jdk/pull/20235/files/a7d9a960..5b94a7a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=21-22 Stats: 260 lines in 1 file changed: 108 ins; 99 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/20235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235 PR: https://git.openjdk.org/jdk/pull/20235 From duke at openjdk.org Mon Feb 17 14:10:47 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Mon, 17 Feb 2025 14:10:47 GMT Subject: RFR: 8349721: Add aarch64 intrinsics for ML-KEM Message-ID: By using the aarch64 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. ------------- Commit messages: - removing trailing spaces - kyber aarch64 intrinsics Changes: https://git.openjdk.org/jdk/pull/23663/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23663&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349721 Stats: 2885 lines in 20 files changed: 2774 ins; 84 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/23663.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23663/head:pull/23663 PR: https://git.openjdk.org/jdk/pull/23663 From roland at openjdk.org Mon Feb 17 14:19:12 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 17 Feb 2025 14:19:12 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 14:40:09 GMT, Emanuel Peter wrote: > Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below. 
> > **Background** > > With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer. > > **Problem** > > So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code. > > > MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1); > MemorySegment nativeUnaligned = nativeAligned.asSlice(1); > test3(nativeUnaligned); > > > When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not! > > static void test3(MemorySegment ms) { > for (int i = 0; i < RANGE; i++) { > long adr = i * 4L; > int v = ms.get(ELEMENT_LAYOUT, adr); > ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1)); > } > } > > > **Solution: Runtime Checks - Predicate and Multiversioning** > > Of course we could just forbid cases where we have a `native` base from vectorizing. But that would lead to regressions currently - in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine if the `base` is aligned, we need a runtime check. > > I came up with 2 options where to place the runtime checks: > - A new "auto vectorization" Parse Predicate: > - This only works when predicates are available. > - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop. 
> - Multiversion the loop: > - Create 2 copies of the loop (fast and slow loops). > - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if` which decides which loop we take > - In the `slow_loop`, we make no assumption which means we cannot vectorize, but we still compile - so even unaligned `base`s would end up with reasonably fast code. > - We "stall" the `... What are the architectures affected by this? Isn't it the case that x86 and aarch64 are unaffected by this? Is the motivation to use this as a way to do prep work for alias analysis? Do you intend to use a single deoptimization reason for all vectorization related predicates? (That is, when you take care of aliasing, are you going to use the same reason for aliasing and alignment checks?) I went over the code and it looks reasonable to me. I intend to do a more careful review later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2663262133 From fjiang at openjdk.org Mon Feb 17 14:47:09 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 17 Feb 2025 14:47:09 GMT Subject: RFR: 8350093: RISC-V: java/math/BigInteger/LargeValueExceptions.java timeout with COH In-Reply-To: <20dDUIvvN45lILuiZ1hdaOXnRDTNl1V2nKI4X1S1lPE=.f615a14e-0518-4177-ac47-d9b8fd222d2b@github.com> References: <20dDUIvvN45lILuiZ1hdaOXnRDTNl1V2nKI4X1S1lPE=.f615a14e-0518-4177-ac47-d9b8fd222d2b@github.com> Message-ID: On Fri, 14 Feb 2025 13:21:58 GMT, Fei Yang wrote: > Hi, please review this change resolving a timeout issue in `LargeValueExceptions.squareDefiniteOverflow()`. > > This issue only happens on platforms with slow unaligned memory accesses like Unmatched or Premier-P550 SBCs. > Async profiler shows major time was spent in multiplyToLen stub code. When AvoidUnalignedAccesses is enabled, > there is a simple alignment check, which assumes 8-byte alignment for base_offset of int arrays.
But this is > not the case with COH: base_offset is 12 bytes instead of 16 bytes for int arrays. > > The patch simply makes the base_offset requirement explicit. Sanity tested on Premier P550. > No obvious change witnessed on JMH after this change: > > ----------------------------------------------------------------------------------------------- > > Without COH: > > Benchmark (maxNumbits) Mode Cnt Score Error Units > BigIntegers.SmallShifts.testLeftShift 32 avgt 15 138.939 ± 2.246 ns/op > BigIntegers.SmallShifts.testLeftShift 128 avgt 15 88.391 ± 1.210 ns/op > BigIntegers.SmallShifts.testLeftShift 256 avgt 15 117.590 ± 1.398 ns/op > BigIntegers.SmallShifts.testRightShift 32 avgt 15 150.338 ± 1.961 ns/op > BigIntegers.SmallShifts.testRightShift 128 avgt 15 104.540 ± 5.636 ns/op > BigIntegers.SmallShifts.testRightShift 256 avgt 15 126.082 ± 1.756 ns/op > BigIntegers.testAdd N/A avgt 15 97.513 ± 40.746 ns/op > BigIntegers.testGcd N/A avgt 15 5409222.706 ± 5934.667 ns/op > BigIntegers.testHugeLargeDivide N/A avgt 15 246.904 ± 1.552 ns/op > BigIntegers.testHugeSmallDivide N/A avgt 15 248.997 ± 1.374 ns/op > BigIntegers.testHugeToString N/A avgt 15 2421.432 ± 62.208 ns/op > BigIntegers.testLargeSmallDivide N/A avgt 15 216.859 ± 1.760 ns/op > BigIntegers.testLargeToString N/A avgt 15 425.653 ± 13.305 ns/op > BigIntegers.testLeftShift N/A avgt 15 2265.137 ± 24.319 ns/op > BigIntegers.testMultiply N/A avgt 15 15862.412 ± 417.880 ns/op <======== > BigIntegers.testRightShift N/A avgt 15 936.071 ± 15.247 ns/op > BigIntegers.testSmallToString N/A avgt 15 322.350 ± 16.075... Looks good, thanks! ------------- Marked as reviewed by fjiang (Committer).
PR Review: https://git.openjdk.org/jdk/pull/23631#pullrequestreview-2621266093 From roland at openjdk.org Mon Feb 17 15:05:17 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 17 Feb 2025 15:05:17 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> Message-ID: <3ArmrOQcUoj8DhHTq1a40Oz3GE8bCDDy3FFeVgbladg=.b8e0e13b-39f3-41a6-8a1b-5ca4febb4a41@github.com> On Mon, 17 Feb 2025 10:36:52 GMT, Emanuel Peter wrote: > I think we should be able to see the same issue here, actually. Yes. Here is a quick benchmark below: I observe the same: Warmup 751 3 b TestIntMax::test1 (27 bytes) Run Time: 360 550 158 Warmup 1862 15 b TestIntMax::test2 (34 bytes) Run Time: 92 116 170 But then with this:

diff --git a/src/hotspot/cpu/x86/x86_64.ad b/src/hotspot/cpu/x86/x86_64.ad
index 8cc4a970bfd..9abda8f4178 100644
--- a/src/hotspot/cpu/x86/x86_64.ad
+++ b/src/hotspot/cpu/x86/x86_64.ad
@@ -12037,16 +12037,20 @@ instruct cmovI_reg_l(rRegI dst, rRegI src, rFlagsReg cr)
 %}


-instruct maxI_rReg(rRegI dst, rRegI src)
+instruct maxI_rReg(rRegI dst, rRegI src, rFlagsReg cr)
 %{
   match(Set dst (MaxI dst src));
+  effect(KILL cr);

   ins_cost(200);
-  expand %{
-    rFlagsReg cr;
-    compI_rReg(cr, dst, src);
-    cmovI_reg_l(dst, src, cr);
+  ins_encode %{
+    Label done;
+    __ cmpl($src$$Register, $dst$$Register);
+    __ jccb(Assembler::less, done);
+    __ mov($dst$$Register, $src$$Register);
+    __ bind(done);
   %}
+  ins_pipe(pipe_cmov_reg);
 %}

 // ============================================================================

the performance gap narrows: Warmup 770 3 b
TestIntMax::test1 (27 bytes) Run Time: 94 951 677 Warmup 1312 15 b TestIntMax::test2 (34 bytes) Run Time: 70 053 824 (the numbers for test2 fluctuate quite a bit). Does it ever make sense to implement `MaxI` with a conditional move, then? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2663379660 From epeter at openjdk.org Mon Feb 17 15:28:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 17 Feb 2025 15:28:13 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Mon, 17 Feb 2025 14:16:59 GMT, Roland Westrelin wrote: >> Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below. >> >> **Background** >> >> With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer. >> >> **Problem** >> >> So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code. >> >> >> MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1); >> MemorySegment nativeUnaligned = nativeAligned.asSlice(1); >> test3(nativeUnaligned); >> >> >> When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not!
>> >> static void test3(MemorySegment ms) { >> for (int i = 0; i < RANGE; i++) { >> long adr = i * 4L; >> int v = ms.get(ELEMENT_LAYOUT, adr); >> ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1)); >> } >> } >> >> >> **Solution: Runtime Checks - Predicate and Multiversioning** >> >> Of course we could just forbid cases where we have a `native` base from vectorizing. But that would lead to regressions currently - in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine if the `base` is aligned, we need a runtime check. >> >> I came up with 2 options where to place the runtime checks: >> - A new "auto vectorization" Parse Predicate: >> - This only works when predicates are available. >> - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop. >> - Multiversion the loop: >> - Create 2 copies of the loop (fast and slow loops). >> - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if` which decides which loop we take >> - In the `slow_loop`, we make no assumption which means we cannot vectorize, but we still compile - so even ... > > What are the architectures affected by this? Isn't it the case that x86 and aarch64 are unaffected by this? Is the motivation to use this as a way to do prep work for alias analysis? > > Do you intend to use a single deoptimization reason for all vectorization related predicates? (That is, when you take care of aliasing, are you going to use the same reason for aliasing and alignment checks?) > > I went over the code and it looks reasonable to me. I intend to do a more careful review later. @rwestrel Thanks for having a first look! > What are the architectures affected by this? Isn't it the case that x86 and aarch64 are unaffected by this?
Yes, x86 and aarch64 are unaffected, as far as I know. Well, we can simulate strict alignment with `-XX:+AlignVector`, and there it should behave correctly, but it currently fails with `-XX:+VerifyAlignVector`. It would be nice if that were not the case, so that we can write tests with arbitrary alignment and turn on those flags freely. > Is the motivation to use this as a way to do prep work for alias analysis? I see this as a bug-fix AND preparation for future work. I suppose I might not have fixed this bug here since our platforms are not really affected, but I might as well fix it now since I can re-use most of the code later. > Do you intend to use a single deoptimization reason for all vectorization related predicates? (That is, when you take care of aliasing, are you going to use the same reason for aliasing and alignment checks?) I suppose that is currently what I'm planning. We could in principle separate them, but I would leave that for later, if there is any desire to do that. For now, I think it's ok to just go with a single "auto-vectorization" reason. Does that sound reasonable?
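A minimal sketch of the alignment arithmetic behind the runtime check discussed above, under stated assumptions: addresses are modelled as plain `long` values rather than real native pointers, element size and vector width are powers of two with the element size dividing the width, and the names `preLoopIterations` and `chooseLoop` are hypothetical (they are not C2 code). A pre-loop can only peel whole elements, so if the `base` is off by something that is not a multiple of the element size (as with `asSlice(1)` above), no pre-loop iteration count aligns the vector accesses, and a multiversioned loop would have to take the slow copy.

```java
public class AlignmentSketch {

    /**
     * Hypothetical helper: number of scalar pre-loop iterations needed before
     * base + i * elemSize is a multiple of vectorWidth, or -1 if no iteration
     * count achieves that. Assumes elemSize divides vectorWidth and both are
     * powers of two, so the residues repeat after vectorWidth / elemSize steps.
     */
    static int preLoopIterations(long base, int elemSize, int vectorWidth) {
        for (int i = 0; i < vectorWidth / elemSize; i++) {
            if ((base + (long) i * elemSize) % vectorWidth == 0) {
                return i;
            }
        }
        return -1;
    }

    /** Hypothetical stand-in for the multiversion_if: choose a loop copy at runtime. */
    static String chooseLoop(long base, int elemSize, int vectorWidth) {
        return preLoopIterations(base, elemSize, vectorWidth) >= 0
                ? "fast_loop"   // may vectorize with aligned accesses
                : "slow_loop";  // no alignment assumption, stays scalar
    }

    public static void main(String[] args) {
        int elemSize = 4;      // int elements, as in test3
        int vectorWidth = 16;  // assumed 16-byte vectors
        long base = 0x1000;    // a 16-byte-aligned base address

        System.out.println(preLoopIterations(base, elemSize, vectorWidth));     // 0
        System.out.println(preLoopIterations(base + 4, elemSize, vectorWidth)); // 3
        System.out.println(preLoopIterations(base + 1, elemSize, vectorWidth)); // -1, like asSlice(1)
        System.out.println(chooseLoop(base + 1, elemSize, vectorWidth));        // slow_loop
    }
}
```

Per the description above, the real check sits in the `multiversion_if` (or a predicate) ahead of the loop; the sketch only mirrors the arithmetic such a check would need.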
------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2663434802 From galder at openjdk.org Mon Feb 17 16:49:15 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 17 Feb 2025 16:49:15 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: <3ArmrOQcUoj8DhHTq1a40Oz3GE8bCDDy3FFeVgbladg=.b8e0e13b-39f3-41a6-8a1b-5ca4febb4a41@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> <3ArmrOQcUoj8DhHTq1a40Oz3GE8bCDDy3FFeVgbladg=.b8e0e13b-39f3-41a6-8a1b-5ca4febb4a41@github.com> Message-ID: <_SUoth7bTq41M5TpGjQ5ADL2TOesK2tIIxmL21BZ6RU=.65284948-b4a8-4d01-a924-e9dfeefe1c88@github.com> On Mon, 17 Feb 2025 15:02:32 GMT, Roland Westrelin wrote: >> @rwestrel @galderz >> >>> It seems overall, we likely win more than we lose with this intrinsic, so I would integrate this change as it is and file a bug to keep track of remaining issues. >> >> I'm a little scared to just accept the regressions, especially for this "most average looking case": >> Imagine you have an array with random numbers. Or at least numbers in a random order. If we take the max, then we expect the first number to be max with probability 1, the second 1/2, the third 1/3, the i'th 1/i. So the average branch probability is `(sum_i 1/i) / n`. This goes closer and closer to zero, the larger the array. This means that the "average" case has an extreme probability. And so if we do not vectorize, then this gets us a regression with the current patch. And vectorization is a little fragile: it takes very little for vectorization not to kick in. >> >>> The Min/Max nodes are floating nodes.
They can hoist out of loops and common reliably in ways that are not guaranteed otherwise. >> >> I suppose we could write an optimization that hoists loop-independent if-diamonds out of a loop. If the condition and all phi inputs are loop invariant, you could just cut the diamond out of the loop, and paste it before the loop entry. >> >>> Shouldn't int min/max be affected the same way? >> >> I think we should be able to see the same issue here, actually. Yes. Here is a quick benchmark below: >> >> >> java -XX:CompileCommand=compileonly,TestIntMax::test* -XX:CompileCommand=printcompilation,TestIntMax::test* -XX:+TraceNewVectors TestIntMax.java >> CompileCommand: compileonly TestIntMax.test* bool compileonly = true >> CompileCommand: PrintCompilation TestIntMax.test* bool PrintCompilation = true >> Warmup >> 5225 93 % 3 TestIntMax::test1 @ 5 (27 bytes) >> 5226 94 3 TestIntMax::test1 (27 bytes) >> 5226 95 % 4 TestIntMax::test1 @ 5 (27 bytes) >> 5238 96 4 TestIntMax::test1 (27 bytes) >> Run >> Time: 542056319 >> Warmup >> 6320 101 % 3 TestIntMax::test2 @ 5 (34 bytes) >> 6322 102 % 4 TestIntMax::test2 @ 5 (34 bytes) >> 6329 103 4 TestIntMax::test2 (34 bytes) >> Run >> Time: 166815209 >> >> That's a 4x regression on random input data! >> >> With: >> >> import java.util.Random; >> >> public class TestIntMax { >> private static Random RANDOM = new Random(); >> >> public static void main(String[] args) { >> int[] a = new int[64 * 1024]; >> for (int i = 0; i < a.length; i++) { >>... > >> I think we should be able to see the same issue here, actually. Yes.
Here is a quick benchmark below: > > I observe the same: > > > Warmup > 751 3 b TestIntMax::test1 (27 bytes) > Run > Time: 360 550 158 > Warmup > 1862 15 b TestIntMax::test2 (34 bytes) > Run > Time: 92 116 170 > > > But then with this: > > > diff --git a/src/hotspot/cpu/x86/x86_64.ad b/src/hotspot/cpu/x86/x86_64.ad > index 8cc4a970bfd..9abda8f4178 100644 > --- a/src/hotspot/cpu/x86/x86_64.ad > +++ b/src/hotspot/cpu/x86/x86_64.ad > @@ -12037,16 +12037,20 @@ instruct cmovI_reg_l(rRegI dst, rRegI src, rFlagsReg cr) > %} > > > -instruct maxI_rReg(rRegI dst, rRegI src) > +instruct maxI_rReg(rRegI dst, rRegI src, rFlagsReg cr) > %{ > match(Set dst (MaxI dst src)); > + effect(KILL cr); > > ins_cost(200); > - expand %{ > - rFlagsReg cr; > - compI_rReg(cr, dst, src); > - cmovI_reg_l(dst, src, cr); > + ins_encode %{ > + Label done; > + __ cmpl($src$$Register, $dst$$Register); > + __ jccb(Assembler::less, done); > + __ mov($dst$$Register, $src$$Register); > + __ bind(done); > %} > + ins_pipe(pipe_cmov_reg); > %} > > // ============================================================================ > > > the performance gap narrows: > > > Warmup > 770 3 b TestIntMax::test1 (27 bytes) > Run > Time: 94 951 677 > Warmup > 1312 15 b TestIntMax::test2 (34 bytes) > Run > Time: 70 053 824 > > > (the numbers for test2 fluctuate quite a bit). Does it ever make sense to implement `MaxI` with a conditional move, then? @rwestrel @eme64 I think that the data distribution in the `TestIntMax` above matters (see my explanations in https://github.com/openjdk/jdk/pull/20098#issuecomment-2642788364), so I've enhanced the test to control the data distribution in the int[] (see the program at the bottom).
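For intuition on why the distribution matters, here is a small hedged sketch that counts how often the running maximum actually changes. The class and method names are made up for illustration, and this is not the benchmark used in this thread. In a random permutation the i-th element is a new maximum with probability 1/i, so over the n - 1 elements after the first the expected update fraction is (H_n - 1)/(n - 1), where H_n is the n-th harmonic number. That fraction is tiny for 64K elements. By contrast, a steadily increasing input takes the "new maximum" path on every element. These two extremes are what the branch-versus-cmov choice has to cope with.

```java
import java.util.Random;
import java.util.stream.IntStream;

public class MaxUpdateFraction {

    /** Fraction of elements after the first that strictly raise the running maximum. */
    static double updateFraction(int[] a) {
        int max = a[0];
        int updates = 0;
        for (int i = 1; i < a.length; i++) {
            if (a[i] > max) { // the "new maximum" branch
                max = a[i];
                updates++;
            }
        }
        return (double) updates / (a.length - 1);
    }

    /** Expected fraction for a uniformly random permutation: (H_n - 1) / (n - 1). */
    static double expectedRandomFraction(int n) {
        double h = 0;
        for (int i = 1; i <= n; i++) {
            h += 1.0 / i;
        }
        return (h - 1) / (n - 1);
    }

    public static void main(String[] args) {
        int n = 64 * 1024;
        int[] ascending = IntStream.range(0, n).toArray(); // every element is a new maximum

        // Fisher-Yates shuffle with a fixed seed to get a random permutation.
        int[] shuffled = ascending.clone();
        Random rnd = new Random(42);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int t = shuffled[i];
            shuffled[i] = shuffled[j];
            shuffled[j] = t;
        }

        System.out.println("ascending: " + updateFraction(ascending)); // 1.0
        System.out.println("shuffled:  " + updateFraction(shuffled));
        System.out.println("expected:  " + expectedRandomFraction(n));
    }
}
```

The shuffled result should land near the expected value, far below the ascending case. Exact numbers depend on the seed, so none are quoted here.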
Here are the results I see on my AVX-512 machine:

Probability: 50%
Warmup
7834 92 % b 3 TestIntMax::test1 @ 5 (27 bytes)
7836 93 b 3 TestIntMax::test1 (27 bytes)
7838 94 % b 4 TestIntMax::test1 @ 5 (27 bytes)
7851 95 b 4 TestIntMax::test1 (27 bytes)
Run
Time: 699 923 014
Warmup
9272 96 % b 3 TestIntMax::test2 @ 5 (34 bytes)
9274 97 b 3 TestIntMax::test2 (34 bytes)
9275 98 % b 4 TestIntMax::test2 @ 5 (34 bytes)
9287 99 b 4 TestIntMax::test2 (34 bytes)
Run
Time: 699 815 792

Probability: 80%
Warmup
7872 92 % b 3 TestIntMax::test1 @ 5 (27 bytes)
7874 93 b 3 TestIntMax::test1 (27 bytes)
7875 94 % b 4 TestIntMax::test1 @ 5 (27 bytes)
7889 95 b 4 TestIntMax::test1 (27 bytes)
Run
Time: 699 947 633
Warmup
9310 96 % b 3 TestIntMax::test2 @ 5 (34 bytes)
9311 97 b 3 TestIntMax::test2 (34 bytes)
9312 98 % b 4 TestIntMax::test2 @ 5 (34 bytes)
9325 99 b 4 TestIntMax::test2 (34 bytes)
Run
Time: 699 827 882

Probability: 100%
Warmup
7884 92 % b 3 TestIntMax::test1 @ 5 (27 bytes)
7886 93 b 3 TestIntMax::test1 (27 bytes)
7888 94 % b 4 TestIntMax::test1 @ 5 (27 bytes)
7901 95 b 4 TestIntMax::test1 (27 bytes)
Run
Time: 699 931 243
Warmup
9322 96 % b 3 TestIntMax::test2 @ 5 (34 bytes)
9323 97 b 3 TestIntMax::test2 (34 bytes)
9324 98 % b 4 TestIntMax::test2 @ 5 (34 bytes)
9336 99 b 4 TestIntMax::test2 (34 bytes)
Run
Time: 1 077 937 282

import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;

class TestIntMax {
    static final int RANGE = 16 * 1024;
    static final int ITER = 100_000;

    public static void main(String[] args) {
        // args[0]: percentage of elements generated at or above the running maximum
        final int probability = Integer.parseInt(args[0]);
        final DecimalFormatSymbols symbols = new DecimalFormatSymbols();
        symbols.setGroupingSeparator(' ');
        final DecimalFormat format = new DecimalFormat("#,###", symbols);
        System.out.printf("Probability: %d%%%n", probability);
        int[] a = new int[64 * 1024];
        init(a, probability);
        {
            System.out.println("Warmup");
            for (int i = 0; i < 10_000; i++) {
                test1(a);
            }
            System.out.println("Run");
            long t0 = System.nanoTime();
            for (int i = 0; i < 10_000; i++) {
                test1(a);
            }
            long t1 = System.nanoTime();
            System.out.println("Time: " + format.format(t1 - t0));
        }
        {
            System.out.println("Warmup");
            for (int i = 0; i < 10_000; i++) {
                test2(a);
            }
            System.out.println("Run");
            long t0 = System.nanoTime();
            for (int i = 0; i < 10_000; i++) {
                test2(a);
            }
            long t1 = System.nanoTime();
            System.out.println("Time: " + format.format(t1 - t0));
        }
    }

    // Running maximum via Math.max
    public static int test1(int[] a) {
        int x = Integer.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            x = Math.max(x, a[i]);
        }
        return x;
    }

    // Same computation written as an explicit ternary
    public static int test2(int[] a) {
        int x = Integer.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            x = (x >= a[i]) ? x : a[i];
        }
        return x;
    }

    // Fills ints so that roughly `probability` percent of the elements are at or
    // above the running maximum; regenerates until the measured percentage matches.
    public static void init(int[] ints, int probability) {
        int aboveCount, abovePercent;
        do {
            int max = ThreadLocalRandom.current().nextInt(10);
            ints[0] = max;
            aboveCount = 0;
            for (int i = 1; i < ints.length; i++) {
                int value;
                if (ThreadLocalRandom.current().nextInt(101) <= probability) {
                    int increment = ThreadLocalRandom.current().nextInt(10);
                    value = max + increment;
                    aboveCount++;
                } else {
                    // Decrement by at least 1
                    int decrement = ThreadLocalRandom.current().nextInt(10) + 1;
                    value = max - decrement;
                }
                ints[i] = value;
                max = Math.max(max, value);
            }
            abovePercent = ((aboveCount + 1) * 100) / ints.length;
        } while (abovePercent != probability);
    }
}

Focusing my comment below on 100%, which is where the differences appear: test2 (100%): ;; B12: # out( B21 B13 ) <- in( B11 B20 ) Freq: 1.6744e+09 0x00007f15bcada2e9: movl 0x14(%rsi, %rdx, 4), %r11d ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - TestIntMax::test2 at 14 (line 71) 0x00007f15bcada2ee: cmpl %r11d, %r10d 0x00007f15bcada2f1: jge 0x7f15bcada362 ;*istore_1 {reexecute=0 rethrow=0 return_oop=0} ; - TestIntMax::test2 at 25 (line 71) test1 (100%) ;; B10: # out( B10 B11 ) <- in( B9 B10 ) Loop( B10-B10 inner main of N64 strip mined) Freq: 1.6744e+09 0x00007f15bcad9a70: movl 0x4c(%rsi,
%rdx, 4), %r11d 0x00007f15bcad9a75: movl %r11d, (%rsp) 0x00007f15bcad9a79: movl 0x48(%rsi, %rdx, 4), %r10d 0x00007f15bcad9a7e: movl %r10d, 4(%rsp) 0x00007f15bcad9a83: movl 0x10(%rsi, %rdx, 4), %r11d 0x00007f15bcad9a88: movl 0x14(%rsi, %rdx, 4), %r9d 0x00007f15bcad9a8d: movl 0x44(%rsi, %rdx, 4), %r10d 0x00007f15bcad9a92: movl %r10d, 8(%rsp) 0x00007f15bcad9a97: movl 0x18(%rsi, %rdx, 4), %r8d 0x00007f15bcad9a9c: cmpl %r11d, %eax 0x00007f15bcad9a9f: cmovll %r11d, %eax 0x00007f15bcad9aa3: cmpl %r9d, %eax 0x00007f15bcad9aa6: cmovll %r9d, %eax 0x00007f15bcad9aaa: movl 0x20(%rsi, %rdx, 4), %r10d 0x00007f15bcad9aaf: cmpl %r8d, %eax 0x00007f15bcad9ab2: cmovll %r8d, %eax 0x00007f15bcad9ab6: movl 0x24(%rsi, %rdx, 4), %r8d 0x00007f15bcad9abb: movl 0x28(%rsi, %rdx, 4), %r11d ; {no_reloc} 0x00007f15bcad9ac0: movl 0x2c(%rsi, %rdx, 4), %ecx 0x00007f15bcad9ac4: movl 0x30(%rsi, %rdx, 4), %r9d 0x00007f15bcad9ac9: movl 0x34(%rsi, %rdx, 4), %edi 0x00007f15bcad9acd: movl 0x38(%rsi, %rdx, 4), %ebx 0x00007f15bcad9ad1: movl 0x3c(%rsi, %rdx, 4), %ebp 0x00007f15bcad9ad5: movl 0x40(%rsi, %rdx, 4), %r13d 0x00007f15bcad9ada: movl 0x1c(%rsi, %rdx, 4), %r14d 0x00007f15bcad9adf: cmpl %r14d, %eax 0x00007f15bcad9ae2: cmovll %r14d, %eax 0x00007f15bcad9ae6: cmpl %r10d, %eax 0x00007f15bcad9ae9: cmovll %r10d, %eax 0x00007f15bcad9aed: cmpl %r8d, %eax 0x00007f15bcad9af0: cmovll %r8d, %eax 0x00007f15bcad9af4: cmpl %r11d, %eax 0x00007f15bcad9af7: cmovll %r11d, %eax 0x00007f15bcad9afb: cmpl %ecx, %eax 0x00007f15bcad9afd: cmovll %ecx, %eax 0x00007f15bcad9b00: cmpl %r9d, %eax 0x00007f15bcad9b03: cmovll %r9d, %eax 0x00007f15bcad9b07: cmpl %edi, %eax 0x00007f15bcad9b09: cmovll %edi, %eax 0x00007f15bcad9b0c: cmpl %ebx, %eax 0x00007f15bcad9b0e: cmovll %ebx, %eax 0x00007f15bcad9b11: cmpl %ebp, %eax 0x00007f15bcad9b13: cmovll %ebp, %eax 0x00007f15bcad9b16: cmpl %r13d, %eax 0x00007f15bcad9b19: cmovll %r13d, %eax 0x00007f15bcad9b1d: cmpl 8(%rsp), %eax 0x00007f15bcad9b21: movl 8(%rsp), %r11d 0x00007f15bcad9b26: cmovll 
%r11d, %eax 0x00007f15bcad9b2a: cmpl 4(%rsp), %eax 0x00007f15bcad9b2e: movl 4(%rsp), %r10d 0x00007f15bcad9b33: cmovll %r10d, %eax 0x00007f15bcad9b37: cmpl (%rsp), %eax 0x00007f15bcad9b3a: movl (%rsp), %r11d 0x00007f15bcad9b3e: cmovll %r11d, %eax ;*invokestatic max {reexecute=0 rethrow=0 return_oop=0} ; - TestIntMax::test1 at 15 (line 61) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2663633050 From galder at openjdk.org Mon Feb 17 17:05:28 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 17 Feb 2025 17:05:28 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 7 Feb 2025 12:39:24 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the Java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g.
>> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 44 additional commits since the last revision: > > - Merge branch 'master' into topic.intrinsify-max-min-long > - Fix typo > - Renaming methods and variables and add docu on algorithms > - Fix copyright years > - Make sure it runs with cpus with either avx512 or asimd > - Test can only run with 256 bit registers or bigger > > * Remove platform dependant check > and use platform independent configuration instead. > - Fix license header > - Tests should also run on aarch64 asimd=true envs > - Added comment around the assertions > - Adjust min/max identity IR test expectations after changes > - ... and 34 more: https://git.openjdk.org/jdk/compare/ba549afe...a190ae68 Another interesting comparison arises above when comparing `test2` in 80% vs 100%: test2 (100%): ;; B12: # out( B21 B13 ) <- in( B11 B20 ) Freq: 1.6744e+09 0x00007f15bcada2e9: movl 0x14(%rsi, %rdx, 4), %r11d ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - TestIntMax::test2 at 14 (line 71) 0x00007f15bcada2ee: cmpl %r11d, %r10d 0x00007f15bcada2f1: jge 0x7f15bcada362 ;*istore_1 {reexecute=0 rethrow=0 return_oop=0} ; - TestIntMax::test2 at 25 (line 71) test2(80%): ;; B10: # out( B10 B11 ) <- in( B9 B10 ) Loop( B10-B10 inner main of N64 strip mined) Freq: 1.6744e+09 0x00007fe850ada2f0: movl 0x4c(%rsi, %rdx, 4), %r11d 0x00007fe850ada2f5: movl %r11d, (%rsp) 0x00007fe850ada2f9: movl 0x48(%rsi, %rdx, 4), %r10d 0x00007fe850ada2fe: movl %r10d, 4(%rsp) 0x00007fe850ada303: movl 0x10(%rsi, %rdx, 4), %r11d 0x00007fe850ada308: movl 0x14(%rsi, %rdx, 4), %r9d 0x00007fe850ada30d: movl 0x44(%rsi, %rdx, 4), %r10d 0x00007fe850ada312: movl %r10d, 8(%rsp) 0x00007fe850ada317: movl 0x18(%rsi, %rdx, 4), %r8d 0x00007fe850ada31c: cmpl %r11d, %eax 0x00007fe850ada31f: cmovll %r11d, %eax 0x00007fe850ada323: cmpl %r9d, %eax 0x00007fe850ada326: cmovll %r9d, %eax 0x00007fe850ada32a: movl 0x20(%rsi, %rdx, 4), %r10d 0x00007fe850ada32f: cmpl %r8d, %eax 0x00007fe850ada332: cmovll %r8d, %eax 0x00007fe850ada336: movl 
0x24(%rsi, %rdx, 4), %r8d 0x00007fe850ada33b: movl 0x28(%rsi, %rdx, 4), %r11d ; {no_reloc} 0x00007fe850ada340: movl 0x2c(%rsi, %rdx, 4), %ecx 0x00007fe850ada344: movl 0x30(%rsi, %rdx, 4), %r9d 0x00007fe850ada349: movl 0x34(%rsi, %rdx, 4), %edi 0x00007fe850ada34d: movl 0x38(%rsi, %rdx, 4), %ebx 0x00007fe850ada351: movl 0x3c(%rsi, %rdx, 4), %ebp 0x00007fe850ada355: movl 0x40(%rsi, %rdx, 4), %r13d 0x00007fe850ada35a: movl 0x1c(%rsi, %rdx, 4), %r14d 0x00007fe850ada35f: cmpl %r14d, %eax 0x00007fe850ada362: cmovll %r14d, %eax 0x00007fe850ada366: cmpl %r10d, %eax 0x00007fe850ada369: cmovll %r10d, %eax 0x00007fe850ada36d: cmpl %r8d, %eax 0x00007fe850ada370: cmovll %r8d, %eax 0x00007fe850ada374: cmpl %r11d, %eax 0x00007fe850ada377: cmovll %r11d, %eax 0x00007fe850ada37b: cmpl %ecx, %eax 0x00007fe850ada37d: cmovll %ecx, %eax 0x00007fe850ada380: cmpl %r9d, %eax 0x00007fe850ada383: cmovll %r9d, %eax 0x00007fe850ada387: cmpl %edi, %eax 0x00007fe850ada389: cmovll %edi, %eax 0x00007fe850ada38c: cmpl %ebx, %eax 0x00007fe850ada38e: cmovll %ebx, %eax 0x00007fe850ada391: cmpl %ebp, %eax 0x00007fe850ada393: cmovll %ebp, %eax 0x00007fe850ada396: cmpl %r13d, %eax 0x00007fe850ada399: cmovll %r13d, %eax 0x00007fe850ada39d: cmpl 8(%rsp), %eax 0x00007fe850ada3a1: movl 8(%rsp), %r11d 0x00007fe850ada3a6: cmovll %r11d, %eax 0x00007fe850ada3aa: cmpl 4(%rsp), %eax 0x00007fe850ada3ae: movl 4(%rsp), %r10d 0x00007fe850ada3b3: cmovll %r10d, %eax 0x00007fe850ada3b7: cmpl (%rsp), %eax 0x00007fe850ada3ba: movl (%rsp), %r11d 0x00007fe850ada3be: cmovll %r11d, %eax ;*istore_1 {reexecute=0 rethrow=0 return_oop=0} ; - TestIntMax::test2 at 25 (line 71) There are a couple of things puzzling me. This test is like a reduction test, and no vectorization appears to kick in for any of the percentages (I haven't enabled tracing of SuperWord (SW) rejections to check). The other thing that is strange is the overall time.
When no vectorization kicks in and the code uses cmovs, I've been seeing worse performance numbers compared to, say, compare-and-jump code, particularly in the 100% tests. With `TestIntMax` it appears to be the opposite: test2 at 100% uses cmp+jge, which performs worse than the cmov versions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2663665858 From galder at openjdk.org Mon Feb 17 17:21:15 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 17 Feb 2025 17:21:15 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Mon, 17 Feb 2025 17:02:47 GMT, Galder Zamarreño wrote: > This test is like a reduction test and no vectorization appears to be kicking in any of the percentages (I've not enabled vectorization SW rejections to check). Ah, that's probably because of the vectorization profitability checks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2663710153 From fyang at openjdk.org Tue Feb 18 00:22:13 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 18 Feb 2025 00:22:13 GMT Subject: RFR: 8350093: RISC-V: java/math/BigInteger/LargeValueExceptions.java timeout with COH In-Reply-To: References: <20dDUIvvN45lILuiZ1hdaOXnRDTNl1V2nKI4X1S1lPE=.f615a14e-0518-4177-ac47-d9b8fd222d2b@github.com> Message-ID: On Mon, 17 Feb 2025 13:53:56 GMT, Hamlin Li wrote: >> Hi, please review this change resolving a timeout issue in `LargeValueExceptions.squareDefiniteOverflow()`. >> >> This issue only happens on platforms with slow unaligned memory accesses like Unmatched or Premier-P550 SBCs. >> Async profiler shows major time was spent in multiplyToLen stub code. When AvoidUnalignedAccesses is enabled, >> there is a simple alignment check, which assumes 8-byte alignment for base_offset of int arrays.
But this is >> not the case with COH: base_offset is 12 bytes instead of 16 bytes for int arrays. >> >> The patch simply makes the base_offset requirement explicit. Sanity tested on Premier P550. >> No obvious change witnessed on JMH after this change: >> >> ----------------------------------------------------------------------------------------------- >> >> Without COH: >> >> Benchmark (maxNumbits) Mode Cnt Score Error Units >> BigIntegers.SmallShifts.testLeftShift 32 avgt 15 138.939 ± 2.246 ns/op >> BigIntegers.SmallShifts.testLeftShift 128 avgt 15 88.391 ± 1.210 ns/op >> BigIntegers.SmallShifts.testLeftShift 256 avgt 15 117.590 ± 1.398 ns/op >> BigIntegers.SmallShifts.testRightShift 32 avgt 15 150.338 ± 1.961 ns/op >> BigIntegers.SmallShifts.testRightShift 128 avgt 15 104.540 ± 5.636 ns/op >> BigIntegers.SmallShifts.testRightShift 256 avgt 15 126.082 ± 1.756 ns/op >> BigIntegers.testAdd N/A avgt 15 97.513 ± 40.746 ns/op >> BigIntegers.testGcd N/A avgt 15 5409222.706 ± 5934.667 ns/op >> BigIntegers.testHugeLargeDivide N/A avgt 15 246.904 ± 1.552 ns/op >> BigIntegers.testHugeSmallDivide N/A avgt 15 248.997 ± 1.374 ns/op >> BigIntegers.testHugeToString N/A avgt 15 2421.432 ± 62.208 ns/op >> BigIntegers.testLargeSmallDivide N/A avgt 15 216.859 ± 1.760 ns/op >> BigIntegers.testLargeToString N/A avgt 15 425.653 ± 13.305 ns/op >> BigIntegers.testLeftShift N/A avgt 15 2265.137 ± 24.319 ns/op >> BigIntegers.testMultiply N/A avgt 15 15862.412 ± 417.880 ns/op <======== >> BigIntegers.testRightShift N/A avgt 15 936.071 ± 15.247 ns/op >> BigIntegers.testSmallTo... > > Marked as reviewed by mli (Reviewer). @Hamlin-Li @feilongjiang : Thanks!
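To make the base_offset point concrete, here is a small sketch under stated assumptions. It uses the int-array base offsets quoted above (16 bytes without COH, 12 bytes with COH). It assumes the array object itself starts on an 8-byte boundary, which is HotSpot's default object alignment. It then reports which elements start at 8-byte-aligned offsets. The class and method names are illustrative only and are not taken from the patch. With a 12-byte base, `a[0]` is not 8-byte aligned, so an alignment check that assumes an 8-byte-aligned base_offset picks the wrong elements.

```java
public class IntArrayAlignment {

    /** Byte offset of a[index] from the start of an int[] object. */
    static long elementOffset(int baseOffset, int index) {
        return baseOffset + 4L * index; // int elements are 4 bytes
    }

    /** True if a[index] is 8-byte aligned, assuming the array object itself is. */
    static boolean isEightByteAligned(int baseOffset, int index) {
        return elementOffset(baseOffset, index) % 8 == 0;
    }

    public static void main(String[] args) {
        int withoutCoh = 16; // int[] base offset quoted above without COH
        int withCoh = 12;    // int[] base offset quoted above with COH
        for (int i = 0; i < 4; i++) {
            System.out.println("a[" + i + "]: "
                    + elementOffset(withoutCoh, i) + " -> " + isEightByteAligned(withoutCoh, i)
                    + " | " + elementOffset(withCoh, i) + " -> " + isEightByteAligned(withCoh, i));
        }
        // Without COH the even indices start 8-byte aligned; with COH the odd ones do.
    }
}
```

Presumably, code that wants aligned 8-byte accesses on such an array would have to handle the leading int separately when the base offset is 12. The details of how the patch handles this are in the PR itself.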
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23631#issuecomment-2664268644

From fyang at openjdk.org Tue Feb 18 00:22:14 2025
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 18 Feb 2025 00:22:14 GMT
Subject: Integrated: 8350093: RISC-V: java/math/BigInteger/LargeValueExceptions.java timeout with COH
In-Reply-To: <20dDUIvvN45lILuiZ1hdaOXnRDTNl1V2nKI4X1S1lPE=.f615a14e-0518-4177-ac47-d9b8fd222d2b@github.com>
References: <20dDUIvvN45lILuiZ1hdaOXnRDTNl1V2nKI4X1S1lPE=.f615a14e-0518-4177-ac47-d9b8fd222d2b@github.com>
Message-ID: <8I5vkr-JVBen9SIExq02r4sM3XrUI3TLGtbZAUnxVj8=.83d425ab-884f-4b9e-9442-2d1494fda455@github.com>

On Fri, 14 Feb 2025 13:21:58 GMT, Fei Yang wrote:

> Hi, please review this change resolving a timeout issue in `LargeValueExceptions.squareDefiniteOverflow()`.
>
> This issue only happens on platforms with slow unaligned memory accesses, like the Unmatched or Premier-P550 SBCs.
> Async profiler shows that most of the time was spent in the multiplyToLen stub code. When AvoidUnalignedAccesses is enabled,
> there is a simple alignment check, which assumes 8-byte alignment for the base_offset of int arrays. But this is
> not the case with COH: base_offset is 12 bytes instead of 16 bytes for int arrays.
>
> The patch simply makes the requirement on base_offset explicit. Sanity tested on Premier P550.
> No obvious change was observed in JMH after this change:
>
> -----------------------------------------------------------------------------------------------
>
> Without COH:
>
> Benchmark                               (maxNumbits)  Mode  Cnt        Score      Error  Units
> BigIntegers.SmallShifts.testLeftShift             32  avgt   15      138.939 ±    2.246  ns/op
> BigIntegers.SmallShifts.testLeftShift            128  avgt   15       88.391 ±    1.210  ns/op
> BigIntegers.SmallShifts.testLeftShift            256  avgt   15      117.590 ±    1.398  ns/op
> BigIntegers.SmallShifts.testRightShift            32  avgt   15      150.338 ±    1.961  ns/op
> BigIntegers.SmallShifts.testRightShift           128  avgt   15      104.540 ±    5.636  ns/op
> BigIntegers.SmallShifts.testRightShift           256  avgt   15      126.082 ±    1.756  ns/op
> BigIntegers.testAdd                              N/A  avgt   15       97.513 ±   40.746  ns/op
> BigIntegers.testGcd                              N/A  avgt   15  5409222.706 ± 5934.667  ns/op
> BigIntegers.testHugeLargeDivide                  N/A  avgt   15      246.904 ±    1.552  ns/op
> BigIntegers.testHugeSmallDivide                  N/A  avgt   15      248.997 ±    1.374  ns/op
> BigIntegers.testHugeToString                     N/A  avgt   15     2421.432 ±   62.208  ns/op
> BigIntegers.testLargeSmallDivide                 N/A  avgt   15      216.859 ±    1.760  ns/op
> BigIntegers.testLargeToString                    N/A  avgt   15      425.653 ±   13.305  ns/op
> BigIntegers.testLeftShift                        N/A  avgt   15     2265.137 ±   24.319  ns/op
> BigIntegers.testMultiply                         N/A  avgt   15    15862.412 ±  417.880  ns/op   <========
> BigIntegers.testRightShift                       N/A  avgt   15      936.071 ±   15.247  ns/op
> BigIntegers.testSmallToString                    N/A  avgt   15      322.350 ±   16.075...

This pull request has now been integrated.

Changeset: 8df80400
Author: Fei Yang
URL: https://git.openjdk.org/jdk/commit/8df804005ed772936fd77a4c0335a5620f909570
Stats: 17 lines in 1 file changed: 11 ins; 2 del; 4 mod

8350093: RISC-V: java/math/BigInteger/LargeValueExceptions.java timeout with COH

Reviewed-by: mli, fjiang

-------------
PR: https://git.openjdk.org/jdk/pull/23631

From jwaters at openjdk.org Tue Feb 18 00:38:13 2025
From: jwaters at openjdk.org (Julian Waters)
Date: Tue, 18 Feb 2025 00:38:13 GMT
Subject: RFR: 8345265: Minor improvements for LTO across all compilers [v2]
In-Reply-To: <3peOk4hOWRVX3sn5BHQbRh5ymyP8Sr146H66jDWkePA=.ef3d0788-2bfa-421b-ad92-a1e46fd0feb5@github.com>
References: <2y8p-J2SCTANChv8WvrXmYI1UjVxbC7n8tUJzBOMzEE=.7c2b48a5-423e-4138-8671-3037e8963730@github.com> <3peOk4hOWRVX3sn5BHQbRh5ymyP8Sr146H66jDWkePA=.ef3d0788-2bfa-421b-ad92-a1e46fd0feb5@github.com>
Message-ID:

On Fri, 17 Jan 2025 13:50:11 GMT, Matthias Baesken wrote:

>>> Fixing the JVM under LTO is likely to be a heavy undertaking, much more so than just unbreaking compilation and linking of the JVM (ignoring that the JVM later crashes when the newly compiled JDK is used to build parts of itself). I'm not sure it would be feasible under the current Pull Request.
>>
>> I was able to build the OpenJDK with LTO enabled on Linux and Windows (so the new JVM does not crash in the build). I just had to disable gtest, because it currently does not compile with LTO enabled. I was able to run a few benchmarks with the LTO-enabled JVM, but as far as I remember a couple of HS jtreg tests fail with LTO enabled because they have some expectations that might not (yet) work with LTO.
>>
>> Regarding gtest, I created
>> https://bugs.openjdk.org/browse/JDK-8346987
>> 8346987: [lto] gtest build fails
>> Do you think it would be okay to change the build so that the LTO-related flags (in case LTO is enabled) do not 'go' into the gtest build?
>
>> I was able to run a few benchmarks with the LTO-enabled JVM,
>> but as far as I remember a couple of HS jtreg tests fail with LTO enabled because they have some expectations that might not (yet) work with LTO
>
> On Linux x86_64 (gcc 11.3 devkit), when building with LTO enabled, the jdk :tier1 jtreg tests all worked nicely in my environment.
> The HS :tier1 jtreg tests had 51 failures, 50 in the serviceability/sa area.
> Those failures (from serviceability/sa) seem to have in common that they show an exception like this:
>
> stderr: [Exception in thread "main" java.lang.InternalError: Metadata does not appear to be polymorphic
>     at jdk.hotspot.agent/sun.jvm.hotspot.types.basic.BasicTypeDataBase.findDynamicTypeForAddress(BasicTypeDataBase.java:223)
>     at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VirtualBaseConstructor.instantiateWrapperFor(VirtualBaseConstructor.java:104)
>     at jdk.hotspot.agent/sun.jvm.hotspot.oops.Metadata.instantiateWrapperFor(Metadata.java:78)
>     at jdk.hotspot.agent/sun.jvm.hotspot.oops.MetadataField.getValue(MetadataField.java:43)
>     at jdk.hotspot.agent/sun.jvm.hotspot.oops.MetadataField.getValue(MetadataField.java:40)
>     at jdk.hotspot.agent/sun.jvm.hotspot.classfile.ClassLoaderData.getKlasses(ClassLoaderData.java:82)
>     at jdk.hotspot.agent/sun.jvm.hotspot.classfile.ClassLoaderData.classesDo(ClassLoaderData.java:101)
>     at jdk.hotspot.agent/sun.jvm.hotspot.classfile.ClassLoaderDataGraph.classesDo(ClassLoaderDataGraph.java:84)
>     at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor$19.doit(CommandProcessor.java:926)
>     at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2230)
>     at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2200)
>     at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.run(CommandProcessor.java:2071)
>     at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.run(CLHSDB.java:112)
>     at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.main(CLHSDB.java:44)
>     at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runCLHSDB(SALauncher.java:285)
>     at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:507)
>
> or test serviceability/sa/TestJhsdbJstackMixed.java
>
> stderr: [java.lang.InternalError: Metadata does not appear to be polymorphic
>     at jdk.hotspot.agent/sun.jvm.hotspot.types.basic.BasicTypeDataBase.findDynamicTypeForAddress(BasicTypeDataBase.java:223)
>     at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VirtualBaseConstructor.instantiateWrapperFor(VirtualBaseConstructor.java:1...

@MBaesken Currently, with LTO active on gcc 14, commit e648a907b31fd0d6b746d149fda2a8d5fbe26dc0 is causing serious trouble on my end by mass-inlining everything, bloating the JVM to nearly 60MB in size. Does HotSpot have the same size issues on your end with LTO? (--enable-jvm-feature-opt-size is off the table because the JVM should ideally be an acceptable size even without that flag, and -Os and LTO don't work with gcc anyway.)

-------------
PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2664283053

From dholmes at openjdk.org Tue Feb 18 01:28:20 2025
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 18 Feb 2025 01:28:20 GMT
Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2]
In-Reply-To:
References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com>
Message-ID:

On Sat, 15 Feb 2025 11:49:53 GMT, Albert Mingkun Yang wrote:

> I have removed the new API, and switched to use the original in_critical().

You still need it to be an atomic load together with whatever compiler barriers that implies, otherwise it can be hoisted out of the spin-loop:

    while (cur->in_critical()) {
      spin_yield.wait();
    }

-------------
PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2664351707

From dholmes at openjdk.org Tue Feb 18 01:40:22 2025
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 18 Feb 2025 01:40:22 GMT
Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v3]
In-Reply-To:
References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com>
Message-ID:

On Sat, 15 Feb 2025 11:44:44 GMT, Albert Mingkun Yang wrote:

>> Here is an attempt to simplify GCLocker implementation for Serial and Parallel.
>>
>> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker.
>>
>> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization.
>>
>> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress.
>>
>> Test: tier1-8
>
> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
> - Merge branch 'master' into gclocker
> - review
> - Merge branch 'master' into gclocker
> - review
> - Merge branch 'master' into gclocker
> - gclocker

The GCLocker behaviour would be easier to discern if all of the `thread` parameters/variables that have to be the current thread were actually called `current` (with a few suitably placed assertions).
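The reader/writer handshake described in the PR, together with the review point that in_critical() must be read with an atomic load inside the spin loop, can be modeled with standard C++ atomics. This is a simplified sketch, not the PR's code: ThreadState, gc_pending, try_enter_critical, begin_gc and end_gc are hypothetical names, HotSpot's own Atomic/OrderAccess primitives are replaced by std::atomic with sequentially consistent ordering, and blocking, wakeup and safepoint interaction are left out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical per-thread state: a stand-in for a thread-local "in critical" flag.
struct ThreadState {
  std::atomic<bool> in_critical{false};
};

// Published by the GC side ("writer") before it waits for critical regions to drain.
std::atomic<bool> gc_pending{false};

// Reader side. The seq_cst store to the thread-local flag, followed by the
// seq_cst load of gc_pending, provides the store-load ordering: a thread cannot
// slip into its critical region unseen by a GC that has already set gc_pending.
bool try_enter_critical(ThreadState& self) {
  self.in_critical.store(true, std::memory_order_seq_cst);
  if (gc_pending.load(std::memory_order_seq_cst)) {
    // Back out; a real implementation would block until the GC has finished.
    self.in_critical.store(false, std::memory_order_seq_cst);
    return false;
  }
  return true;
}

void exit_critical(ThreadState& self) {
  self.in_critical.store(false, std::memory_order_release);
}

// Writer side: announce the GC, then wait for every thread to leave its
// critical region. The flag is re-read with an atomic load on every
// iteration, so the compiler cannot hoist the read out of the spin loop.
void begin_gc(const std::vector<ThreadState*>& threads) {
  gc_pending.store(true, std::memory_order_seq_cst);
  for (ThreadState* t : threads) {
    while (t->in_critical.load(std::memory_order_seq_cst)) {
      std::this_thread::yield();
    }
  }
}

void end_gc() {
  gc_pending.store(false, std::memory_order_seq_cst);
}
```

Both sides publish with a seq_cst store before checking the other side's flag with a seq_cst load; with a plain, non-atomic read of the flag instead, the compiler could read it once and spin forever.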
-------------
PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2622203973

From jwaters at openjdk.org Tue Feb 18 02:39:25 2025
From: jwaters at openjdk.org (Julian Waters)
Date: Tue, 18 Feb 2025 02:39:25 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v18]
In-Reply-To:
References:
Message-ID:

On Tue, 11 Feb 2025 06:32:56 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128).
>>
>> The following is a summary of the changes included in this patch:
>>
>> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations.
>> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
>> 3. Float16 SQRT and FMA operations are inferred through inline expansion, and their corresponding entry points are defined in the newly added Float16Math class.
>>    - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
>> 4. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
>> 5. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to the [FAQs](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577) for more details.
>> 6. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, but the FP16 ISA generally operates on floating-point registers; thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating-point register and vice versa.
>> 7. New idealization routines to optimize redundant reinterpretation chains: HF2S + S2HF = HF.
>> 8. X86 backend implementation for all supported intrinsics.
>> 9. Functional and performance validation tests.
>>
>> Kindly review the patch and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Review comments resolutions

Is anyone else getting compile failures after this was integrated? This weirdly seems to only happen on Linux:

    * For target hotspot_variant-server_libjvm_objs_mulnode.o:
    /home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp: In member function ‘virtual const Type* FmaHFNode::Value(PhaseGVN*) const’:
    /home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp:1944:37: error: call of overloaded ‘make(double)’ is ambiguous
     1944 |   return TypeH::make(fma(f1, f2, f3));
          |                                     ^
    In file included from /home/runner/work/jdk/jdk/src/hotspot/share/opto/node.hpp:31,
                     from /home/runner/work/jdk/jdk/src/hotspot/share/opto/addnode.hpp:28,
                     from /home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp:26:
    /home/runner/work/jdk/jdk/src/hotspot/share/opto/type.hpp:544:23: note: candidate: ‘static const TypeH* TypeH::make(float)’
      544 |   static const TypeH* make(float f);
          |                       ^~~~
    /home/runner/work/jdk/jdk/src/hotspot/share/opto/type.hpp:545:23: note: candidate: ‘static const TypeH* TypeH::make(short int)’
      545 |   static const TypeH* make(short f);
          |                       ^~~~

-------------
PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2664473623

From asmehra at openjdk.org Tue Feb 18 05:24:10 2025
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Tue, 18 Feb 2025 05:24:10 GMT
Subject: RFR: 8280682: Refactor AOT code source validation checks [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 14 Feb 2025 19:21:32 GMT, Calvin Cheung wrote:

>> This changeset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE.
More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. >> >> Passed tiers 1 - 5 testing. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > @iklam and @ashu-mehra comment Marked as reviewed by asmehra (Committer). Just one more comment, rest looks good. src/hotspot/share/runtime/threads.cpp line 27: > 25: */ > 26: > 27: #include "cds/aotCodeSource.hpp" Why is this include needed? ------------- PR Review: https://git.openjdk.org/jdk/pull/23476#pullrequestreview-2622461930 PR Comment: https://git.openjdk.org/jdk/pull/23476#issuecomment-2664635612 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1959077605 From dholmes at openjdk.org Tue Feb 18 07:28:19 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 18 Feb 2025 07:28:19 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v2] In-Reply-To: References: Message-ID: <_9GFraN7--YUC8esAB28iHzdRC7eYJok355TpDH7Df8=.30548f08-454f-47ba-83d5-a4feabaee9ff@github.com> On Fri, 14 Feb 2025 19:21:32 GMT, Calvin Cheung wrote: >> This changset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. >> >> Passed tiers 1 - 5 testing. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > @iklam and @ashu-mehra comment This is a very large change to try and digest. Is it really just a refactor? I've made a few comments in places where it seems functionality may have been changed as well. 
src/hotspot/share/cds/aotCodeSource.cpp line 2:

> 1: /*
> 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved.

If the code has been moved from other files, the current opinion/consensus is that the first copyright year should be the oldest first year from all the files from which the code was obtained.

src/hotspot/share/cds/aotCodeSource.cpp line 106:

> 104: for (const char* bootcp = Arguments::get_boot_class_path(); *bootcp != '\0'; ++bootcp) {
> 105: if (*bootcp == *os::path_separator()) {
> 106: ++ bootcp;

Nit (possibly pre-existing): no space before/after unary operators.

src/hotspot/share/cds/aotCodeSource.hpp line 125:

> 123: // during AOTCache creation are the same as when the AOTCache is used during runtime.
> 124: // Non-existent entries are recorded during AOTCache creation. Those non-existent entries
> 125: // must not exist during runtime.

Does this mean that if Foo.jar is on the classpath but does not in fact exist, then we record it was on the classpath and require it to be on the classpath at runtime, but also to still not exist?
src/hotspot/share/cds/aotCodeSource.hpp line 128:

> 126: //
> 127: // Some details on validation:
> 128: // - the boot classpath could be appended during runtime if there's no app classpath and

Suggestion:

    // - the boot classpath can be appended to at runtime if there's no app classpath and no

src/hotspot/share/cds/aotCodeSource.hpp line 130:

> 128: // - the boot classpath could be appended during runtime if there's no app classpath and
> 129: // module path specified when an AOTCache is created;
> 130: // - the app classpath could be appended during runtime;

Suggestion:

    // - the app classpath can be appended to at runtime;

src/hotspot/share/cds/aotCodeSource.hpp line 131:

> 129: // module path specified when an AOTCache is created;
> 130: // - the app classpath could be appended during runtime;
> 131: // - the module path during runtime could be a superset of the one specified during AOTCache creation.

Suggestion:

    // - the module path at runtime can be a superset of the one specified during AOTCache creation.

src/hotspot/share/runtime/threads.cpp line 809:

> 807: vm_exit_during_initialization("ClassLoader::initialize_module_path() failed unexpectedly");
> 808: }
> 809: #endif

Not obvious where this functionality is now handled.

test/hotspot/jtreg/runtime/cds/appcds/BootClassPathMismatch.java line 243:

> 241: * No error - bootclasspath can be appended during runtime if no -cp is specified.
> 242: */
> 243: public void testBootClassPathAppend() throws Exception {

A refactoring should not be introducing new test cases. Did you refactor and enhance?

test/hotspot/jtreg/runtime/cds/appcds/NonExistClasspath.java line 70:

> 68: .assertNormalExit();
> 69:
> 70: // Replace nonExistPath with another non-existent file in the CP, it should still work

It's not at all clear why these test cases have been removed.
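The three validation rules in the suggested comment wording can be expressed as simple predicates. This is a hypothetical model written for illustration, not AOTCodeSource's logic: the function names are made up, module-path order is ignored (an assumption of this sketch), and non-existent entries and path normalization are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using PathList = std::vector<std::string>;

// App classpath: the dump-time entries must appear, in order, at the start of
// the runtime classpath; extra entries may be appended at runtime.
bool app_classpath_matches(const PathList& dump, const PathList& run) {
  if (run.size() < dump.size()) return false;
  for (std::size_t i = 0; i < dump.size(); ++i) {
    if (dump[i] != run[i]) return false;
  }
  return true;
}

// Module path: the runtime module path may be a superset of the dump-time one.
// Ordering is ignored here, which is an assumption of this sketch.
bool module_path_matches(const PathList& dump, const PathList& run) {
  const std::set<std::string> runtime(run.begin(), run.end());
  for (const std::string& entry : dump) {
    if (runtime.count(entry) == 0) return false;
  }
  return true;
}

// Boot classpath appends are accepted only if neither an app classpath nor a
// module path was specified when the cache was created.
bool boot_append_allowed(const PathList& dump_app_cp, const PathList& dump_module_path) {
  return dump_app_cp.empty() && dump_module_path.empty();
}
```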
-------------
PR Review: https://git.openjdk.org/jdk/pull/23476#pullrequestreview-2622538668
PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1959178211
PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1959126336
PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1959181514
PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1959181842
PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1959183865
PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1959184959
PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1959192451
PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1959193331
PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1959194900

From stefank at openjdk.org Tue Feb 18 07:54:22 2025
From: stefank at openjdk.org (Stefan Karlsson)
Date: Tue, 18 Feb 2025 07:54:22 GMT
Subject: RFR: 8349652: Rewire nmethod oop load barriers
In-Reply-To:
References:
Message-ID:

On Fri, 7 Feb 2025 09:57:15 GMT, Stefan Karlsson wrote:

> When loading oops from nmethods we currently use the Access API to inject load barriers for the GCs that require them. As part of the ZGC load barrier we need access to the nmethod to properly perform the load barrier. The current implementation of the Access API doesn't support passing down the nmethod through all its layers of code, so ZGC asks the code cache which nmethod the various oops belong to. There's currently an open PR for JDK-8343789 (#21276), which moves the oops out of the code cache, so the current ZGC implementation will not work after that has been integrated.
>
> The proposal is to figure out a way to explicitly pass down the nmethod to the load barriers.
>
> We could extend the Access API to pass down the nmethod through all its various layers. The drawback of that is that it adds a lot of boilerplate code and requires new overloads and/or names. Given that this isn't performance-critical code, I propose that we take the much simpler route and call straight to the BarrierSetNMethod class.
Given that this isn't performance critical code I propose that we take the much simpler route and call straight to the BarrierSetNMethod class. > > Given that MMethodAccess and IN_NMETHOD were only introduced to support nmethod oop loads for ZGC and are note used anymore I've also removed them from the code. > > Tested with reproducer for the ZGC issue in JDK-8343789, tier1-7 Linux with ZGC tasks, currently running tier1-3. Thanks for the reviews! I reran the tests in tier1-tier3. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23512#issuecomment-2664853150 From stefank at openjdk.org Tue Feb 18 07:54:22 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 18 Feb 2025 07:54:22 GMT Subject: Integrated: 8349652: Rewire nmethod oop load barriers In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 09:57:15 GMT, Stefan Karlsson wrote: > When loading oops from nmethods we current use the Access API to inject load barriers for the GCs that requires them. As part of the ZGC load barrier we need access to the nmethod to properly perform the load barrier. The current implementation of the Access API doesn't support passing down the nmethod through all its layers of code so ZGC asks the code cache what nmethod the various oops belongs to. There's currently an open PR for JDK-8343789 (#21276), which moves the oops out of the code cache, so the current way ZGC implementation will not work after that has been integrated. > > The proposal is to figure out a way to explicitly pass down the nmethod to the load barriers. > > We could extend the Access API to pass down the nmethod through all its various layers. The drawback of that is that it adds a lot of boiler plate code and requires new over loads and/or names. Given that this isn't performance critical code I propose that we take the much simpler route and call straight to the BarrierSetNMethod class. 
>
> Given that NMethodAccess and IN_NMETHOD were only introduced to support nmethod oop loads for ZGC and are not used anymore, I've also removed them from the code.
>
> Tested with reproducer for the ZGC issue in JDK-8343789, tier1-7 Linux with ZGC tasks, currently running tier1-3.

This pull request has now been integrated.

Changeset: 3353f8e0
Author: Stefan Karlsson
URL: https://git.openjdk.org/jdk/commit/3353f8e0875165adbc8ee764a4c8d8817a87cd88
Stats: 76 lines in 10 files changed: 39 ins; 14 del; 23 mod

8349652: Rewire nmethod oop load barriers

Reviewed-by: kvn, aboldtch

-------------
PR: https://git.openjdk.org/jdk/pull/23512

From galder at openjdk.org Tue Feb 18 08:04:21 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Tue, 18 Feb 2025 08:04:21 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12]
In-Reply-To:
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID:

On Fri, 7 Feb 2025 12:39:24 GMT, Galder Zamarreño wrote:

>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>>
>> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>>
>>
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>>
>>
>> The control flow is due to the Java implementation of these methods, e.g.
>>
>>
>> public static long max(long a, long b) {
>>     return (a >= b) ? a : b;
>> }
>>
>>
>> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively.
>> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
>> E.g.
>>
>>
>> SuperWord::transform_loop:
>> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
>> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>>
>>
>> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PASS FAIL ERROR
>> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>> 1 1 0 0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1155
>> long max 1173
>>
>>
>> After the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PASS FAIL ERROR
>> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>> 1 1 0 0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1042
>> long max 1042
>>
>>
>> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
>> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>>
>> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PA...
>
> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
> The pull request contains 44 additional commits since the last revision:
>
> - Merge branch 'master' into topic.intrinsify-max-min-long
> - Fix typo
> - Renaming methods and variables and add docu on algorithms
> - Fix copyright years
> - Make sure it runs with cpus with either avx512 or asimd
> - Test can only run with 256 bit registers or bigger
>
>   * Remove platform dependant check
>   and use platform independent configuration instead.
> - Fix license header
> - Tests should also run on aarch64 asimd=true envs
> - Added comment around the assertions
> - Adjust min/max identity IR test expectations after changes
> - ... and 34 more: https://git.openjdk.org/jdk/compare/6ad0c61a...a190ae68

What is happening with int min/max needs a separate investigation because, based on my testing, the int min/max intrinsic is both a regression and a performance improvement! Check this out:

    make test TEST="micro:org.openjdk.bench.java.lang.MinMaxVector.intReductionSimpleMax" MICRO="FORK=1"

    Benchmark                             (probability)  (size)   Mode  Cnt       Score   Error  Units
    MinMaxVector.intReductionSimpleMax               50    2048  thrpt    4     460.585 ± 0.348  ops/ms
    MinMaxVector.intReductionSimpleMax               80    2048  thrpt    4     460.633 ± 0.103  ops/ms
    MinMaxVector.intReductionSimpleMax              100    2048  thrpt    4     460.580 ± 0.091  ops/ms

    make test TEST="micro:org.openjdk.bench.java.lang.MinMaxVector.intReductionSimpleMax" MICRO="FORK=1;OPTIONS=-jvmArgs -XX:CompileCommand=option,org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_intReductionSimpleMax_jmhTest::intReductionSimpleMax_thrpt_jmhStub,ccstrlist,DisableIntrinsic,_max"

    Benchmark                             (probability)  (size)   Mode  Cnt       Score   Error  Units
    MinMaxVector.intReductionSimpleMax               50    2048  thrpt    4     460.479 ± 0.044  ops/ms
    MinMaxVector.intReductionSimpleMax               80    2048  thrpt    4     460.587 ± 0.106  ops/ms
    MinMaxVector.intReductionSimpleMax              100    2048  thrpt    4    1027.831 ± 9.353  ops/ms

    80%:
            0x00007ffb200fa089: cmpl %r11d, %r10d
     3.04%  0x00007ffb200fa08c: cmovll %r11d, %r10d
     4.38%  0x00007ffb200fa090: cmpl %ebx, %r10d
     1.61%  0x00007ffb200fa093: cmovll %ebx, %r10d
     2.79%  0x00007ffb200fa097: cmpl %edi, %r10d
     2.92%  0x00007ffb200fa09a: cmovll %edi, %r10d ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
                ; - java.lang.Math::max@10 (line 2023)
                ; - org.openjdk.bench.java.lang.MinMaxVector::intReductionSimpleMax@23 (line 232)

    100%:
     3.11%  0x00007f26c00f8f9c: nopl (%rax)
     3.31%  0x00007f26c00f8fa0: cmpl %r10d, %ecx
            0x00007f26c00f8fa3: jge 0x7f26c00f8ff1 ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
                ; - java.lang.Math::max@10 (line 2023)
                ; - org.openjdk.bench.java.lang.MinMaxVector::intReductionSimpleMax@23 (line 232)
                ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_intReductionSimpleMax_jmhTest::intReductionSimpleMax_thrpt_jmhStub@19 (line 124)

    make test TEST="micro:org.openjdk.bench.java.lang.MinMaxVector.intReductionMultiplyMax" MICRO="FORK=1"

    Benchmark                             (probability)  (size)   Mode  Cnt       Score   Error  Units
    MinMaxVector.intReductionMultiplyMax             50    2048  thrpt    4    2815.614 ± 0.406  ops/ms
    MinMaxVector.intReductionMultiplyMax             80    2048  thrpt    4    2814.943 ± 2.174  ops/ms
    MinMaxVector.intReductionMultiplyMax            100    2048  thrpt    4    2815.285 ± 1.725  ops/ms

    make test TEST="micro:org.openjdk.bench.java.lang.MinMaxVector.intReductionMultiplyMax" MICRO="FORK=1;OPTIONS=-jvmArgs -XX:CompileCommand=option,org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_intReductionMultiplyMax_jmhTest::intReductionMultiplyMax_thrpt_jmhStub,ccstrlist,DisableIntrinsic,_max"

    Benchmark                             (probability)  (size)   Mode  Cnt       Score   Error  Units
    MinMaxVector.intReductionMultiplyMax             50    2048  thrpt    4    2802.062 ± 0.710  ops/ms
    MinMaxVector.intReductionMultiplyMax             80    2048  thrpt    4    2814.874 ± 4.058  ops/ms
    MinMaxVector.intReductionMultiplyMax            100    2048  thrpt    4     883.879 ± 0.327  ops/ms

    80%:
     3.54%  0x00007faa700fa177: vpmaxsd %ymm4, %ymm5, %ymm13;*ireturn {reexecute=0 rethrow=0 return_oop=0}
                ; - java.lang.Math::max@10 (line 2023)

    100%:
     7.50%  0x00007f75280f8849: imull $0xb, 0x2c(%rbp, %r11, 4), %r10d
                ;*imul {reexecute=0 rethrow=0 return_oop=0}
                ; - org.openjdk.bench.java.lang.MinMaxVector::intReductionMultiplyMax@20 (line 221)
                ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_intReductionMultiplyMax_jmhTest::intReductionMultiplyMax_thrpt_jmhStub@19 (line 124)
     3.85%  0x00007f75280f884f: cmpl %r10d, %r8d
            0x00007f75280f8852: jl 0x7f75280f87d0 ;*if_icmplt {reexecute=0 rethrow=0 return_oop=0}
                ; - java.lang.Math::max@2 (line 2023)
                ; - org.openjdk.bench.java.lang.MinMaxVector::intReductionMultiplyMax@26 (line 222)
                ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_intReductionMultiplyMax_jmhTest::intReductionMultiplyMax_thrpt_jmhStub@19 (line 124)

I ran the exact same test with longs and I don't see such an issue. The performance is always the same, either with the intrinsic or with it disabled as shown above.
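One common explanation for branch-versus-cmov numbers like these is the dependency structure of the two forms: a well-predicted branch lets the processor speculate past the comparison, while a select makes each iteration wait for the previous result. The C++ sketch below only illustrates the two source shapes (function names are hypothetical); a C++ compiler, like C2, may turn either shape into a cmov or vectorize it, so the source does not dictate the generated code, and nothing here reproduces the measurements.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Branchy running max: the shape of `(a >= b) ? a : b` compiled to cmp + jump.
// When the branch is well predicted, the processor speculates past it, so the
// comparison is not on the loop's critical dependency path.
std::int64_t max_reduce_branch(const std::vector<std::int64_t>& v) {
  std::int64_t m = std::numeric_limits<std::int64_t>::min();
  for (std::int64_t x : v) {
    if (x > m) {
      m = x;
    }
  }
  return m;
}

// Select-style running max: a branch-free select, like a cmov. Every
// iteration's result depends on the previous `m`, forming a loop-carried
// dependency chain regardless of how predictable the comparison is.
std::int64_t max_reduce_select(const std::vector<std::int64_t>& v) {
  std::int64_t m = std::numeric_limits<std::int64_t>::min();
  for (std::int64_t x : v) {
    const std::int64_t mask = -static_cast<std::int64_t>(x > m);  // all ones if x > m
    m = (x & mask) | (m & ~mask);
  }
  return m;
}
```

Both functions compute the same result; the difference only matters for how the hardware schedules them, which is why the branch probability of the input can favor one shape or the other.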
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2664871838 From galder at openjdk.org Tue Feb 18 08:16:19 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 18 Feb 2025 08:16:19 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: <3ArmrOQcUoj8DhHTq1a40Oz3GE8bCDDy3FFeVgbladg=.b8e0e13b-39f3-41a6-8a1b-5ca4febb4a41@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> <3ArmrOQcUoj8DhHTq1a40Oz3GE8bCDDy3FFeVgbladg=.b8e0e13b-39f3-41a6-8a1b-5ca4febb4a41@github.com> Message-ID:

On Mon, 17 Feb 2025 15:02:32 GMT, Roland Westrelin wrote:

>> @rwestrel @galderz
>>
>>> It seems overall, we likely win more than we lose with this intrinsic, so I would integrate this change as it is and file a bug to keep track of remaining issues.
>>
>> I'm a little scared to just accept the regressions, especially for this "most average looking case":
>> Imagine you have an array with random numbers, or at least numbers in a random order. If we take the max, then we expect the first number to be the max with probability 1, the second with 1/2, the third with 1/3, the i'th with 1/i. So the average branch probability is `(sum_i 1/i) / n`. This gets closer and closer to zero the larger the array. This means that the "average" case has an extreme probability. And so if we do not vectorize, the current patch gives us a regression. And vectorization is a little fragile: it takes very little for vectorization not to kick in.
>>
>>> The Min/Max nodes are floating nodes. They can be hoisted out of loops and commoned reliably in ways that are not guaranteed otherwise.
>>
>> I suppose we could write an optimization that hoists loop-independent if-diamonds out of a loop. If the condition and all phi inputs are loop invariant, you could just cut the diamond out of the loop and paste it before the loop entry.
>>
>>> Shouldn't int min/max be affected the same way?
>>
>> I think we should be able to see the same issue here, actually. Yes. Here is a quick benchmark:
>>
>> java -XX:CompileCommand=compileonly,TestIntMax::test* -XX:CompileCommand=printcompilation,TestIntMax::test* -XX:+TraceNewVectors TestIntMax.java
>> CompileCommand: compileonly TestIntMax.test* bool compileonly = true
>> CompileCommand: PrintCompilation TestIntMax.test* bool PrintCompilation = true
>> Warmup
>> 5225 93 % 3 TestIntMax::test1 @ 5 (27 bytes)
>> 5226 94 3 TestIntMax::test1 (27 bytes)
>> 5226 95 % 4 TestIntMax::test1 @ 5 (27 bytes)
>> 5238 96 4 TestIntMax::test1 (27 bytes)
>> Run
>> Time: 542056319
>> Warmup
>> 6320 101 % 3 TestIntMax::test2 @ 5 (34 bytes)
>> 6322 102 % 4 TestIntMax::test2 @ 5 (34 bytes)
>> 6329 103 4 TestIntMax::test2 (34 bytes)
>> Run
>> Time: 166815209
>>
>> That's a 4x regression on random input data!
>>
>> With:
>>
>> import java.util.Random;
>>
>> public class TestIntMax {
>>     private static Random RANDOM = new Random();
>>
>>     public static void main(String[] args) {
>>         int[] a = new int[64 * 1024];
>>         for (int i = 0; i < a.length; i++) {
>>...
>
>> I think we should be able to see the same issue here, actually. Yes. Here is a quick benchmark:
>
> I observe the same:
>
> Warmup
> 751 3 b TestIntMax::test1 (27 bytes)
> Run
> Time: 360 550 158
> Warmup
> 1862 15 b TestIntMax::test2 (34 bytes)
> Run
> Time: 92 116 170
>
> But then with this:
>
> diff --git a/src/hotspot/cpu/x86/x86_64.ad b/src/hotspot/cpu/x86/x86_64.ad
> index 8cc4a970bfd..9abda8f4178 100644
> --- a/src/hotspot/cpu/x86/x86_64.ad
> +++ b/src/hotspot/cpu/x86/x86_64.ad
> @@ -12037,16 +12037,20 @@ instruct cmovI_reg_l(rRegI dst, rRegI src, rFlagsReg cr)
>  %}
>
>
> -instruct maxI_rReg(rRegI dst, rRegI src)
> +instruct maxI_rReg(rRegI dst, rRegI src, rFlagsReg cr)
>  %{
>    match(Set dst (MaxI dst src));
> +  effect(KILL cr);
>
>    ins_cost(200);
> -  expand %{
> -    rFlagsReg cr;
> -    compI_rReg(cr, dst, src);
> -    cmovI_reg_l(dst, src, cr);
> +  ins_encode %{
> +    Label done;
> +    __ cmpl($src$$Register, $dst$$Register);
> +    __ jccb(Assembler::less, done);
> +    __ mov($dst$$Register, $src$$Register);
> +    __ bind(done);
>    %}
> +  ins_pipe(pipe_cmov_reg);
>  %}
>
>  // ============================================================================
>
> the performance gap narrows:
>
> Warmup
> 770 3 b TestIntMax::test1 (27 bytes)
> Run
> Time: 94 951 677
> Warmup
> 1312 15 b TestIntMax::test2 (34 bytes)
> Run
> Time: 70 053 824
>
> (the numbers for test2 fluctuate quite a bit). Does it ever make sense to implement `MaxI` with a conditional move then?

Note something I discussed with @rwestrel yesterday in the context of long min/max vs. int min/max: int has an AD-file implementation for min/max, whereas long does not. My very first prototype for this issue mimicked, for long, what int does, but after talking to @rwestrel we decided it would be better to implement this without introducing platform-specific changes. So, following Roland's thread in https://github.com/openjdk/jdk/pull/20098#issuecomment-2663379660, I could add AD changes for, say, x86 and aarch64 so that long min/max uses a branch instead of cmov.
Note that the cmov fallback of long min/max comes from macro expansion, not from platform-specific changes.

------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2664893516 From galder at openjdk.org Tue Feb 18 08:20:15 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 18 Feb 2025 08:20:15 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: <3ArmrOQcUoj8DhHTq1a40Oz3GE8bCDDy3FFeVgbladg=.b8e0e13b-39f3-41a6-8a1b-5ca4febb4a41@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> <3ArmrOQcUoj8DhHTq1a40Oz3GE8bCDDy3FFeVgbladg=.b8e0e13b-39f3-41a6-8a1b-5ca4febb4a41@github.com> Message-ID:

On Mon, 17 Feb 2025 15:02:32 GMT, Roland Westrelin wrote:

> Does it ever make sense to implement `MaxI` with a conditional move then?

To make it more explicit: implementing long min/max in the AD files as a compare-and-branch (instead of cmov) will likely remove all the regressions observed here at 100% probability. I'm going to repeat the same MinMaxVector int min/max reduction test above with the AD changes @rwestrel suggested to see what effect they have.
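The branch-probability argument quoted earlier in this thread (on a random permutation, the i'th element is a new maximum with probability 1/i) can be checked with a small simulation. The sketch below is plain Java written for illustration and is not part of the PR; the class and method names are hypothetical. It counts how often the running maximum changes during a scan and compares the average over random permutations with the harmonic number H_n = sum_{i=1..n} 1/i. Dividing by n gives the average per-iteration probability that the "new maximum" branch is taken, roughly ln(n)/n, which tends to zero as the array grows.

```java
import java.util.Random;

final class MaxUpdateProbability {

    // Number of times the running max is replaced while scanning a.
    // The first element always counts (it is the max "with probability 1").
    static int countUpdates(int[] a) {
        int updates = 0;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            if (i == 0 || a[i] > max) {
                max = a[i];
                updates++;
            }
        }
        return updates;
    }

    // H_n = 1 + 1/2 + ... + 1/n: expected number of updates for a random permutation.
    static double harmonic(int n) {
        double h = 0.0;
        for (int i = 1; i <= n; i++) {
            h += 1.0 / i;
        }
        return h;
    }

    // Fisher-Yates shuffle of 0..n-1, yielding a uniformly random permutation.
    static int[] randomPermutation(int n, Random rnd) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
        return a;
    }

    public static void main(String[] args) {
        int n = 64 * 1024;
        int trials = 200;
        Random rnd = new Random(42);
        long total = 0;
        for (int t = 0; t < trials; t++) {
            total += countUpdates(randomPermutation(n, rnd));
        }
        double avgUpdates = (double) total / trials;
        System.out.printf("avg updates = %.2f, H_n = %.2f%n", avgUpdates, harmonic(n));
        System.out.printf("taken probability per iteration ~ %.6f%n", avgUpdates / n);
    }
}
```

This only models how often the branch is taken; it says nothing about how well a hardware branch predictor copes with the resulting pattern.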
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2664903731 From epeter at openjdk.org Tue Feb 18 08:46:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Feb 2025 08:46:17 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> <3ArmrOQcUoj8DhHTq1a40Oz3GE8bCDDy3FFeVgbladg=.b8e0e13b-39f3-41a6-8a1b-5ca4febb4a41@github.com> Message-ID:

On Tue, 18 Feb 2025 08:17:59 GMT, Galder Zamarreño wrote:

>> Does it ever make sense to implement `MaxI` with a conditional move then?
>
> To make it more explicit: implementing long min/max in ad files as cmp will likely remove all the 100% regressions that are observed here. I'm going to repeat the same MinMaxVector int min/max reduction test above with the ad changes @rwestrel suggested to see what effect they have.

@galderz I think we will have the same issue with both `int` and `long`: as far as I know, it is a genuinely difficult problem to decide at compile time whether a `cmove` or a `branch` is the better choice. I'm not sure there is any heuristic for which you could not find a micro-benchmark where it makes the wrong choice.

To my understanding, these are the factors that impact performance:
- `cmove` requires all inputs to complete before it can execute, and it has an inherent latency of a cycle or so itself. But there are no branch mispredictions, and hence no misprediction penalties (i.e. when the CPU has to flush the ops from the wrong path and restart at the branch).
- `branch` can hide some latency, because execution can continue along the speculated path. We do not need to wait for the inputs of the comparison to arrive, and we can already continue with the speculated resulting value. But whenever the speculation is wrong, we pay the misprediction penalty.

In my understanding, there are roughly 3 scenarios:
- The branch probability is so extreme that the branch predictor is correct almost always, so branching code is profitable.
- The branch probability is somewhere in the middle, and the branch is not predictable. Branch mispredictions are very expensive, so it is better to use `cmove`.
- The branch probability is somewhere in the middle, but the branch is predictable (e.g. it swaps back and forth). The branch predictor will have almost no mispredictions, and branching code is faster.

Modeling this precisely is actually a little complex. You would have to know the cost of both the `cmove` and the `branching` version of the code. That depends on the latency of the inputs and the outputs: does the `cmove` dramatically increase the latency on the critical path, where `branching` could hide some of that latency? And you would have to know how good the branch predictor is, which you cannot derive from the branch probability in our profile (at least not when the probability is in the middle, because you don't know whether the pattern is random or predictable).

If we can find a perfect heuristic, that would be fantastic ;) If we cannot, then we should think about which scenarios are the most "common" or "relevant", I think. But let's discuss all of this in a call / offline :)

------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2664956307 From ayang at openjdk.org Tue Feb 18 09:20:57 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 18 Feb 2025 09:20:57 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v4] In-Reply-To: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID:

> Here is an attempt to simplify the GCLocker implementation for Serial and Parallel.
>
> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs).
JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker.
>
> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization.
>
> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress.
>
> Test: tier1-8

Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:

 - Merge branch 'master' into gclocker
 - review
 - Merge branch 'master' into gclocker
 - review
 - Merge branch 'master' into gclocker
 - review
 - Merge branch 'master' into gclocker
 - gclocker

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/23367/files
 - new: https://git.openjdk.org/jdk/pull/23367/files/005087e3..78f91d4f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23367&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23367&range=02-03

Stats: 1461 lines in 94 files changed: 848 ins; 266 del; 347 mod
Patch: https://git.openjdk.org/jdk/pull/23367.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23367/head:pull/23367

PR: https://git.openjdk.org/jdk/pull/23367

From ayang at openjdk.org Tue Feb 18 09:24:15 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 18 Feb 2025 09:24:15 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID:

On Tue, 18 Feb 2025 01:25:12 GMT, David Holmes wrote:

> You still need it to be an atomic load

Then maybe the logic is easier to read if the "atomic" access is visible directly in that context, instead of being hidden inside `in_critical`. So it probably makes more sense to introduce a new API. WDYT?

> The GCLocker behaviour would be easier to discern ...

Renamed to `current_thread` in `enter`, `exit`, and `enter_slow`.
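The readers-writer idea in the PR description can be modeled at the Java level with `java.util.concurrent.locks.ReentrantReadWriteLock`: threads in a JNI critical region hold the shared (read) side, and a GC must acquire the exclusive (write) side, so it can only start once every thread has left its critical region. This is a conceptual sketch only; `GCLockerModel` and its methods are hypothetical names, and the sketch deliberately does not model what the PR does on the fast path (a thread-local flag plus a store-load barrier instead of shared atomic state).

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Conceptual model only: not HotSpot's GCLocker implementation.
final class GCLockerModel {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Only incremented while holding the write lock; volatile for reads.
    private volatile int collections = 0;

    // A mutator entering a JNI critical region takes the shared (read) side.
    void enterCritical() {
        lock.readLock().lock();
    }

    void exitCritical() {
        lock.readLock().unlock();
    }

    // Non-blocking GC attempt: fails while any thread (including the caller)
    // is inside a critical region.
    boolean tryCollect() {
        if (!lock.writeLock().tryLock()) {
            return false;
        }
        try {
            collections++;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Blocking GC: waits until all critical regions have been exited. Once the
    // write lock is held, the collection always proceeds, so there is no
    // "GC inhibited, retry the allocation" outcome.
    void collect() {
        lock.writeLock().lock();
        try {
            collections++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    int collections() {
        return collections;
    }
}
```

In this model every `enterCritical()` touches shared lock state, which is precisely the kind of global atomic traffic the PR avoids on the fast path by using a thread-local variable instead.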
------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2665044825 From galder at openjdk.org Tue Feb 18 09:24:18 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 18 Feb 2025 09:24:18 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> <3ArmrOQcUoj8DhHTq1a40Oz3GE8bCDDy3FFeVgbladg=.b8e0e13b-39f3-41a6-8a1b-5ca4febb4a41@github.com> Message-ID:

On Tue, 18 Feb 2025 08:43:38 GMT, Emanuel Peter wrote:

> But let's discuss all of this in a call / offline :)

Yup.

> I ran the exact same test with longs and I don't see such an issue. The performance is always the same either with the intrinsic or disabling it as shown above.

For the equivalent long tests, I think I used the wrong id for the disabled intrinsic: it should be `_maxL`, not `_max`. I will repeat the tests and report back if I observe similar differences.

------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2665045881 From roland at openjdk.org Tue Feb 18 09:35:16 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 18 Feb 2025 09:35:16 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID:

On Mon, 11 Nov 2024 14:40:09 GMT, Emanuel Peter wrote:

> Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below.
>
> **Background**
>
> With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer.
>
> **Problem**
>
> So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code.
>
> MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1);
> MemorySegment nativeUnaligned = nativeAligned.asSlice(1);
> test3(nativeUnaligned);
>
> When compiling the test method, we assume that `nativeUnaligned.address()` is aligned - but it is not!
>
> static void test3(MemorySegment ms) {
>     for (int i = 0; i < RANGE; i++) {
>         long adr = i * 4L;
>         int v = ms.get(ELEMENT_LAYOUT, adr);
>         ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1));
>     }
> }
>
> **Solution: Runtime Checks - Predicate and Multiversioning**
>
> Of course we could just forbid vectorization in cases where we have a `native` base. But that would currently lead to regressions: in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine if the `base` is aligned, so we need a runtime check.
>
> I came up with 2 options where to place the runtime checks:
> - A new "auto vectorization" Parse Predicate:
>   - This only works when predicates are available.
>   - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop.
> - Multiversion the loop:
>   - Create 2 copies of the loop (fast and slow loops).
>   - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if` which decides which loop we take
>   - In the `slow_loop`, we make no assumption, which means we cannot vectorize, but we still compile - so even unaligned `base`s would end up with reasonably fast code.
>   - We "stall" the `...

Would it make sense to add verification code that makes sure that whenever a loop is flagged as multiversioned, C2 can find the multiversion guard (and maybe, whenever there's a multiversion guard, that the loops it guards are indeed flagged correctly)?

src/hotspot/share/opto/loopTransform.cpp line 751:

> 749:     // Peeling also destroys the connection of the main loop
> 750:     // to the multiversion_if.
> 751:     cl->set_no_multiversion();

Would we want to change the multiversion guard at this point so it constant folds and the slow version is removed?

src/hotspot/share/opto/loopUnswitch.cpp line 513:

> 511:
> 512:   // Create new Region.
> 513:   RegionNode* region = new RegionNode(1);

So we create a new `Region` every time a new condition is added?

src/hotspot/share/opto/loopnode.cpp line 1097:

> 1095:   // PhaseIdealLoop::add_parse_predicate only checks trap limits per method, so
> 1096:   // we do a custom check here.
> 1097:   if (!C->too_many_traps(cloned_sfpt->jvms()->method(), cloned_sfpt->jvms()->bci(), Deoptimization::Reason_auto_vectorization_check)) {

Isn't that done by `add_parse_predicate`?

src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 32:

> 30:
> 31: #define COMPILER_TRACE_AUTO_VECTORIZATION_TAG(flags) \
> 32:   flags(POINTER_PARSING, "Trace VPointer/MemPointer parsing") \

Has anything changed here? I stared at it a few times and couldn't figure out what has.
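The multiversioning option described above can be pictured at the source level: a cheap runtime check on the base address selects between a "fast" loop that may rely on the speculated alignment and a "slow" loop that makes no such assumption. The sketch below is only an analogy in plain Java; C2 performs this transformation on its IR, and the 8-byte alignment, the explicit `baseAddress` parameter, and all names are hypothetical. In Java source both loops necessarily compute the same thing; in C2 the difference is that only the fast version may be vectorized with aligned accesses.

```java
// Source-level analogy of loop multiversioning; not C2 code.
final class MultiversionSketch {
    // Hypothetical alignment that the fast version speculates on.
    static final long ASSUMED_ALIGNMENT = 8;

    enum Version { FAST, SLOW }

    // Increments every element and reports which loop version ran.
    static Version incrementAll(long baseAddress, int[] data) {
        // Plays the role of the "multiversion_if": a runtime check
        // guarding the alignment assumption of the fast loop.
        if ((baseAddress & (ASSUMED_ALIGNMENT - 1)) == 0) {
            fastLoop(data);
            return Version.FAST;
        }
        slowLoop(data);
        return Version.SLOW;
    }

    // May assume an aligned base (in C2: eligible for aligned vector accesses).
    private static void fastLoop(int[] data) {
        for (int i = 0; i < data.length; i++) {
            data[i] += 1;
        }
    }

    // Makes no alignment assumption, so it is correct for any base address.
    private static void slowLoop(int[] data) {
        for (int i = 0; i < data.length; i++) {
            data[i] += 1;
        }
    }
}
```

Unlike the Parse Predicate option, a failing check here does not deoptimize: execution simply falls through to the slow loop, which is why misaligned `base`s still get reasonably fast compiled code.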
------------- PR Review: https://git.openjdk.org/jdk/pull/22016#pullrequestreview-2622881581 PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959338954 PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959344256 PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959347164 PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959349092 From epeter at openjdk.org Tue Feb 18 09:48:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Feb 2025 09:48:16 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: <47tXBG3sQGZVEE5Ya2wr46CopmDjy8OClbpqagIsjgA=.6d07b495-4777-4c7e-a3b7-820f100ec2c0@github.com>

On Tue, 18 Feb 2025 09:09:15 GMT, Roland Westrelin wrote:

> src/hotspot/share/opto/loopTransform.cpp line 751:
>
>> 749:     // Peeling also destroys the connection of the main loop
>> 750:     // to the multiversion_if.
>> 751:     cl->set_no_multiversion();
>
> Would we want to change the multiversion guard at this point so it constant folds and the slow version is removed?

I suppose we can do that. Otherwise, we just have to wait until the `OpaqueMultiversioningNode` constant folds after loop-opts.

> src/hotspot/share/opto/loopUnswitch.cpp line 513:
>
>> 511:
>> 512:   // Create new Region.
>> 513:   RegionNode* region = new RegionNode(1);
>
> So we create a new `Region` every time a new condition is added?

Yes. Are you ok with that? Or would you prefer that we extend an existing region (is that possible?)? Then we would have 2 cases: one where there is no region yet, and one where we extend it. I think adding one each time is easier, and it would get commoned anyway, right?

> src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 32:
>
>> 30:
>> 31: #define COMPILER_TRACE_AUTO_VECTORIZATION_TAG(flags) \
>> 32:   flags(POINTER_PARSING, "Trace VPointer/MemPointer parsing") \
>
> Has anything changed here? I stared at it a few times and couldn't figure out what has.

I added the tag `SPECULATIVE_RUNTIME_CHECKS`, and then had to change the alignment for all the others ;)

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959397988 PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959392450 PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959394676 From roland at openjdk.org Tue Feb 18 09:48:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 18 Feb 2025 09:48:14 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID:

On Mon, 17 Feb 2025 15:24:44 GMT, Emanuel Peter wrote:

> > Do you intend to use a single deoptimization reason for all vectorization-related predicates? (That is, when you take care of aliasing, are you going to use the same reason for aliasing and alignment checks?)
>
> I suppose that is currently what I'm planning. But we could in principle separate them. But I would leave that for later, if there is any desire to do that. For now, I think it's ok to just go with a single "auto-vectorization" reason.
>
> Does that sound reasonable?

Yes, it sounds reasonable.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2665104472 From epeter at openjdk.org Tue Feb 18 09:51:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Feb 2025 09:51:15 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 09:14:28 GMT, Roland Westrelin wrote: >> Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below. >> >> **Background** >> >> With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer. >> >> **Problem** >> >> So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code. >> >> >> MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1); >> MemorySegment nativeUnaligned = nativeAligned.asSlice(1); >> test3(nativeUnaligned); >> >> >> When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not! >> >> static void test3(MemorySegment ms) { >> for (int i = 0; i < RANGE; i++) { >> long adr = i * 4L; >> int v = ms.get(ELEMENT_LAYOUT, adr); >> ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1)); >> } >> } >> >> >> **Solution: Runtime Checks - Predicate and Multiversioning** >> >> Of course we could just forbid cases where we have a `native` base from vectorizing. 
But that would lead to regressions currently - in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine if the `base` is aligned; we need a runtime check. >> >> I came up with 2 options for where to place the runtime checks: >> - A new "auto vectorization" Parse Predicate: >> - This only works when predicates are available. >> - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop. >> - Multiversion the loop: >> - Create 2 copies of the loop (fast and slow loops). >> - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if` which decides which loop we take >> - In the `slow_loop`, we make no assumption which means we cannot vectorize, but we still compile - so even ... > > src/hotspot/share/opto/loopnode.cpp line 1097: > >> 1095: // PhaseIdealLoop::add_parse_predicate only checks trap limits per method, so >> 1096: // we do a custom check here. >> 1097: if (!C->too_many_traps(cloned_sfpt->jvms()->method(), cloned_sfpt->jvms()->bci(), Deoptimization::Reason_auto_vectorization_check)) { > > Isn't that done by `add_parse_predicate`? @rwestrel I only see `if (!C->too_many_traps(reason)) {` in `PhaseIdealLoop::add_parse_predicate`. And, as the comment I put here says, that only checks the `reason` per `method`, and not per `bci`. Do you see anything else?
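The multiversioning option described above comes down to a runtime check on the base address that selects one of two loop copies. Here is a minimal C++ sketch, assuming a plain byte pointer stands in for the native `MemorySegment` base. `LoopVersion`, `increment_all`, `add_one` and `is_aligned` are hypothetical names. The "fast" copy marks where a vectorizer could rely on the checked alignment; it is not actually vectorized here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Which loop copy the (hypothetical) multiversion check selected.
enum class LoopVersion { Fast, Slow };

// True if p is a multiple of `alignment`; alignment must be a power of two.
static bool is_aligned(const void* p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Adds 1 to the i-th 32-bit int at byte address base. memcpy keeps the
// access well-defined even when base + 4*i is not 4-byte aligned.
static void add_one(unsigned char* base, std::size_t i) {
  std::int32_t v;
  std::memcpy(&v, base + i * 4, sizeof v);
  v += 1;
  std::memcpy(base + i * 4, &v, sizeof v);
}

// Sketch of loop multiversioning on a runtime alignment check.
LoopVersion increment_all(unsigned char* base, std::size_t n,
                          std::size_t vector_alignment) {
  if (is_aligned(base, vector_alignment)) {
    // Fast copy: code here may assume the aligned base that the check just
    // established (in C2, this is the copy that gets vectorized).
    for (std::size_t i = 0; i < n; i++) add_one(base, i);
    return LoopVersion::Fast;
  }
  // Slow copy: no alignment assumption, so it stays scalar.
  for (std::size_t i = 0; i < n; i++) add_one(base, i);
  return LoopVersion::Slow;
}
```

Both copies compute the same result; only the assumptions each copy may make differ. An unaligned slice (such as `nativeAligned.asSlice(1)` in the example above) takes the slow copy instead of failing verification.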
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959403871 From epeter at openjdk.org Tue Feb 18 09:56:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Feb 2025 09:56:13 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 09:32:19 GMT, Roland Westrelin wrote: > Would it make sense to add verification code that makes sure that whenever a loop is flagged as multi version, c2 can find the multi version guard (and maybe whenever there's a multi version guard, loops that are guarded are indeed flagged correctly)? I'd have to see if that is possible. Well: > verification code that makes sure that whenever a loop is flagged as multi version, c2 can find the multi version guard That is maybe possible. At least I cannot think of a reason why it should not work right now. Well, maybe what if the predicates get messed up somehow, is that possible? Then you would lose connection. Ah: what if the pre-loop somehow gets "messed up", i.e. that it loses its loop structure. Then we could not really go from the main-loop to the pre-loop to the selector-if any more. > whenever there's a multi version guard, loops that are guarded are indeed flagged correctly That one is more tricky. Because what if the loop somehow gets folded away? How would we catch that? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2665123097 From roland at openjdk.org Tue Feb 18 09:56:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 18 Feb 2025 09:56:14 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 09:48:58 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopnode.cpp line 1097: >> >>> 1095: // PhaseIdealLoop::add_parse_predicate only checks trap limits per method, so >>> 1096: // we do a custom check here. >>> 1097: if (!C->too_many_traps(cloned_sfpt->jvms()->method(), cloned_sfpt->jvms()->bci(), Deoptimization::Reason_auto_vectorization_check)) { >> >> Isn't that done by `add_parse_predicate`? > > @rwestrel I only see `if (!C->too_many_traps(reason)) {` in `PhaseIdealLoop::add_parse_predicate`. And as the comment I put here that only checks the `reason` per `method`, and not per `bci`. Do you see anything else? Seems like it's a bug that `PhaseIdealLoop::add_parse_predicate` doesn't check the `bci` too. Could you fix it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959411405 From epeter at openjdk.org Tue Feb 18 10:07:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Feb 2025 10:07:18 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 09:53:14 GMT, Roland Westrelin wrote: >> @rwestrel I only see `if (!C->too_many_traps(reason)) {` in `PhaseIdealLoop::add_parse_predicate`. And as the comment I put here that only checks the `reason` per `method`, and not per `bci`. Do you see anything else? > > Seems like it's a bug that `PhaseIdealLoop::add_parse_predicate` doesn't check the `bci` too. Could you fix it? @rwestrel So we would check both, right? But is that what we want for all predicates? 
`C->too_many_traps(reason)` checks against `PerMethodTrapLimit`:

    if (trap_count(reason) >= Deoptimization::per_method_trap_limit(reason)) {

But the `bci` check works with `PerBytecodeTrapLimit`, and it actually has a comment like this:

    if (md->has_trap_at(bci, m, reason) != 0) {
      // Assume PerBytecodeTrapLimit==0, for a more conservative heuristic.
      // Also, if there are multiple reasons, or if there is no per-BCI record,
      // assume the worst.

So the `bci` check fails if there has been even a single trapping recorded. So it seems that such a change would affect the behavior in ways I cannot yet predict. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959431345 From galder at openjdk.org Tue Feb 18 10:09:18 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 18 Feb 2025 10:09:18 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> <3ArmrOQcUoj8DhHTq1a40Oz3GE8bCDDy3FF eVgbladg=.b8e0e13b-39f3-41a6-8a1b-5ca4febb4a41@github.com> Message-ID: <50uPQ3ue90Xr_LSEm8z3XLTL1yx2A-Q0SJ8rdmv-gsg=.960a6c31-9850-4ce3-bd88-41d4342a5605@github.com> On Tue, 18 Feb 2025 09:21:46 GMT, Galder Zamarreño wrote: > For the equivalent long tests I think I made a mistake in the id of the disabled intrinsic, it should be _maxL and not _max. I will repeat the tests and post if any similar differences observed.
FYI Indeed a similar pattern is observed for long min/max (with the patch in this PR):

    make test TEST="micro:org.openjdk.bench.java.lang.MinMaxVector.longReductionSimpleMax" MICRO="FORK=1"

    Benchmark                            (probability)  (size)   Mode  Cnt     Score   Error   Units
    MinMaxVector.longReductionSimpleMax             50    2048  thrpt    4   460.392 ± 0.076  ops/ms
    MinMaxVector.longReductionSimpleMax             80    2048  thrpt    4   460.459 ± 0.438  ops/ms
    MinMaxVector.longReductionSimpleMax            100    2048  thrpt    4   460.469 ± 0.057  ops/ms

    make test TEST="micro:org.openjdk.bench.java.lang.MinMaxVector.longReductionSimpleMax" MICRO="FORK=1;OPTIONS=-jvmArgs -XX:CompileCommand=option,org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionSimpleMax_jmhTest::longReductionSimpleMax_thrpt_jmhStub,ccstrlist,DisableIntrinsic,_maxL"

    Benchmark                            (probability)  (size)   Mode  Cnt     Score   Error   Units
    MinMaxVector.longReductionSimpleMax             50    2048  thrpt    4   460.453 ± 0.188  ops/ms
    MinMaxVector.longReductionSimpleMax             80    2048  thrpt    4   460.507 ± 0.192  ops/ms
    MinMaxVector.longReductionSimpleMax            100    2048  thrpt    4  1013.498 ± 1.607  ops/ms

    make test TEST="micro:org.openjdk.bench.java.lang.MinMaxVector.longReductionMultiplyMax" MICRO="FORK=1"

    Benchmark                            (probability)  (size)   Mode  Cnt     Score   Error   Units
    MinMaxVector.longReductionMultiplyMax           50    2048  thrpt    4   966.429 ± 0.359  ops/ms
    MinMaxVector.longReductionMultiplyMax           80    2048  thrpt    4   966.569 ± 0.338  ops/ms
    MinMaxVector.longReductionMultiplyMax          100    2048  thrpt    4   966.548 ± 0.575  ops/ms

    make test TEST="micro:org.openjdk.bench.java.lang.MinMaxVector.longReductionMultiplyMax" MICRO="FORK=1;OPTIONS=-jvmArgs -XX:CompileCommand=option,org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMultiplyMax_jmhTest::longReductionMultiplyMax_thrpt_jmhStub,ccstrlist,DisableIntrinsic,_maxL"

    Benchmark                            (probability)  (size)   Mode  Cnt     Score   Error   Units
    MinMaxVector.longReductionMultiplyMax           50    2048  thrpt    4   966.130 ± 5.549  ops/ms
    MinMaxVector.longReductionMultiplyMax           80    2048  thrpt    4   966.380 ± 0.663  ops/ms
    MinMaxVector.longReductionMultiplyMax          100    2048  thrpt    4   859.233 ± 7.817  ops/ms

------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2665159015 From rcastanedalo at openjdk.org Tue Feb 18 10:10:13 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 18 Feb 2025 10:10:13 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 08:55:26 GMT, Roberto Castañeda Lozano wrote: > Hi Thomas, this looks very useful, thanks! I will run some Oracle-internal functional and performance testing and come back with the results next week. Functional test results (Oracle internal tier1-tier5) look good. I measured C2 execution time before and after the changeset using DaCapo 23 and did not find any statistically significant difference, except for a 2-3% regression on the jython benchmark (using large input size). This small regression is IMO acceptable, particularly given that these changes can be seen as an investment to improve compiler resource utilization in the long run. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2665162403 From epeter at openjdk.org Tue Feb 18 10:11:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Feb 2025 10:11:16 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> Message-ID: On Tue, 18 Feb 2025 09:57:29 GMT, Roland Westrelin wrote: > > That one is more tricky. Because what if the loop somehow gets folded away? How would we catch that?
> There is code that removes the `OuterStripMinedLoop` if the `CountedLoop` goes away and also, if I recall correctly, logic that verifies no `OuterStripMinedLoop` is left behind without a `CountedLoop` so it's probably possible. Question is whether we want that or not. Seems like quite a bit of extra complexity. Hmm ok, I see. I wonder how bad it is to leave the slow-loop there until after loop-opts. I mean it was already created, and it now has no loop-opts performed on it (it is stalled), so it just sits there like dead code. So I'm not sure there is really a performance benefit to kill it already a little earlier. Maybe a very small one? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2665161507 From roland at openjdk.org Tue Feb 18 10:11:17 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 18 Feb 2025 10:11:17 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: <-h_j1wlUqiWpk7lHDe2qqLlTPUdRLJ2NBaid6KJURCQ=.e1ef0bfa-4043-42b0-be58-ac130373c788@github.com> On Tue, 18 Feb 2025 10:04:59 GMT, Emanuel Peter wrote: >> Seems like it's a bug that `PhaseIdealLoop::add_parse_predicate` doesn't check the `bci` too. Could you fix it? > > @rwestrel So we would check both, right? But is that what we want for all predicates? > > `C->too_many_traps(reason)` checks against `PerMethodTrapLimit`: > > if (trap_count(reason) >= Deoptimization::per_method_trap_limit(reason)) { > > > But the `bci` check works with `PerBytecodeTrapLimit`, and it actually has a comment like this: > > if (md->has_trap_at(bci, m, reason) != 0) { > // Assume PerBytecodeTrapLimit==0, for a more conservative heuristic. > // Also, if there are multiple reasons, or if there is no per-BCI record, > // assume the worst. > > So the `bci` check fails if there has been even a single trapping recorded.
> > So it seems that such a change would affect the behavior in ways I cannot yet predict. > > What do you think? That code is supposed to mirror the `GraphKit::add_parse_predicate()`. It doesn't. Would you like me to fix this separately? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959437628 From epeter at openjdk.org Tue Feb 18 10:20:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Feb 2025 10:20:15 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: <-h_j1wlUqiWpk7lHDe2qqLlTPUdRLJ2NBaid6KJURCQ=.e1ef0bfa-4043-42b0-be58-ac130373c788@github.com> References: <-h_j1wlUqiWpk7lHDe2qqLlTPUdRLJ2NBaid6KJURCQ=.e1ef0bfa-4043-42b0-be58-ac130373c788@github.com> Message-ID: On Tue, 18 Feb 2025 10:09:00 GMT, Roland Westrelin wrote: > That code Which code are you referring to? Ah, probably you are talking about `PhaseIdealLoop::add_parse_predicate`, which is using the method wide check. And `GraphKit::add_parse_predicate` actually queries `GraphKit::too_many_traps`, which knows the current `bci()`, and can query the per-bci count. > Would you like me to fix this separately? Yes, please. I definitely don't want to do it in this PR ;) And I don't have as much experience with traps as you do. We'd have to think a little about what cases this affects, and if performance would go up or down in all those cases. 
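The two checks being compared can be modeled in a few lines. Below is a minimal C++ sketch that roughly follows the quoted code. The method-wide query compares a per-reason count against a limit. The per-bci query treats a single recorded trap at that bci as "too many" and otherwise falls back to the method-wide count. `TrapHistory` and its members are hypothetical. This is not HotSpot's `MethodData`, and the real logic has more cases.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical, simplified model of the two "too many traps" queries.
// It only counts traps; it is not HotSpot's MethodData.
struct TrapHistory {
  std::map<int, int> per_method;               // reason -> trap count
  std::map<std::pair<int, int>, int> per_bci;  // (bci, reason) -> trap count

  void record_trap(int bci, int reason) {
    per_method[reason]++;
    per_bci[std::make_pair(bci, reason)]++;
  }

  // Method-wide check: count for `reason` against a limit
  // (the role PerMethodTrapLimit plays in the quoted code).
  bool too_many_traps(int reason, int per_method_limit) const {
    auto it = per_method.find(reason);
    int count = (it == per_method.end()) ? 0 : it->second;
    return count >= per_method_limit;
  }

  // Per-bci check: any trap already recorded at this bci counts as too many
  // (the "assume PerBytecodeTrapLimit==0" heuristic); otherwise fall back
  // to the method-wide check.
  bool too_many_traps(int bci, int reason, int per_method_limit) const {
    if (per_bci.count(std::make_pair(bci, reason)) != 0) {
      return true;
    }
    return too_many_traps(reason, per_method_limit);
  }
};
```

In this model, adding the per-bci query to `PhaseIdealLoop::add_parse_predicate` would stop adding the predicate at a bci after its first trap there, even while the method-wide count is still below the limit. That is the behavior change being weighed above.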
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959451204 From epeter at openjdk.org Tue Feb 18 10:29:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Feb 2025 10:29:12 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: <-h_j1wlUqiWpk7lHDe2qqLlTPUdRLJ2NBaid6KJURCQ=.e1ef0bfa-4043-42b0-be58-ac130373c788@github.com> References: <-h_j1wlUqiWpk7lHDe2qqLlTPUdRLJ2NBaid6KJURCQ=.e1ef0bfa-4043-42b0-be58-ac130373c788@github.com> Message-ID: On Tue, 18 Feb 2025 10:09:00 GMT, Roland Westrelin wrote: >> @rwestrel So we would check both, right? But is that what we want for all predicates? >> >> `C->too_many_traps(reason)` checks against `PerMethodTrapLimit`: >> >> if (trap_count(reason) >= Deoptimization::per_method_trap_limit(reason)) { >> >> >> But the `bci` check works with `PerBytecodeTrapLimit`, and it actually has a comment like this: >> >> if (md->has_trap_at(bci, m, reason) != 0) { >> // Assume PerBytecodeTrapLimit==0, for a more conservative heuristic. >> // Also, if there are multiple reasons, or if there is no per-BCI record, >> // assume the worst. >> >> So the `bci` check fails if there has been even a single trapping recorded. >> >> So it seems that such a change would affect the behavior in ways I cannot yet predict. >> >> What do you think? > > That code is supposed to mirror the `GraphKit::add_parse_predicate()`. It doesn't. Would you like me to fix this separately? @rwestrel do you consider that a blocking issue for this PR here? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959463556 From roland at openjdk.org Tue Feb 18 10:29:13 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 18 Feb 2025 10:29:13 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: <-h_j1wlUqiWpk7lHDe2qqLlTPUdRLJ2NBaid6KJURCQ=.e1ef0bfa-4043-42b0-be58-ac130373c788@github.com> Message-ID: On Tue, 18 Feb 2025 10:25:08 GMT, Emanuel Peter wrote: >> That code is supposed to mirror the `GraphKit::add_parse_predicate()`. It doesn't. Would you like me to fix this separately? > > @rwestrel do you consider that a blocking issue for this PR here? No ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1959465988 From mdoerr at openjdk.org Tue Feb 18 11:15:19 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 18 Feb 2025 11:15:19 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v23] In-Reply-To: <5xeRqXJYlOXFs4jAAXJaf_i0Vn7phluw1j-rNPvZakc=.5c60cc39-28b4-4808-9a1c-8a4e318cd5ed@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> <5xeRqXJYlOXFs4jAAXJaf_i0Vn7phluw1j-rNPvZakc=.5c60cc39-28b4-4808-9a1c-8a4e318cd5ed@github.com> Message-ID: On Mon, 17 Feb 2025 14:05:12 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently acceleration code for GHASH is missing for PPC64. >> >> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result.
> > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > Single load inside loop src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 562: > 560: VectorRegister vTmp4, VectorRegister vTmp5, VectorRegister vTmp6, > 561: VectorRegister vTmp7, VectorRegister vTmp8, VectorRegister vTmp9, > 562: VectorRegister vTmp10, VectorRegister vTmp11, Register data) { `data` is no longer needed. src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 714: > 712: __ vec_perm(vTmp5, vLow, vLow, loadOrder); > 713: __ vec_perm(vH, vTmp5, vTmp4, vPerm); > 714: __ subi(data, data, 16); sub and add below are no longer needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1959533373 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1959533794 From aph at openjdk.org Tue Feb 18 11:35:10 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 18 Feb 2025 11:35:10 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames In-Reply-To: References: Message-ID: On Mon, 17 Feb 2025 09:53:37 GMT, Amit Kumar wrote: > Port for [JDK-8299795](https://bugs.openjdk.org/browse/JDK-8299795) Relativize Z_locals in interpreter frame for s390x. > > Tier1 tests with fastdebug VM are clean. src/hotspot/cpu/s390/frame_s390.hpp line 336: > 334: #define _z_ijava_idx(_component) \ > 335: (_z_ijava_state_neg(_component) >> LogBytesPerWord) > 336: Should this be a function? Also, names starting with `_` aren't common in HotSpot code, except for fields in C++ objects. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23660#discussion_r1959561970 From amitkumar at openjdk.org Tue Feb 18 11:40:12 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 18 Feb 2025 11:40:12 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 11:32:50 GMT, Andrew Haley wrote: > Should this be a function?
This is similar to [ppc ijava_idx](https://github.com/openjdk/jdk/blob/885be2efa6b1359a7c7ab36882e19a7eaba77fb3/src/hotspot/cpu/ppc/frame_ppc.hpp#L283C1-L285C58) > Also, names starting with _ aren't common in HotSpot code, except for fields in C++ objects. I did it to keep parity with: #define _z_ijava_state_neg(_component) \ (int) (-frame::z_ijava_state_size + offset_of(frame::z_ijava_state, _component)) Do you want me to revert it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23660#discussion_r1959568305 From aph at openjdk.org Tue Feb 18 11:55:19 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 18 Feb 2025 11:55:19 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 11:37:27 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/frame_s390.hpp line 336: >> >>> 334: #define _z_ijava_idx(_component) \ >>> 335: (_z_ijava_state_neg(_component) >> LogBytesPerWord) >>> 336: >> >> Should this be a function? >> Also, names starting with `_` aren't common in HotSpot code, except for fields in C++ objects. > >> Should this be a function? > > This is similar to [ppc ijava_idx](https://github.com/openjdk/jdk/blob/885be2efa6b1359a7c7ab36882e19a7eaba77fb3/src/hotspot/cpu/ppc/frame_ppc.hpp#L283C1-L285C58) > >> Also, names starting with _ aren't common in HotSpot code, except for fields in C++ objects. > > I did it to keep parity with: > > #define _z_ijava_state_neg(_component) \ > (int) (-frame::z_ijava_state_size + offset_of(frame::z_ijava_state, _component)) > > Do you want me to revert it? There is no good reason to use a macro here. If a function can be a function, and this one can, let it be one. However, there is no reason to change anything else. Leave that for a "macros to functions" patch some other day.
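One way to follow the suggestion above, sketched under assumptions: keep only the member-name-to-offset step as a macro, since it needs `offsetof` and a function cannot take a member name as an argument, and turn the shift into a `constexpr` function. The struct, constants and names below are hypothetical stand-ins, not the real `frame::z_ijava_state` definitions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; these are not the real s390 frame definitions.
constexpr int LogBytesPerWord = 3;  // 64-bit words
struct z_ijava_state_sketch {
  std::int64_t method;
  std::int64_t locals;
  std::int64_t esp;
};
constexpr int z_ijava_state_size =
    static_cast<int>(sizeof(z_ijava_state_sketch));

// Stays a macro: it takes a member *name*, which a function cannot accept.
#define IJAVA_STATE_NEG(component) \
  (-z_ijava_state_size + static_cast<int>(offsetof(z_ijava_state_sketch, component)))

// The byte-offset-to-word-index step needs no macro.
// Right-shifting a negative int is arithmetic on mainstream compilers
// (and guaranteed since C++20), matching what the macro form does.
constexpr int ijava_idx(int neg_byte_offset) {
  return neg_byte_offset >> LogBytesPerWord;
}

// Evaluated at compile time: `locals` starts 16 bytes, i.e. 2 words,
// below the end of the 24-byte state.
static_assert(ijava_idx(IJAVA_STATE_NEG(locals)) == -2,
              "locals is 2 words below the end of the state");
```

Because `ijava_idx` is `constexpr`, the function form still folds to a constant wherever the macro did, while gaining a type signature and normal scoping.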
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23660#discussion_r1959589863 From adinn at openjdk.org Tue Feb 18 13:36:27 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 18 Feb 2025 13:36:27 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: Message-ID: <3kiI1J7jcczgzTRi9HZztzhGe1blcy8Ga11xoGhzueY=.98543172-5b38-4199-bead-0988de0e0e75@github.com> On Thu, 6 Feb 2025 18:47:54 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Adding comments + some code reorganization src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2594: > 2592: guarantee(T != T1Q && T != T1D, "incorrect arrangement"); \ > 2593: if (!acceptT2D) guarantee(T != T2D, "incorrect arrangement"); \ > 2594: if (strcmp(#NAME, "sqdmulh") == 0) guarantee(T != T8B && T != T16B, "incorrect arrangement"); \ Suggestion: I think it might be better to change this test from a strcmp call to (opc2 == 0b101101). The strcmp test is clearer to a reader of the code, but the call is not guaranteed to be compiled out at build time, whereas the latter will be. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1959758334 From adinn at openjdk.org Tue Feb 18 13:46:17 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 18 Feb 2025 13:46:17 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 18:47:54 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
> > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Adding comments + some code reorganization src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4066: > 4064: } > 4065: > 4066: // Execute on round of keccak of two computations in parallel. Suggestion: It would be helpful to add comments that relate the register and instruction selection to the original Java source code, e.g. change the header as follows:

    // Performs 2 keccak round transformations using vector parallelism
    //
    // Two sets of 25 * 64-bit input states a0[lo:hi]...a24[lo:hi] are passed in
    // the lower/upper halves of registers v0...v24 and the transformed states
    // are returned in the same registers. Intermediate 64-bit pairs
    // c0...c4 and d0...d4 are computed in registers v25...v30. v31 is
    // loaded with the required pair of 64-bit round constants.
    // During computation of the output states some intermediate results are
    // shuffled around registers v0...v30. Comments on each line indicate
    // how the values in registers correspond to variables ai, ci, di in
    // the Java source code, likewise how the generated machine instructions
    // correspond to Java source operations (n.b. rol means rotate left).
Then annotate the generation steps as follows:

    __ eor3(v29, __ T16B, v4, v9, v14);          // c4 = a4 ^ a9 ^ a14
    __ eor3(v26, __ T16B, v1, v6, v11);          // c1 = a1 ^ a6 ^ a11
    __ eor3(v28, __ T16B, v3, v8, v13);          // c3 = a3 ^ a8 ^ a13
    __ eor3(v25, __ T16B, v0, v5, v10);          // c0 = a0 ^ a5 ^ a10
    __ eor3(v27, __ T16B, v2, v7, v12);          // c2 = a2 ^ a7 ^ a12
    __ eor3(v29, __ T16B, v29, v19, v24);        // c4 ^= a19 ^ a24
    __ eor3(v26, __ T16B, v26, v16, v21);        // c1 ^= a16 ^ a21
    __ eor3(v28, __ T16B, v28, v18, v23);        // c3 ^= a18 ^ a23
    __ eor3(v25, __ T16B, v25, v15, v20);        // c0 ^= a15 ^ a20
    __ eor3(v27, __ T16B, v27, v17, v22);        // c2 ^= a17 ^ a22
    __ rax1(v30, __ T2D, v29, v26);              // d0 = c4 ^ rol(c1, 1)
    __ rax1(v26, __ T2D, v26, v28);              // d2 = c1 ^ rol(c3, 1)
    __ rax1(v28, __ T2D, v28, v25);              // d4 = c3 ^ rol(c0, 1)
    __ rax1(v25, __ T2D, v25, v27);              // d1 = c0 ^ rol(c2, 1)
    __ rax1(v27, __ T2D, v27, v29);              // d3 = c2 ^ rol(c4, 1)
    __ eor(v0, __ T16B, v0, v30);                // a0 = a0 ^ d0
    __ xar(v29, __ T2D, v1, v25, (64 - 1));      // a10' = rol((a1^d1), 1)
    __ xar(v1, __ T2D, v6, v25, (64 - 44));      // a1 = rol((a6^d1), 44)
    __ xar(v6, __ T2D, v9, v28, (64 - 20));      // a6 = rol((a9^d4), 20)
    __ xar(v9, __ T2D, v22, v26, (64 - 61));     // a9 = rol((a22^d2), 61)
    __ xar(v22, __ T2D, v14, v28, (64 - 39));    // a22 = rol((a14^d4), 39)
    __ xar(v14, __ T2D, v20, v30, (64 - 18));    // a14 = rol((a20^d0), 18)
    __ xar(v31, __ T2D, v2, v26, (64 - 62));     // a20' = rol((a2^d2), 62)
    __ xar(v2, __ T2D, v12, v26, (64 - 43));     // a2 = rol((a12^d2), 43)
    __ xar(v12, __ T2D, v13, v27, (64 - 25));    // a12 = rol((a13^d3), 25)
    __ xar(v13, __ T2D, v19, v28, (64 - 8));     // a13 = rol((a19^d4), 8)
    __ xar(v19, __ T2D, v23, v27, (64 - 56));    // a19 = rol((a23^d3), 56)
    __ xar(v23, __ T2D, v15, v30, (64 - 41));    // a23 = rol((a15^d0), 41)
    __ xar(v15, __ T2D, v4, v28, (64 - 27));     // a15 = rol((a4^d4), 27)
    __ xar(v28, __ T2D, v24, v28, (64 - 14));    // a4' = rol((a24^d4), 14)
    __ xar(v24, __ T2D, v21, v25, (64 - 2));     // a24 = rol((a21^d1), 2)
    __ xar(v8, __ T2D, v8, v27, (64 - 55));      // a21' = rol((a8^d3), 55)
    __ xar(v4, __ T2D, v16, v25, (64 - 45));     // a8' = rol((a16^d1), 45)
    __ xar(v16, __ T2D, v5, v30, (64 - 36));     // a16 = rol((a5^d0), 36)
    __ xar(v5, __ T2D, v3, v27, (64 - 28));      // a5 = rol((a3^d3), 28)
    __ xar(v27, __ T2D, v18, v27, (64 - 21));    // a3' = rol((a18^d3), 21)
    __ xar(v3, __ T2D, v17, v26, (64 - 15));     // a18' = rol((a17^d2), 15)
    __ xar(v25, __ T2D, v11, v25, (64 - 10));    // a17' = rol((a11^d1), 10)
    __ xar(v26, __ T2D, v7, v26, (64 - 6));      // a11' = rol((a7^d2), 6)
    __ xar(v30, __ T2D, v10, v30, (64 - 3));     // a7' = rol((a10^d0), 3)
    __ bcax(v20, __ T16B, v31, v22, v8);         // a20 = a20' ^ (~a21' & a22)
    __ bcax(v21, __ T16B, v8, v23, v22);         // a21 = a21' ^ (~a22 & a23)
    __ bcax(v22, __ T16B, v22, v24, v23);        // a22 = a22 ^ (~a23 & a24)
    __ bcax(v23, __ T16B, v23, v31, v24);        // a23 = a23 ^ (~a24 & a20')
    __ bcax(v24, __ T16B, v24, v8, v31);         // a24 = a24 ^ (~a20' & a21')
    __ ld1r(v31, __ T2D, __ post(rscratch1, 8)); // rc = round_constants[i]
    __ bcax(v17, __ T16B, v25, v19, v3);         // a17 = a17' ^ (~a18' & a19)
    __ bcax(v18, __ T16B, v3, v15, v19);         // a18 = a18' ^ (~a19 & a15)
    __ bcax(v19, __ T16B, v19, v16, v15);        // a19 = a19 ^ (~a15 & a16)
    __ bcax(v15, __ T16B, v15, v25, v16);        // a15 = a15 ^ (~a16 & a17')
    __ bcax(v16, __ T16B, v16, v3, v25);         // a16 = a16 ^ (~a17' & a18')
    __ bcax(v10, __ T16B, v29, v12, v26);        // a10 = a10' ^ (~a11' & a12)
    __ bcax(v11, __ T16B, v26, v13, v12);        // a11 = a11' ^ (~a12 & a13)
    __ bcax(v12, __ T16B, v12, v14, v13);        // a12 = a12 ^ (~a13 & a14)
    __ bcax(v13, __ T16B, v13, v29, v14);        // a13 = a13 ^ (~a14 & a10')
    __ bcax(v14, __ T16B, v14, v26, v29);        // a14 = a14 ^ (~a10' & a11')
    __ bcax(v7, __ T16B, v30, v9, v4);           // a7 = a7' ^ (~a8' & a9)
    __ bcax(v8, __ T16B, v4, v5, v9);            // a8 = a8' ^ (~a9 & a5)
    __ bcax(v9, __ T16B, v9, v6, v5);            // a9 = a9 ^ (~a5 & a6)
    __ bcax(v5, __ T16B, v5, v30, v6);           // a5 = a5 ^ (~a6 & a7')
    __ bcax(v6, __ T16B, v6, v4, v30);           // a6 = a6 ^ (~a7' & a8')
    __ bcax(v3, __ T16B, v27, v0, v28);          // a3 = a3' ^ (~a4' & a0)
    __ bcax(v4, __ T16B, v28, v1, v0);           // a4 = a4' ^ (~a0 & a1)
    __ bcax(v0, __ T16B, v0, v2, v1);            // a0 = a0 ^ (~a1 & a2)
    __ bcax(v1, __ T16B, v1, v27, v2);           // a1 = a1 ^ (~a2 & a3')
    __ bcax(v2, __ T16B, v2, v28, v27);          // a2 = a2 ^ (~a3' & a4')
    __ eor(v0, __ T16B, v0, v31);                // a0 = a0 ^ rc

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1959776475 From mbaesken at openjdk.org Tue Feb 18 13:48:22 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 18 Feb 2025 13:48:22 GMT Subject: RFR: 8345265: Minor improvements for LTO across all compilers [v2] In-Reply-To: <3peOk4hOWRVX3sn5BHQbRh5ymyP8Sr146H66jDWkePA=.ef3d0788-2bfa-421b-ad92-a1e46fd0feb5@github.com> References: <2y8p-J2SCTANChv8WvrXmYI1UjVxbC7n8tUJzBOMzEE=.7c2b48a5-423e-4138-8671-3037e8963730@github.com> <3peOk4hOWRVX3sn5BHQbRh5ymyP8Sr146H66jDWkePA=.ef3d0788-2bfa-421b-ad92-a1e46fd0feb5@github.com> Message-ID: On Fri, 17 Jan 2025 13:50:11 GMT, Matthias Baesken wrote: >>> Fixing the JVM under LTO is likely to be a heavy undertaking, much more so than just unbreaking compilation and linking of the JVM (Ignoring that the JVM later crashes when the newly compiled JDK is used to build parts of itself), I'm not sure it would be feasible under the current Pull Request >> >> I was able to build the OpenJDK with LTO enabled on Linux and Windows (so the new JVM does not crash in the build). I just had to not enable gtest because this is currently not compiling with LTO enabled. I was able to run a few benchmarks with the LTO enabled JVM, but as far as I remember a couple of HS jtreg tests fail with LTO enabled because they have some expectations that might not (yet) work with LTO. >> >> Regarding gtest, I created >> https://bugs.openjdk.org/browse/JDK-8346987 >> 8346987: [lto] gtest build fails >> Do you think it would be okay to change the build so that the LTO related flags (in case lto is enabled) do not 'go' into the gtest build?
> >> I was able to run a few benchmarks with the LTO enabled JVM, >> but as far as I remember a couple of HS jtreg tests fail with LTO enabled because they have some expectations that might not (yet) work with LTO > > On Linux x86_64 (gcc 11.3 devkit), when building with lto enabled, the jdk :tier1 jtreg tests all worked nicely in my environment. > The HS :tier1 jtreg tests had 51 failures, 50 in the serviceability/sa area. > Those failures (from serviceability/sa) seem to have in common that they show such an exception > > stderr: [Exception in thread "main" java.lang.InternalError: Metadata does not appear to be polymorphic > at jdk.hotspot.agent/sun.jvm.hotspot.types.basic.BasicTypeDataBase.findDynamicTypeForAddress(BasicTypeDataBase.java:223) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VirtualBaseConstructor.instantiateWrapperFor(VirtualBaseConstructor.java:104) > at jdk.hotspot.agent/sun.jvm.hotspot.oops.Metadata.instantiateWrapperFor(Metadata.java:78) > at jdk.hotspot.agent/sun.jvm.hotspot.oops.MetadataField.getValue(MetadataField.java:43) > at jdk.hotspot.agent/sun.jvm.hotspot.oops.MetadataField.getValue(MetadataField.java:40) > at jdk.hotspot.agent/sun.jvm.hotspot.classfile.ClassLoaderData.getKlasses(ClassLoaderData.java:82) > at jdk.hotspot.agent/sun.jvm.hotspot.classfile.ClassLoaderData.classesDo(ClassLoaderData.java:101) > at jdk.hotspot.agent/sun.jvm.hotspot.classfile.ClassLoaderDataGraph.classesDo(ClassLoaderDataGraph.java:84) > at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor$19.doit(CommandProcessor.java:926) > at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2230) > at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2200) > at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.run(CommandProcessor.java:2071) > at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.run(CLHSDB.java:112) > at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.main(CLHSDB.java:44) > at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runCLHSDB(SALauncher.java:285) > at
jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runCLHSDB(SALauncher.java:285) > at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:507) > > or test serviceability/sa/TestJhsdbJstackMixed.java > > stderr: [java.lang.InternalError: Metadata does not appear to be polymorphic > at jdk.hotspot.agent/sun.jvm.hotspot.types.basic.BasicTypeDataBase.findDynamicTypeForAddress(BasicTypeDataBase.java:223) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VirtualBaseConstructor.instantiateWrapperFor(VirtualBaseConstructor.java:1... > @MBaesken Currently, with LTO active on gcc 14, commit [e648a90](https://github.com/openjdk/jdk/commit/e648a907b31fd0d6b746d149fda2a8d5fbe26dc0) is causing serious trouble on my end by mass-inlining everything, bloating the JVM to nearly 60MB in size. Does HotSpot have the same size issues on your end with LTO? (--enable-jvm-feature-opt-size is off the table because the JVM should ideally be an acceptable size even without that flag, and -Os and LTO don't work together with gcc anyway.) On my end we used gcc 11 in the past and are now testing gcc 13. Both work nicely; no libjvm.so bloat has been observed with LTO. Maybe there is some issue/difference with gcc 14, but so far we have not tested with this version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2665762834 From kvn at openjdk.org Tue Feb 18 19:26:04 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Feb 2025 19:26:04 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> Message-ID: On Tue, 18 Feb 2025 10:07:07 GMT, Emanuel Peter wrote: >>> That one is more tricky. Because what if the loop somehow gets folded away? How would we catch that?
>> >> There is code that removes the `OuterStripMinedLoop` if the `CountedLoop` goes away and also, if I recall correctly, logic that verifies no `OuterStripMinedLoop` is left behind without a `CountedLoop` so it's probably possible. Question is whether we want that or not. Seems like quite a bit of extra complexity. > >> > That one is more tricky. Because what if the loop somehow gets folded away? How would we catch that? > >> There is code that removes the `OuterStripMinedLoop` if the `CountedLoop` goes away and also, if I recall correctly, logic that verifies no `OuterStripMinedLoop` is left behind without a `CountedLoop` so it's probably possible. Question is whether we want that or not. Seems like quite a bit of extra complexity. > > Hmm ok, I see. I wonder how bad it is to leave the slow-loop there until after loop-opts. I mean it was already created, and it now has no loop-opts performed on it (it is stalled), so it just sits there like dead code. So I'm not sure there is really a performance benefit to killing it a little earlier. Maybe a very small one? @eme64, my main concern is that the loop multi-versioning code will blow up inlining decisions. Our benchmarks may not be affected because we may never trigger multi-versioning code on our hardware (as Roland pointed out). Maybe you can force its generation and then compare performance. Do we really need it for this change? Can we simply generate an un-vectorized loop? "x86 and aarch64 are unaffected". Which platforms are affected? Should we really sacrifice code complexity for platforms we don't support? Another question: what deoptimization `Action` is taken when the predicate fails? I saw a comment in the code: "We only want to use the auto-vectorization check as a trap once per bci." Does it mean you immediately deoptimize code? Can we hit the uncommon trap a few times before deoptimization? Deoptimization after one trap assumes we will process the same un-aligned data again.
In a test it could be true, but is it also true in reality? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2666176147 From jiangli at openjdk.org Tue Feb 18 19:26:09 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 18 Feb 2025 19:26:09 GMT Subject: RFR: 8349620: Add VMProps for static JDK [v2] In-Reply-To: References: Message-ID: > Please review this change that adds the `jdk.static` VMProps. It can be used to skip tests not for running on static JDK. > > This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which is used by VMProps to determine if it's static at runtime. > > `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on static JDK. This test uses `bin/jlink`, which is not provided on static JDK. There are other tests that require tools in `bin/`. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. Jiangli Zhou has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8349620 - - Add 'jdk.static' in VMProps. It can be used to skip tests not for running on static JDK. - Add WhiteBox isStatic() native method. It's used by VMProps to determine if it's static at runtime. - Add in '@requires !jdk.static' in test/hotspot/jtreg/runtime/modules/ModulesSymLink.java to skip the test on static JDK since it requires bin/jlink.
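To make concrete what the new `@requires !jdk.static` condition guards against, here is a minimal sketch of a jtreg test that depends on `bin/jlink`. The class name `JlinkRequiredSketch` and its `jlinkPath` helper are hypothetical (this is not the code of `ModulesSymLink.java`), and the sketch assumes a Unix-style launcher name without the `.exe` suffix used on Windows.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/*
 * @test
 * @requires !jdk.static
 * @summary Hypothetical test that needs bin/jlink, which a static JDK does not provide.
 * @run main JlinkRequiredSketch
 */
public class JlinkRequiredSketch {

    // Resolves <java.home>/bin/jlink (assumption: no ".exe" suffix handling).
    static Path jlinkPath(String javaHome) {
        return Path.of(javaHome, "bin", "jlink");
    }

    public static void main(String[] args) {
        Path jlink = jlinkPath(System.getProperty("java.home"));
        // Without the @requires guard, this is where a static JDK would fail.
        if (!Files.isExecutable(jlink)) {
            throw new RuntimeException("jlink not found: " + jlink);
        }
        System.out.println("Found " + jlink);
    }
}
```

On a regular JDK image `main` finds the launcher; on a static JDK, where `bin/jlink` is absent, the `@requires` guard keeps jtreg from running the test at all.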
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23528/files - new: https://git.openjdk.org/jdk/pull/23528/files/e01ce5b1..e4d2e49e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23528&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23528&range=00-01 Stats: 19883 lines in 961 files changed: 13314 ins; 3171 del; 3398 mod Patch: https://git.openjdk.org/jdk/pull/23528.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23528/head:pull/23528 PR: https://git.openjdk.org/jdk/pull/23528 From alanb at openjdk.org Tue Feb 18 19:26:13 2025 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 18 Feb 2025 19:26:13 GMT Subject: RFR: 8349620: Add VMProps for static JDK [v2] In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 19:09:00 GMT, Jiangli Zhou wrote: >> Please review this change that adds the `jdk.static` VMProps. It can be used to skip tests not for running on static JDK. >> >> This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which is used by VMProps to determine if it's static at runtime. >> >> `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on static JDK. This test uses `bin/jlink`, which is not provided on static JDK. There are other tests that require tools in `bin/`. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. > > Jiangli Zhou has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8349620 > - - Add 'jdk.static' in VMProps. It can be used to skip tests not for running on static JDK. > - Add WhiteBox isStatic() native method. It's used by VMProps to determine if it's static at runtime.
> - Add in '@requires !jdk.static' in test/hotspot/jtreg/runtime/modules/ModulesSymLink.java to skip the test on static JDK since it requires bin/jlink. Marked as reviewed by alanb (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23528#pullrequestreview-2624487575 From jiangli at openjdk.org Tue Feb 18 19:26:17 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 18 Feb 2025 19:26:17 GMT Subject: RFR: 8349620: Add VMProps for static JDK In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 08:10:24 GMT, Alan Bateman wrote: >> I think this looks okay; I'm just wondering whether one property is enough to cover all the configurations. > >> Thanks, @AlanBateman. >> >> > I'm just wondering whether one property is enough to cover all the configurations. >> >> +1 >> >> It's not easy to predict all different cases for now. How about adding/refining when we find any new cases? > > That's okay with me. I'm hoping Magnus will jump in when he gets a chance as he has experience with the "other" static build configurations. Thanks, @AlanBateman! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2666666915 From kvn at openjdk.org Tue Feb 18 19:26:16 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Feb 2025 19:26:16 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 14:40:09 GMT, Emanuel Peter wrote: > Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below. > > **Background** > > With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned.
> But with native memory, the `base` is just some arbitrarily aligned pointer. > > **Problem** > > So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code.
>
> MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1);
> MemorySegment nativeUnaligned = nativeAligned.asSlice(1);
> test3(nativeUnaligned);
>
> When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not!
>
> static void test3(MemorySegment ms) {
>     for (int i = 0; i < RANGE; i++) {
>         long adr = i * 4L;
>         int v = ms.get(ELEMENT_LAYOUT, adr);
>         ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1));
>     }
> }
>
> **Solution: Runtime Checks - Predicate and Multiversioning** > > Of course we could just forbid cases where we have a `native` base from vectorizing. But that would lead to regressions currently - in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine if the `base` is aligned; we need a runtime check. > > I came up with 2 options where to place the runtime checks: > - A new "auto vectorization" Parse Predicate: > - This only works when predicates are available. > - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop. > - Multiversion the loop: > - Create 2 copies of the loop (fast and slow loops).
> - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if` which decides which loop we take > - In the `slow_loop`, we make no assumption which means we can not vectorize, but we still compile - so even unaligned `base`s would end up with reasonably fast code. > - We "stall" the `... What probabilities for multi-version loops branches? Did non-vectorized version is move out of hot path in generated code? About the actual probability value: I was thinking PROB_LIKELY_MAG(3). PROB_LIKELY_MAG(1) will only guarantee that the vectorized loop will be first, but that could be enough even without moving the other loop off the hot path. Needs testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2666554240 PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2666710345 From kvn at openjdk.org Tue Feb 18 19:26:11 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Feb 2025 19:26:11 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> Message-ID: <7CUvxR76ROhB7TB2qqbF2nQB5RNIj4GpRvKqZSw-dDM=.8917fc6a-3e84-4a9b-8df7-2eec07cfa768@github.com> On Tue, 18 Feb 2025 17:20:23 GMT, Emanuel Peter wrote: > Do we really need it for this change? Can we simply generate an un-vectorized loop? To clarify: this question was about the second phase, after we deoptimize and recompile when we hit a predicate check failure. I am fine with the predicate change. > And I really could not measure any difference in the performance benchmarking. I doubt it is even noticeable in compile time. Right. If a method has a vectorizable loop, it most likely has big generated code and is not inlined already. So adding a 4th loop may not affect it significantly.
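The fast-loop/slow-loop arrangement and the branch-probability question can be pictured at the Java source level. This is only a conceptual sketch: C2 performs multi-versioning on its IR rather than on Java source, and the class `MultiversionSketch`, the 8-byte `ALIGNMENT` and the `baseAddress` parameter (standing in for the native base of a `MemorySegment`) are all hypothetical. `reachProbability` further assumes the selector checks are independent, so that k chained checks, each passed with probability p, reach the fast loop with probability p^k. `PROB_FAIR` is the 50% mentioned in the thread; as far as I know, HotSpot defines `PROB_LIKELY_MAG(N)` as 1 - 10^-N, so `PROB_LIKELY_MAG(3)` would be 0.999.

```java
public class MultiversionSketch {
    // Assumed alignment requirement for vector loads/stores (illustrative only).
    static final long ALIGNMENT = 8;

    // True if the (hypothetical) native base address is ALIGNMENT-aligned.
    static boolean isAligned(long baseAddress) {
        return (baseAddress & (ALIGNMENT - 1)) == 0;
    }

    // Source-level picture of a multi-versioned loop: the alignment check plays
    // the role of the multiversion_if, and both branches compute the same result.
    static long sumPlusOne(long baseAddress, int[] data) {
        long sum = 0;
        if (isAligned(baseAddress)) {
            // "fast_loop": the copy C2 could vectorize under -XX:+AlignVector.
            for (int v : data) {
                sum += v + 1;
            }
        } else {
            // "slow_loop": no alignment assumption, so it stays scalar.
            for (int v : data) {
                sum += v + 1;
            }
        }
        return sum;
    }

    // Estimated probability of reaching the fast loop through `checks` chained
    // selector branches, each taken with probability p (independence assumed).
    static double reachProbability(double p, int checks) {
        return Math.pow(p, checks);
    }

    public static void main(String[] args) {
        long aligned = 4096;          // e.g. an aligned allocation
        long unaligned = aligned + 1; // like asSlice(1) in the PR description
        int[] data = {1, 2, 3};
        System.out.println(isAligned(aligned) + " " + isAligned(unaligned)); // true false
        System.out.println(sumPlusOne(aligned, data) == sumPlusOne(unaligned, data)); // true
        System.out.println(reachProbability(0.5, 3));   // PROB_FAIR, 3 checks: 0.125
        System.out.println(reachProbability(0.999, 3)); // PROB_LIKELY_MAG(3), 3 checks
    }
}
```

Under these assumptions, raising p from 0.5 to 0.999 moves the estimate for three chained checks from 0.125 to roughly 0.997, which is why the choice of probability matters once several checks share one selector.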
------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2666506254 PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2666525354 From epeter at openjdk.org Tue Feb 18 19:26:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Feb 2025 19:26:18 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 18:29:42 GMT, Vladimir Kozlov wrote: > What probabilities for multi-version loops branches? Did non-vectorized version is move out of hot path in generated code? I'm not sure what you are asking. Are you asking what probability I'm setting for the multi-version branch? This is the loop selector, which later gets copied for each of the checks. `const LoopSelector loop_selector(lpt, opaque, PROB_FAIR, COUNT_UNKNOWN);` So 50%. But maybe you are suggesting it should really be biased towards the fast-path, right? What probability would you suggest? It should probably be fairly low, since there can be multiple checks added, and each one lowers the probability of arriving at the true-loop. So for scheduling, we should keep the probability high, so the true-loop is scheduled closer, right? Is that what you meant? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2666602599 From kvn at openjdk.org Tue Feb 18 19:26:19 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Feb 2025 19:26:19 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 18:45:34 GMT, Emanuel Peter wrote: > > What probabilities for multi-version loops branches? Did non-vectorized version is move out of hot path in generated code? > > I'm not sure what you are asking. Are you asking what probability I'm setting for the multi-version branch? 
> > This is the loop selector, which later gets copied for each of the checks. `const LoopSelector loop_selector(lpt, opaque, PROB_FAIR, COUNT_UNKNOWN);` > > So 50%. But maybe you are suggesting it should really be biased towards the fast-path, right? What probability would you suggest? It should probably be fairly low, since there can be multiple checks added, and each one lowers the probability of arriving at the true-loop. So for scheduling, we should keep the probability high, so the true-loop is scheduled closer, right? > > Is that what you meant? Yes. I want to prioritize the fast path, assuming it is the vectorized loop and that we get aligned data more frequently. It is actually difficult to judge without statistics from real applications. It should be reversed if an application works mostly on unaligned data. Can we profile alignment in the Interpreter (and C1)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2666635167 From epeter at openjdk.org Tue Feb 18 19:26:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 18 Feb 2025 19:26:07 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> Message-ID: On Tue, 18 Feb 2025 16:10:20 GMT, Vladimir Kozlov wrote: >>> > That one is more tricky. Because what if the loop somehow gets folded away? How would we catch that? >> >>> There is code that removes the `OuterStripMinedLoop` if the `CountedLoop` goes away and also, if I recall correctly, logic that verifies no `OuterStripMinedLoop` is left behind without a `CountedLoop` so it's probably possible. Question is whether we want that or not. Seems like quite a bit of extra complexity. >> >> Hmm ok, I see. I wonder how bad it is to leave the slow-loop there until after loop-opts.
I mean it was already created, and it now has no loop-opts performed on it (it is stalled), so it just sits there like dead code. So I'm not sure there is really a performance benefit to killing it a little earlier. Maybe a very small one? > > @eme64, my main concern is that the loop multi-versioning code will blow up inlining decisions. Our benchmarks may not be affected because we may never trigger multi-versioning code on our hardware (as Roland pointed out). Maybe you can force its generation and then compare performance. Do we really need it for this change? Can we simply generate an un-vectorized loop? > > "x86 and aarch64 are unaffected". Which platforms are affected? Should we really sacrifice code complexity for platforms we don't support? > > Another question: what deoptimization `Action` is taken when the predicate fails? I saw a comment in the code: "We only want to use the auto-vectorization check as a trap once per bci." Does it mean you immediately deoptimize code? Can we hit the uncommon trap a few times before deoptimization? Deoptimization after one trap assumes we will process the same un-aligned data again. In a test it could be true, but is it also true in reality? @vnkozlov > "x86 and aarch64 are unaffected". Which platforms are affected? Should we really sacrifice code complexity for platforms we don't support? I would say most of the code here, i.e. the predicate and multi-version parts, is also relevant for the upcoming patch for aliasing-analysis runtime checks. These are especially important for `MemorySegment` cases, where there could basically always be aliasing and only runtime checks can help us vectorize. There is really only a small part that emits the actual alignment check. > Do we really need it for this change? Can we simply generate an un-vectorized loop? The alternatives on architectures that are actually affected by this bug: - Not fixing the bug, and risking a possible `SIGBUS`.
And on our platforms, that just means living with the HALT caused by `VerifyAlignVector`. - Disabling ALL vectorization of cases where we cannot guarantee statically that accesses are aligned. That would certainly disable all uses of `MemorySegment`, and that is probably not preferable. > my main concern is that the loop multi-versioning code will blow up inlining decisions. Our benchmarks may not be affected because we may never trigger multi-versioning code on our hardware (as Roland pointed out). Maybe you can force its generation and then compare performance. Right. I suppose code size might be slightly affected. But I only multi-version if we are already going to pre-main-post the loop. And that means that the loop is already copied 3x, and doing it 4x is not that noticeable, I would suspect. Also, with OSR we already don't generate predicates today, and so it is generating the multi-versioning for those. And I really could not measure any difference in the performance benchmarking. I doubt it is even noticeable in compile time. > Another question: what deoptimization `Action` is taken when the predicate fails? I saw a comment in the code: "We only want to use the auto-vectorization check as a trap once per bci." Does it mean you immediately deoptimize code? Can we hit the uncommon trap a few times before deoptimization? Deoptimization after one trap assumes we will process the same un-aligned data again. In a test it could be true, but is it also true in reality? Yes, when we deopt for the bci, we recompile immediately. The alternative is to make the check per method, but then the risk is that one loop deopting causes other loops to be multi-versioned instead of using predicates too. Counting deopts per bci is currently not done at all. But I suppose we could make it a bit more "forgiving"... but is that worth it?
I suppose if in reality we do see non-aligned cases (or, in the future, cases where we have problematic aliasing), then it will probably repeat, and it is worth recompiling to handle both cases. But that is speculation, and we can discuss :) TLDR: @vnkozlov I would not have fixed the bug with such a heavy mechanism if I did not intend to use it for runtime checks for aliasing analysis. And 90% of the code here is reusable for that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2666357998 From jiangli at openjdk.org Tue Feb 18 19:29:10 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 18 Feb 2025 19:29:10 GMT Subject: RFR: 8349620: Add VMProps for static JDK [v3] In-Reply-To: References: Message-ID: > Please review this change that adds the `jdk.static` VMProps. It can be used to skip tests not for running on static JDK. > > This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which is used by VMProps to determine if it's static at runtime. > > `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on static JDK. This test uses `bin/jlink`, which is not provided on static JDK. There are other tests that require tools in `bin/`. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. Jiangli Zhou has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: - Add 'jdk.static' in VMProps. It can be used to skip tests not for running on static JDK. - Add WhiteBox isStatic() native method. It's used by VMProps to determine if it's static at runtime. - Add in '@requires !jdk.static' in test/hotspot/jtreg/runtime/modules/ModulesSymLink.java to skip the test on static JDK since it requires bin/jlink.
------------- Changes: https://git.openjdk.org/jdk/pull/23528/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23528&range=02 Stats: 19 lines in 6 files changed: 16 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23528.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23528/head:pull/23528 PR: https://git.openjdk.org/jdk/pull/23528 From aturbanov at openjdk.org Tue Feb 18 18:56:41 2025 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Tue, 18 Feb 2025 18:56:41 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v4] In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 18:31:42 GMT, Ioi Lam wrote: >> Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. >> >> With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. >> >> To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). 
Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): >> >>> the format of the configuration and cache files is not specified and is subject to change without notice. >> >> **Notes for reviewers:** >> >> - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. >> - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. >> - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. >> - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtlle differences between the different dumping modes. >> - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. >> >> **Misc Note** >> - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccb... > > Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: > > - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself > - Improve error message when AOTMode=create has an incompatible classpath make/test/SetupAot.java line 55: > 53: // E.g., use javac to compile a program. 
> 54: for (String tool : tools) { > 55: ToolProvider t = ToolProvider.findFirst(tool) Suggestion: ToolProvider t = ToolProvider.findFirst(tool) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1959971388 From jiangli at openjdk.org Tue Feb 18 19:37:15 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 18 Feb 2025 19:37:15 GMT Subject: RFR: 8349620: Add VMProps for static JDK [v3] In-Reply-To: References: Message-ID: <8Fq79TB_4TxUkCy25DGptRJ3Q7Brm6QZK1ZdJtyVRtI=.0fe9ff90-31b9-427e-9979-b03d8ea4f02d@github.com> On Tue, 18 Feb 2025 19:29:10 GMT, Jiangli Zhou wrote: >> Please review this change that adds the `jdk.static` VMProps. It can be used to skip tests not for running on static JDK. >> >> This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which is used by VMProps to determine if it's static at runtime. >> >> `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on static JDK. This test uses `bin/jlink`, which is not provided on static JDK. There are other tests that require tools in `bin/`. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. > > Jiangli Zhou has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. Can anyone help do a second review for this change? Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2666758082 From gziemski at openjdk.org Tue Feb 18 19:53:05 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 18 Feb 2025 19:53:05 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v48] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. 
> > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!):
>
> malloc summary:
>
> time:8,951,473[ns] [samples:117,717]
> memory requested:28,474,918 bytes, allocated:29,904,416 bytes,
> malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%]
>
> NMT type:                objects:       bytes:       time:  count%:  bytes%:   time%:  overhead:
> ------------------------------------------------------------------------------------------------
> Java Heap:                      0            0           0     0.0%     0.0%     0.0%       0.0%
> Class:                      8,598      727,856     607,047     7.3%     2.4%     6.8%      18.2%
> Thread:                       196       68,256      64,875     0.2%     0.2%     0.7%       7.0%
> Thread Stack:                   0            0           0     0.0%     0.0%     0.0%       0.0%
> Code:                      10,094    2,036,528     916,348     8.6%     6.8%    10.2%       9.9%
> GC:                         1,813   20,372,160   1,214,642     1.5%    68.1%    13.6%       3.7%
> GCCardSet:                    299       28,736      13,174     0.3%     0.1%     0.1%      11.6%
> Compiler:                      55       13,728     171,364     0.0%     0.0%     1.9%       6.9%
> JVMCI:                          0            0           0     0.0%     0.0%     0.0%       0.0%
> Internal:                   5,066      339,184   1,418,578     4.3%     1.1%    15.8%      18.0%
> Other:                          6      244,736      21,303     0.0%     0.8%     0.2%      37.9%
> Symbol:                     9,844    1,493,280     752,665     8.4%     5.0%     8.4%      14.1%
> Native Memory Tracking:       367       30,736      17,654     0.3%     0.1%     0.2%       7...
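The summary figures in this sample output are internally consistent, and a short recomputation makes the relationships explicit. The sketch below is an illustration, not part of the benchmark. It assumes (as the numbers suggest, though the PR text does not say so) that both overhead percentages are relative to the requested bytes, that `bytes%` is relative to the allocated total, that `time%` is relative to the total malloc time, and that `count%` is relative to the sample count. The class name `NmtSummaryCheck` is hypothetical.

```java
public class NmtSummaryCheck {
    // part/whole as a percentage, rounded half-up to `decimals` places.
    static double percent(long part, long whole, int decimals) {
        double scale = Math.pow(10, decimals);
        return Math.round(part * 100.0 / whole * scale) / scale;
    }

    public static void main(String[] args) {
        long requested = 28_474_918L;  // "memory requested"
        long allocated = 29_904_416L;  // "allocated"
        long nmtHeaders = 2_118_906L;  // "NMT headers overhead"
        long totalTimeNs = 8_951_473L; // "time" in ns
        long samples = 117_717L;       // "samples"

        long mallocOverhead = allocated - requested; // 1,429,498 bytes
        System.out.println("malloc overhead: " + mallocOverhead + " bytes, "
                + percent(mallocOverhead, requested, 2) + "%");
        System.out.println("NMT headers: " + percent(nmtHeaders, requested, 2) + "%");

        // GC row: bytes% and time%; Class row: count%.
        System.out.println("GC bytes%: " + percent(20_372_160L, allocated, 1));
        System.out.println("GC time%: " + percent(1_214_642L, totalTimeNs, 1));
        System.out.println("Class count%: " + percent(8_598L, samples, 1));
    }
}
```

Run as-is, this reproduces the 5.02%, 7.44%, 68.1%, 13.6% and 7.3% shown in the sample. Measuring the GC row's `bytes%` against the requested total instead would give about 71.5%, which is why the allocated total looks like the denominator.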
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: use LD_FORMAT to hide formatting, allow NMT level mismatch only for time ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/3b595d8e..f5901f75 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=47 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=46-47 Stats: 82 lines in 2 files changed: 9 ins; 22 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From iveresov at openjdk.org Tue Feb 18 20:08:55 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 18 Feb 2025 20:08:55 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v4] In-Reply-To: References: Message-ID: <5PajiycaLuaG1-Ymqtc_OCMt0oWvlIhbozInYKmqQzc=.8a6d9baa-2197-4541-bdde-24f2eeaaf449@github.com> On Thu, 13 Feb 2025 18:31:42 GMT, Ioi Lam wrote: >> Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. >> >> With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. 
>> >> To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): >> >>> the format of the configuration and cache files is not specified and is subject to change without notice. >> >> **Notes for reviewers:** >> >> - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. >> - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. >> - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. >> - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. >> - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. >> >> **Misc Note** >> - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccb...
> > Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: > > - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself > - Improve error message when AOTMode=create has an incompatible classpath Marked as reviewed by iveresov (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23484#pullrequestreview-2624914709 From gziemski at openjdk.org Tue Feb 18 20:40:15 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 18 Feb 2025 20:40:15 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v49] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf`
>
> Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!):
>
> malloc summary:
>
> time:8,951,473[ns] [samples:117,717]
> memory requested:28,474,918 bytes, allocated:29,904,416 bytes,
> malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%]
>
> NMT type:               objects:       bytes:       time:  count%:  bytes%:  time%:  overhead:
> ----------------------------------------------------------------------------------------------
> Java Heap:                     0            0           0     0.0%     0.0%    0.0%       0.0%
> Class:                     8,598      727,856     607,047     7.3%     2.4%    6.8%      18.2%
> Thread:                      196       68,256      64,875     0.2%     0.2%    0.7%       7.0%
> Thread Stack:                  0            0           0     0.0%     0.0%    0.0%       0.0%
> Code:                     10,094    2,036,528     916,348     8.6%     6.8%   10.2%       9.9%
> GC:                        1,813   20,372,160   1,214,642     1.5%    68.1%   13.6%       3.7%
> GCCardSet:                   299       28,736      13,174     0.3%     0.1%    0.1%      11.6%
> Compiler:                     55       13,728     171,364     0.0%     0.0%    1.9%       6.9%
> JVMCI:                         0            0           0     0.0%     0.0%    0.0%       0.0%
> Internal:                  5,066      339,184   1,418,578     4.3%     1.1%   15.8%      18.0%
> Other:                         6      244,736      21,303     0.0%     0.8%    0.2%      37.9%
> Symbol:                    9,844    1,493,280     752,665     8.4%     5.0%    8.4%      14.1%
> Native Memory Tracking:      367       30,736      17,654     0.3%     0.1%    0.2%  7...

Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: add simple thread info ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/f5901f75..4a295517 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=48 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=47-48 Stats: 48 lines in 2 files changed: 37 ins; 10 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From ccheung at openjdk.org Tue Feb 18 22:39:02 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 18 Feb 2025 22:39:02 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v4] In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 18:31:42 GMT, Ioi Lam wrote: >> Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. >> >> With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names.
Representing such classes with textual names would become another challenge. >> >> To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): >> >>> the format of the configuration and cache files is not specified and is subject to change without notice. >> >> **Notes for reviewers:** >> >> - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. >> - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. >> - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. >> - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtlle differences between the different dumping modes. >> - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. >> >> **Misc Note** >> - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccb... 
> > Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: > > - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself > - Improve error message when AOTMode=create has an incompatible classpath I spotted several minor items. src/hotspot/share/cds/cdsConfig.cpp line 661: > 659: return UseSharedSpaces && is_using_full_module_graph() && _has_archived_invokedynamic; > 660: } > 661: Blank line removed by accident? src/hotspot/share/cds/cdsConfig.hpp line 138: > 136: static void stop_using_full_module_graph(const char* reason = nullptr) NOT_CDS_JAVA_HEAP_RETURN; > 137: > 138: Blank line removed by accident? src/hotspot/share/cds/filemap.cpp line 1389: > 1387: const char* file_type = CDSConfig::type_of_archive_being_loaded(); > 1388: if (_is_static) { > 1389: if ((gen_header->_magic == CDS_ARCHIVE_MAGIC) || Probably don't need the extra set of parentheses. src/hotspot/share/cds/finalImageRecipes.hpp line 67: > 65: > 66: public: > 67: static void serialize(SerializeClosure* soc, bool is_static_archive); The only caller is from `MetaspaceShared::serialize(`): `FinalImageRecipes::serialize(soc, true);` Wondering if the `is_static_archive` arg is needed? src/hotspot/share/cds/metaspaceShared.cpp line 819: > 817: CDSConfig::DumperThreadMark dumper_thread_mark(THREAD); > 818: ResourceMark rm(THREAD); > 819: HandleMark hm(THREAD); Why do we need HandleMark? src/hotspot/share/cds/metaspaceShared.cpp line 839: > 837: tty->print_cr("AOTConfiguration recorded: %s", AOTConfiguration); > 838: vm_exit(0); > 839: } else { Is it appropriate to add an assert of `CDSConfig::is_dumping_final_static_archive()` in the `else` case? src/hotspot/share/cds/metaspaceShared.cpp line 958: > 956: > 957: if (CDSConfig::is_dumping_preimage_static_archive()) { > 958: log_info(cds)("Reading lambda form invokers of in JDK default classlist ..."); Suggestion: "Reading lambda form invokers from JDK default classlist ...."
src/hotspot/share/classfile/systemDictionaryShared.cpp line 995: > 993: > 994: int length = record->num_verifier_constraints(); > 995: if (length > 0 || klass->name()->equals("HelloWorld")) { Is the "HelloWorld" check leftover from debugging? src/hotspot/share/classfile/systemDictionaryShared.cpp line 1031: > 1029: > 1030: int length = rt_info->num_verifier_constraints(); > 1031: if (length > 0 || klass->name()->equals("HelloWorld")) { Is the "HelloWorld" check leftover from debugging? src/hotspot/share/classfile/systemDictionaryShared.cpp line 1164: > 1162: JavaThread* current = JavaThread::current(); > 1163: if (klass->is_shared_platform_class() || klass->is_shared_app_class()) { > 1164: DumpTimeClassInfo* dt_info = get_info(klass); `dt_info` seems unused. test/hotspot/jtreg/runtime/cds/appcds/aotClassLinking/AOTLoaderConstraintsTest.java line 80: > 78: public void checkExecution(OutputAnalyzer out, RunMode runMode) throws Exception { > 79: switch (runMode) { > 80: case RunMode.ASSEMBLY: // JEP 485 + binary AOTConfiguration -- should load AppClass from preimage s/485/483 test/hotspot/jtreg/runtime/cds/appcds/aotClassLinking/AOTLoaderConstraintsTest.java line 101: > 99: // AppClass is loaded by the app loader. To make sure that you cannot use > 100: // type masquerade attacks, we need to add a loader constraint that says: > 101: // app and loo loaders must resolve the symbol "java/lang/String" to the same type. Suggestion: // app and _boot_ loaders ... 
test/lib/jdk/test/lib/cds/CDSAppTester.java line 365: > 363: } > 364: > 365: // See JEP 485 s/485/483 ------------- PR Review: https://git.openjdk.org/jdk/pull/23484#pullrequestreview-2625174804 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960672938 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960673660 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960675062 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960684356 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960676088 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960677946 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960679370 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960680914 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960681554 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960682805 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960686353 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960687379 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960685390 From dchuyko at openjdk.org Tue Feb 18 22:47:09 2025 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Tue, 18 Feb 2025 22:47:09 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 Message-ID: The location for rfp should be set in the register map. In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. The COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI; it also covers the INCLUDE_JVMCI case.
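To make the difference between the two guards concrete, here is a small sketch (not HotSpot code) that models them as `constexpr` predicates over a hypothetical `BuildConfig` struct; in HotSpot itself both guards are preprocessor macros. It assumes, as the description says, that a JVMCI build always includes C1 or C2.

```cpp
#include <cassert>

// Hypothetical model of a VM build configuration (not a HotSpot type).
struct BuildConfig {
  bool compiler1;  // C1 included
  bool compiler2;  // C2 included
  bool jvmci;      // INCLUDE_JVMCI
};

// Old guard: the saved-link update ran only if C2 or JVMCI was built in.
constexpr bool compiler2_or_jvmci(const BuildConfig& c) {
  return c.compiler2 || c.jvmci;
}

// New guard: the update runs if any JIT compiler (C1 or C2) is built in.
constexpr bool compiler1_or_compiler2(const BuildConfig& c) {
  return c.compiler1 || c.compiler2;
}

// Illustrative configurations.
constexpr BuildConfig kClientVM{true, false, false};     // C1 only
constexpr BuildConfig kServerVM{true, true, false};      // C1 + C2
constexpr BuildConfig kServerJvmciVM{true, true, true};  // C1 + C2 + JVMCI

// The C1-only client VM is the case the old guard missed.
static_assert(!compiler2_or_jvmci(kClientVM), "old guard excludes C1-only builds");
static_assert(compiler1_or_compiler2(kClientVM), "new guard includes C1-only builds");
```

Under that assumption every configuration that satisfied the old guard also satisfies the new one, so server and JVMCI builds keep their previous behaviour and only the C1-only build changes.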
------------- Commit messages: - Perform update_map_with_saved_link for C1 as well Changes: https://git.openjdk.org/jdk/pull/23682/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23682&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350258 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23682/head:pull/23682 PR: https://git.openjdk.org/jdk/pull/23682 From coleenp at openjdk.org Tue Feb 18 23:49:52 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 18 Feb 2025 23:49:52 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native Message-ID: Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. Tested with tier1-4 and performance tests. ------------- Commit messages: - Add ')' removed from jvmci test. - Shrink modifiers flag so isPrimitive can share word. - Remove isPrimitive intrinsic in favor of a boolean. - Make isInterface non-native. 
- Make isArray non-native Changes: https://git.openjdk.org/jdk/pull/23572/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349860 Stats: 178 lines in 19 files changed: 37 ins; 115 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/23572.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23572/head:pull/23572 PR: https://git.openjdk.org/jdk/pull/23572 From liach at openjdk.org Tue Feb 18 23:49:54 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 18 Feb 2025 23:49:54 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: <6EpQLprXKfUDUQ6UIl0Vo0M5OPmCJ4SjcnOeprbO40w=.7d6cd0d3-ec59-4935-adb9-484764f0235c@github.com> On Tue, 11 Feb 2025 20:56:39 GMT, Coleen Phillimore wrote: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. We often need to determine what primitive type a `class` is. Currently we do it through `Wrapper.forPrimitiveType`. Do you see potential value in encoding the primitive status in a byte, so primitive info also knows what primitive type this class is instead of doing identity comparisons? @cl4es Can you offer some insight here? src/java.base/share/classes/jdk/internal/reflect/Reflection.java line 59: > 57: Reflection.class, ALL_MEMBERS, > 58: AccessibleObject.class, ALL_MEMBERS, > 59: Class.class, Set.of("classLoader", "classData", "modifiers", "isPrimitive"), I think the field is named `isPrimitive`, right? 
test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaType.java line 933: > 931: if (f.getDeclaringClass().equals(metaAccess.lookupJavaType(Class.class))) { > 932: String name = f.getName(); > 933: return name.equals("classLoader") || name.equals("classData") || name.equals("modifiers") || name.equals("isPrimitive"); Same field name remark. test/jdk/jdk/internal/reflect/Reflection/Filtering.java line 59: > 57: { Class.class, "classData" }, > 58: { Class.class, "modifiers" }, > 59: { Class.class, "isPrimitive" }, Same field name remark. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23572#issuecomment-2654120983 PR Comment: https://git.openjdk.org/jdk/pull/23572#issuecomment-2659605250 PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1951773863 PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1951774073 PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1951774214 From coleenp at openjdk.org Tue Feb 18 23:49:54 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 18 Feb 2025 23:49:54 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 20:56:39 GMT, Coleen Phillimore wrote: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. I had a look at Wrapper.forPrimitiveType() and it's not an intrinsic so I don't really know how hot it is. It's a comparison, vs getting a field out of Class. Not sure how to measure it. So I can't address it in this change. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23572#issuecomment-2659396480 From redestad at openjdk.org Tue Feb 18 23:49:54 2025 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 18 Feb 2025 23:49:54 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 20:56:39 GMT, Coleen Phillimore wrote: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. Touching `Wrapper` seems out of scope for this PR, but if `Class.isPrimitive` gets cheaper from this then `Wrapper.forPrimitiveType` should definitely be examined in a follow-up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23572#issuecomment-2661970849 From coleenp at openjdk.org Tue Feb 18 23:49:55 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 18 Feb 2025 23:49:55 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: <6EpQLprXKfUDUQ6UIl0Vo0M5OPmCJ4SjcnOeprbO40w=.7d6cd0d3-ec59-4935-adb9-484764f0235c@github.com> References: <6EpQLprXKfUDUQ6UIl0Vo0M5OPmCJ4SjcnOeprbO40w=.7d6cd0d3-ec59-4935-adb9-484764f0235c@github.com> Message-ID: On Wed, 12 Feb 2025 00:05:13 GMT, Chen Liang wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. 
> > src/java.base/share/classes/jdk/internal/reflect/Reflection.java line 59: > >> 57: Reflection.class, ALL_MEMBERS, >> 58: AccessibleObject.class, ALL_MEMBERS, >> 59: Class.class, Set.of("classLoader", "classData", "modifiers", "isPrimitive"), > > I think the field is named `isPrimitive`, right? The method is isPrimitive so I think I had to give the field isPrimitiveType as a name, so this is wrong. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1952521536 From manc at openjdk.org Wed Feb 19 00:09:54 2025 From: manc at openjdk.org (Man Cao) Date: Wed, 19 Feb 2025 00:09:54 GMT Subject: RFR: 8349620: Add VMProps for static JDK [v2] In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 19:26:09 GMT, Jiangli Zhou wrote: >> Please review this change that adds the `jdk.static` VMProps. It can be used to skip tests not for running on static JDK. >> >> This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which is used by VMProps to determine if it's static at runtime. >> >> `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on static JDK. This test uses `bin/jlink`, which is not provided on static JDK. There are other tests that require tools in `bin/`. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. > > Jiangli Zhou has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8349620 > - - Add 'jdk.static' in VMProps. It can be used to skip tests not for running on static JDK. > - Add WhiteBox isStatic() native method. It's used by VMProps to determine of it's static at runtime. 
> - Add in '@requires !jdk.static' in test/hotspot/jtreg/runtime/modules/ModulesSymLink.java to skip the test on static JDK since it requires bin/jlink. Changes look good. > I'm also wondering if we would want to merge the isStatic into isHermetic check in the future. I guess it is unlikely we will package each jtreg test into single, hermetic files, each containing the whole JDK? If so, we probably won't need `isHermetic` for jtreg tests. ------------- Marked as reviewed by manc (Committer). PR Review: https://git.openjdk.org/jdk/pull/23528#pullrequestreview-2625325934 From jiangli at openjdk.org Wed Feb 19 00:31:55 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 19 Feb 2025 00:31:55 GMT Subject: RFR: 8349620: Add VMProps for static JDK [v2] In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 00:06:52 GMT, Man Cao wrote: >> Jiangli Zhou has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8349620 >> - - Add 'jdk.static' in VMProps. It can be used to skip tests not for running on static JDK. >> - Add WhiteBox isStatic() native method. It's used by VMProps to determine of it's static at runtime. >> - Add in '@requires !jdk.static' in test/hotspot/jtreg/runtime/modules/ModulesSymLink.java to skip the test on static JDK since it requires bin/jlink. > > Changes look good. > >> I'm also wondering if we would want to merge the isStatic into isHermetic check in the future. > > I guess it is unlikely we will package each jtreg test into single, hermetic files, each containing the whole JDK? If so, we probably won't need `isHermetic` for jtreg tests. @caoman Thanks for reviewing! > > I'm also wondering if we would want to merge the isStatic into isHermetic check in the future. 
> > I guess it is unlikely we will package each jtreg test into single, hermetic files, each containing the whole JDK? If so, we probably won't need `isHermetic` for jtreg tests. When hermetic is supported, I think we will want to add new tests to explicitly exercise packaging a single executable image including the runtime and run using the single image. Those would be dedicated tests, instead of testing all/most jtreg tests in hermetic mode. `isHermetic` could be useful when we add such tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2667226529 From dlong at openjdk.org Wed Feb 19 00:37:14 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 19 Feb 2025 00:37:14 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v3] In-Reply-To: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: > When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. > > In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. > > Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. 
Dean Long has updated the pull request incrementally with one additional commit since the last revision: Stricter assertion on ppc64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23557/files - new: https://git.openjdk.org/jdk/pull/23557/files/a7a0ed7a..ebf10dae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23557&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23557&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23557.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23557/head:pull/23557 PR: https://git.openjdk.org/jdk/pull/23557 From dlong at openjdk.org Wed Feb 19 00:39:54 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 19 Feb 2025 00:39:54 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Mon, 17 Feb 2025 11:27:17 GMT, Richard Reingruber wrote: >>> I think you can make the assertion a little stricter like this [reinrich at 9c3c8a3](https://github.com/reinrich/jdk/commit/9c3c8a33a29b9ae6c4c703992b306dc0cbbcd2f0). >> >> Regarding this stricter version, why are you using is_bottom_frame instead of is_top_frame? The deoptimization code seems to name the most recent leaf frame "top". That sounds like what frame::top_ijava_frame_abi_size is for too. > >> > I think you can make the assertion a little stricter like this [reinrich at 9c3c8a3](https://github.com/reinrich/jdk/commit/9c3c8a33a29b9ae6c4c703992b306dc0cbbcd2f0). >> >> Regarding this stricter version, why are you using is_bottom_frame instead of is_top_frame? The deoptimization code seems to name the most recent leaf frame "top". That sounds like what frame::top_ijava_frame_abi_size is for too. 
>
> Correct, the top frame has a frame::top_ijava_frame_abi, but the assertion is about the abi section in the current frame's caller, and the bottom frame's caller also has a top_ijava_frame_abi because i2c doesn't modify it.
>
> Continue reading if you're interested in more details...
>
> As said, the i2c adapter does *not* trim the caller frame as the interpreter would,
> replacing its large `top_ijava_frame_abi` with a smaller
> `parent_ijava_frame_abi`.
>
> Example: compiled frame DEOPTEE is replaced with 3 interpreted frames
>
> Stack before deoptimization
>
> |                        |
> | Interpreted CALLER     |
> | of DEOPTEE frame       |
> |                        |
> +------------------------+
> |                        |
> | top_ijava_frame_abi    |
> |                        |
> +========================+
> |                        |
> | Compiled               |
> | DEOPTEE                |
> |                        |
> +------------------------+
> | java_abi               |
> +========================+
>
> Stack when assertion is checked
> (i.e. after DEOPTEE was replaced by corresponding inter. frames)
>
> |                        |
> | Interpreted CALLER     |
> | of DEOPTEE frame       |
> |                        |
> +------------------------+
> |                        |
> | top_ijava_frame_abi    | <- i2c keeps large abi
> |                        |
> +========================+
> |                        | <- bottom frame
> | Interpreted Frame 0    |
> | corresp. to DEOPTEE    |
> |                        |
> +------------------------+
> | parent_ijava_frame_abi |
> +========================+
> |                        |
> | Interpreted Frame 1    |
> | (inlined by DEOPTEE)   |
> |                        |
> +------------------------+
> | parent_ijava_frame_abi |
> +========================+
> |                        | <- top frame
> | Interpreted Frame 2    |
> | (inlined by DEOPTEE)   |
> |                        |
> +------------------------+
> |                        |
> | top_ijava_frame_abi    |
> |                        |
> +========================+
>
> Notes:
> (referring to the frame sections rather than the C++ types)
>
> - top_ijava_frame_abi comp...

@reinrich OK, got it! I pushed your change. Could you also comment on whether we could use the value of sender_sp here instead?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2667234850 From dlong at openjdk.org Wed Feb 19 00:49:56 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 19 Feb 2025 00:49:56 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 22:42:18 GMT, Dmitry Chuyko wrote: > The location for rfp should be set in in the register map. In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. > > COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers INCLUDE_JVMCI case. src/hotspot/cpu/aarch64/frame_aarch64.cpp line 512: > 510: #if COMPILER1_OR_COMPILER2 > 511: if (map->update_map()) { > 512: update_map_with_saved_link(map, (intptr_t**) addr_at(link_offset)); Is it correct that this is only needed when PreserveFramePointer is false, and it's harmless to do when PreserveFramePointer is true? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23682#discussion_r1960788519 From dlong at openjdk.org Wed Feb 19 02:38:03 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 19 Feb 2025 02:38:03 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: <6EpQLprXKfUDUQ6UIl0Vo0M5OPmCJ4SjcnOeprbO40w=.7d6cd0d3-ec59-4935-adb9-484764f0235c@github.com> Message-ID: On Wed, 12 Feb 2025 12:05:22 GMT, Coleen Phillimore wrote: >> src/java.base/share/classes/jdk/internal/reflect/Reflection.java line 59: >> >>> 57: Reflection.class, ALL_MEMBERS, >>> 58: AccessibleObject.class, ALL_MEMBERS, >>> 59: Class.class, Set.of("classLoader", "classData", "modifiers", "isPrimitive"), >> >> I think the field is named `isPrimitive`, right? > > The method is isPrimitive so I think I had to give the field isPrimitiveType as a name, so this is wrong. 
I don't know if we have a style guide that covers this, but I believe the method and field could both be named `isPrimitive`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1960863953 From amitkumar at openjdk.org Wed Feb 19 02:54:51 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 19 Feb 2025 02:54:51 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 11:53:03 GMT, Andrew Haley wrote: >>> Should this be a function? >> >> This is similar to [ppc ijava_idx](https://github.com/openjdk/jdk/blob/885be2efa6b1359a7c7ab36882e19a7eaba77fb3/src/hotspot/cpu/ppc/frame_ppc.hpp#L283C1-L285C58) >> >>> Also, names starting with _ aren't common in HotSpot code, except for fields in C++ objects. >> >> I did it for keeping the parity with: >> >> #define _z_ijava_state_neg(_component) \ >> (int) (-frame::z_ijava_state_size + offset_of(frame::z_ijava_state, _component)) >> >> Do you want me to revert it ? > > There is no good reason to use a macro here. If a function can be a function, and this one can, let it be one. However, there is no reason to change anything else. Leave that for a "macros to functions" patch some other day. I could only achieve this implementation: [macro to method](https://github.com/offamitkumar/jdk/commit/a4a3908288c8c0f518ad263062acd504cd6b7d3c), which looks dirty. 
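One way to follow the "if it can be a function, make it one" advice without passing a member name into a function is to compute the member offset at the call site with `offsetof` and pass the number in. The sketch below does this for a macro shaped like `_z_ijava_state_neg`; `ZIjavaState`, its fields, and `kZIjavaStateSize` are hypothetical stand-ins, not the real `frame::z_ijava_state`, and the size is assumed to be just `sizeof` the struct.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the s390 interpreter state block; field names
// and layout are illustrative only.
struct ZIjavaState {
  long method;
  long locals;
  long monitors;
  long esp;
};

// Assumption for this sketch: the state size is simply the struct size.
constexpr int kZIjavaStateSize = static_cast<int>(sizeof(ZIjavaState));

// Function form of the macro
//   (int) (-frame::z_ijava_state_size + offset_of(frame::z_ijava_state, _component))
// A function cannot take a member name, so the caller supplies the offset
// (e.g. via offsetof) and the function returns that member's negative offset
// relative to the end of the state block.
constexpr int z_ijava_state_neg(std::size_t component_offset) {
  return -kZIjavaStateSize + static_cast<int>(component_offset);
}

// The last field sits exactly one slot below the end of the block.
static_assert(z_ijava_state_neg(offsetof(ZIjavaState, esp)) ==
                  -static_cast<int>(sizeof(long)),
              "esp is the last slot");
```

A call then reads `z_ijava_state_neg(offsetof(ZIjavaState, locals))`. It is more verbose than the macro but type-checked, which is the trade-off being weighed in this thread.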
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23660#discussion_r1960875328 From dlong at openjdk.org Wed Feb 19 02:56:57 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 19 Feb 2025 02:56:57 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 20:56:39 GMT, Coleen Phillimore wrote: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. src/hotspot/share/classfile/javaClasses.inline.hpp line 301: > 299: #ifdef ASSERT > 300: // The heapwalker walks through Classes that have had their Klass pointers removed, so can't assert this. > 301: // assert(is_primitive == java_class->bool_field(_is_primitive_offset), "must match what we told Java"); I don't understand this comment about the heapwalker. It sounds like we could have `is_primitive` set to true incorrectly. If so, what prevents the asserts below from failing? And why not use the value from _is_primitive_offset instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1960876174 From liach at openjdk.org Wed Feb 19 02:56:58 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 19 Feb 2025 02:56:58 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: <6EpQLprXKfUDUQ6UIl0Vo0M5OPmCJ4SjcnOeprbO40w=.7d6cd0d3-ec59-4935-adb9-484764f0235c@github.com> Message-ID: On Wed, 19 Feb 2025 02:35:25 GMT, Dean Long wrote: >> The method is isPrimitive so I think I had to give the field isPrimitiveType as a name, so this is wrong. > > I don't know if we have a style guide that covers this, but I believe the method and field could both be named `isPrimitive`. 
I would personally name such a boolean field `primitive`, but I don't have a strong preference on the field naming as long as its references in tests and other locations are correct. In addition, I believe this field may soon be widened to carry more HotSpot-specific flags (such as hidden, etc.), so the name is bound to change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1960876569 From haosun at openjdk.org Wed Feb 19 02:58:03 2025 From: haosun at openjdk.org (Hao Sun) Date: Wed, 19 Feb 2025 02:58:03 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 18:47:54 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: > > Adding comments + some code reorganization Hi. Here are the test results from our CI. ### copyright year The following files should have their copyright year updated to 2025. src/hotspot/cpu/aarch64/assembler_aarch64.hpp src/hotspot/cpu/aarch64/stubRoutines_aarch64.hpp src/hotspot/share/runtime/globals.hpp src/java.base/share/classes/sun/security/provider/ML_DSA.java src/java.base/share/classes/sun/security/provider/SHA3Parallel.java test/micro/org/openjdk/bench/java/security/MLDSA.java ### cross-build failure The cross builds for riscv64/s390/ppc64 failed.
Here is the error message for ppc64: === Output from failing command(s) repeated here === * For target support_interim-jmods_support__create_java.base.jmod_exec: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/tmp/jdk-src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=72752, tid=72769 # assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000e85cb03dc620 <= 0x0000e85cb03e8ab4 <= 0x0000e85cb03e8ab0 # # JRE version: OpenJDK Runtime Environment (25.0) (fastdebug build 25-internal-git-1e01c6deec3) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-git-1e01c6deec3, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) # Problematic frame: # V [libjvm.so+0x3b391c] Instruction_aarch64::~Instruction_aarch64()+0xbc # # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /tmp/ci-scripts/jdk-src/make/ # # An error report file with more information is saved as: # /tmp/jdk-src/make/hs_err_pid72752.log ... (rest of output omitted) * All command lines available in /sysroot/ppc64el/tmp/build-ppc64el/make-support/failure-logs.
=== End of repeated output === I suppose the other platforms need an update similar to the one made in `src/hotspot/cpu/aarch64/stubDeclarations_aarch64.hpp`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2667389849 From dlong at openjdk.org Wed Feb 19 03:32:53 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 19 Feb 2025 03:32:53 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 20:56:39 GMT, Coleen Phillimore wrote: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. src/java.base/share/classes/java/lang/Class.java line 1287: > 1285: */ > 1286: public Class getComponentType() { > 1287: // Only return for array types. Storage may be reused for Class for instance types. I don't see any changes to componentType related to reuse. So were this comment and the code below already obsolete? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1960897176 From dlong at openjdk.org Wed Feb 19 03:37:52 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 19 Feb 2025 03:37:52 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 20:56:39 GMT, Coleen Phillimore wrote: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. src/hotspot/share/prims/jvm.cpp line 2283: > 2281: // Otherwise it returns its argument value which is the _the_class Klass*.
> 2282: // Please, refer to the description in the jvmtiThreadState.hpp. > 2283: Does this "RedefineClasses support" comment still belong here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1960900041 From asmehra at openjdk.org Wed Feb 19 04:13:57 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 19 Feb 2025 04:13:57 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v4] In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 18:31:42 GMT, Ioi Lam wrote: >> Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. >> >> With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. >> >> To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): >> >>> the format of the configuration and cache files is not specified and is subject to change without notice. 
>> >> **Notes for reviewers:** >> >> - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. >> - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. >> - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. >> - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. >> - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. >> >> **Misc Note** >> - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccb... > > Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: > > - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself > - Improve error message when AOTMode=create has an incompatible classpath I have added some comments, mainly to understand the reasoning behind changes where it was not clear to me. src/hotspot/share/cds/archiveBuilder.cpp line 510: > 508: } else if (klass->is_objArray_klass()) { > 509: Klass* bottom = ObjArrayKlass::cast(klass)->bottom_klass(); > 510: if (CDSConfig::is_dumping_dynamic_archive() && MetaspaceShared::is_shared_static(bottom)) { Why do we have the check for `CDSConfig::is_dumping_dynamic_archive()` here?
src/hotspot/share/cds/archiveUtils.inline.hpp line 70: > 68: // Returns the address of an Array that's allocated in the ArchiveBuilder "buffer" space. > 69: template > 70: Array* ArchiveUtils::archive_ptr_array(GrowableArray* tmp_array) { If I am reading this code correctly, it requires that the elements in `tmp_array` have already been archived. Can we add a comment and/or an assert to that effect? src/hotspot/share/cds/cdsConfig.cpp line 550: > 548: > 549: bool CDSConfig::is_dumping_preimage_static_archive() { > 550: return _is_dumping_static_archive && _is_dumping_preimage_static_archive; Is the check for `_is_dumping_static_archive` really needed? src/hotspot/share/cds/cdsConfig.cpp line 705: > 703: bool CDSConfig::is_dumping_aot_linked_classes() { > 704: if (is_dumping_preimage_static_archive()) { > 705: return false; In the leyden-premain branch it returns `AOTClassLinking`, but here it returns false. So we are not doing pre-linking in the preimage, is that right? src/hotspot/share/cds/cppVtables.cpp line 197: > 195: // _orig_cpp_vtptrs[ConstantPool_Kind] == ((intptr_t**)cp)[0] > 196: static intptr_t* _orig_cpp_vtptrs[_num_cloned_vtable_kinds]; // vtptrs set by the C++ compiler > 197: static intptr_t* _archived_cpp_vtptrs[_num_cloned_vtable_kinds]; // vtptrs used in the static archive What is the purpose of `_archived_cpp_vtptrs`? src/hotspot/share/cds/filemap.cpp line 1529: > 1527: // allow processes that have it open continued access to the file. > 1528: remove(_full_path); > 1529: int mode = CDSConfig::is_dumping_preimage_static_archive() ? 0666 : 0444; Why do we need different access permissions for the preimage file compared to the other dumping modes?
------------- PR Review: https://git.openjdk.org/jdk/pull/23484#pullrequestreview-2625271355 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960904061 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960739673 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960907271 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960909728 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960734314 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1960916995 From dholmes at openjdk.org Wed Feb 19 05:14:58 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Feb 2025 05:14:58 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 20:56:39 GMT, Coleen Phillimore wrote: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. Just a few passing comments as this is mainly compiler stuff. Does the SA not need any updates in relation to this? src/hotspot/share/classfile/javaClasses.cpp line 1371: > 1369: #endif > 1370: set_modifiers(java_class, JVM_ACC_ABSTRACT | JVM_ACC_FINAL | JVM_ACC_PUBLIC); > 1371: set_is_primitive(java_class); Just wondering what the comments at the start of this method are alluding to now that we do have a field at the Java level. ??? src/hotspot/share/prims/jvm.cpp line 1262: > 1260: JVM_END > 1261: > 1262: JVM_ENTRY(jboolean, JVM_IsArrayClass(JNIEnv *env, jclass cls)) Where are the changes to jvm.h? 
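For readers following the Class.isArray()/isInterface()/isPrimitive() change, the approach in the PR summary (answer each predicate from a field instead of a native call) can be sketched in plain Java. This is a minimal sketch, not the java.lang.Class code: `MirrorSketch` and its field names are hypothetical, and it assumes the component-type field is only ever non-null for array classes, which is exactly the invariant Dean's getComponentType() question is probing.

```java
import java.lang.reflect.Modifier;

// Hypothetical stand-in for java.lang.Class: each predicate reads a plain field
// that is written once when the mirror is created (by the VM in the real class).
final class MirrorSketch {
    private final char modifiers;             // stored as char, like the modifiers field in the patch under review
    private final MirrorSketch componentType; // assumed non-null only for array classes
    private final boolean primitive;          // true only for the primitive mirrors (int, long, void, ...)

    MirrorSketch(int modifiers, MirrorSketch componentType, boolean primitive) {
        this.modifiers = (char) modifiers;
        this.componentType = componentType;
        this.primitive = primitive;
    }

    boolean isInterface() {
        // Interfaces carry the INTERFACE bit (0x0200) in their modifier flags.
        return Modifier.isInterface(modifiers);
    }

    boolean isArray() {
        return componentType != null;
    }

    boolean isPrimitive() {
        return primitive;
    }
}
```

A primitive mirror would be built as `new MirrorSketch(Modifier.PUBLIC | Modifier.FINAL | Modifier.ABSTRACT, null, true)`, matching the modifiers that the quoted javaClasses.cpp hunk sets for primitive mirrors; an `int[]` mirror passes that object as its component type and `false` for `primitive`.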
src/java.base/share/classes/java/lang/Class.java line 1009: > 1007: private transient Object classData; // Set by VM > 1008: private transient Object[] signers; // Read by VM, mutable > 1009: private final transient char modifiers; // Set by the VM Why the change of type here? ------------- PR Review: https://git.openjdk.org/jdk/pull/23572#pullrequestreview-2625638624 PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1960955739 PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1960959718 PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1960960668 From duke at openjdk.org Wed Feb 19 06:21:33 2025 From: duke at openjdk.org (sli-x) Date: Wed, 19 Feb 2025 06:21:33 GMT Subject: RFR: 8340434: Excessive Young GCs Triggered by CodeCache GC Threshold Message-ID: The trigger of _codecache_GC_threshold in CodeCache::gc_on_allocation is the key to this problem. if (used_ratio > threshold) { // After threshold is reached, scale it by free_ratio so that more aggressive // GC is triggered as we approach code cache exhaustion threshold *= free_ratio; } // If code cache has been allocated without any GC at all, let's make sure // it is eventually invoked to avoid trouble. if (allocated_since_last_ratio > threshold) { // In case the GC is concurrent, we make sure only one thread requests the GC. if (Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true) == false) { log_info(codecache)("Triggering threshold (%.3f%%) GC due to allocating %.3f%% since last unloading (%.3f%% used -> %.3f%% used)", threshold * 100.0, allocated_since_last_ratio * 100.0, last_used_ratio * 100.0, used_ratio * 100.0); Universe::heap()->collect(GCCause::_codecache_GC_threshold); } } Here, with a limited code cache size, the free_ratio (and with it the threshold) keeps getting lower if no methods can be swept, which leads to increasingly frequent collections.
Since the collection happens in a stop-the-world pause, overall GC performance is also degraded. So a simple solution is to delete the scaling logic here. However, I think there are some problems here worth exploring further. There are two options that control the code cache sweeper, StartAggressiveSweepingAt and SweeperThreshold. StartAggressiveSweepingAt triggers sweeping when little space is left in the code cache and does little harm. However, SweeperThreshold, first introduced by [JDK-8244660](https://bugs.openjdk.org/browse/JDK-8244660), was designed for regular sweeping of the code cache, at a time when the code cache sweeper and heap collection were actually independent. After [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) and some related patches, the old code cache sweeper mechanism was merged into concurrent heap collection. So the code cache sweeping heuristics and the unloading behavior are now taken care of by the concurrent collection, and there are no longer any "zombie" methods to be counted. Considering that it introduces lots of useless collection work, I think SweeperThreshold should be deleted now.
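To make the effect of the quoted `threshold *= free_ratio` step concrete, here is a small Java model of just that arithmetic. It is not HotSpot code, and the class and method names are hypothetical. It assumes, as in the quoted snippet, that all ratios are fractions of the code cache capacity, and additionally that free_ratio equals 1 - used_ratio; the 0.15 threshold used below is an arbitrary example value, not a claim about the SweeperThreshold default.

```java
// Hypothetical model of the quoted CodeCache::gc_on_allocation arithmetic,
// not HotSpot code. All ratios are fractions of the code cache capacity.
final class CodeCacheTriggerModel {
    // Threshold that allocated_since_last_ratio is actually compared against.
    static double effectiveThreshold(double threshold, double usedRatio, boolean scaleByFreeRatio) {
        double freeRatio = 1.0 - usedRatio; // assumption: free_ratio == 1 - used_ratio
        if (scaleByFreeRatio && usedRatio > threshold) {
            return threshold * freeRatio;   // mirrors "threshold *= free_ratio"
        }
        return threshold;
    }

    // True if this much allocation since the last unloading would request a GC.
    static boolean wouldRequestGc(double threshold, double usedRatio,
                                  double allocatedSinceLastRatio, boolean scaleByFreeRatio) {
        return allocatedSinceLastRatio > effectiveThreshold(threshold, usedRatio, scaleByFreeRatio);
    }
}
```

With a threshold of 0.15 and the cache 95% full, the scaled threshold drops to about 0.0075, so allocating just 1% of the cache since the last unloading already requests a GC; without the scaling step the same allocation would not. Since the scaled value shrinks towards zero as the cache fills, a cache sized close to its working set ends up requesting GCs after ever smaller allocations, which is the behavior described above.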
------------- Commit messages: - remove SweeperThreshold and set it to Obselete Changes: https://git.openjdk.org/jdk/pull/21084/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21084&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340434 Stats: 55 lines in 14 files changed: 1 ins; 53 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21084.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21084/head:pull/21084 PR: https://git.openjdk.org/jdk/pull/21084 From tschatzl at openjdk.org Wed Feb 19 06:21:33 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Feb 2025 06:21:33 GMT Subject: RFR: 8340434: Excessive Young GCs Triggered by CodeCache GC Threshold In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 08:43:50 GMT, sli-x wrote: > Here with the limited codecache size, the free_ratio will get lower and lower (so as the threshold) if no methods can be swept and thus leads to a more and more frequent collection behavior. Since the collection happens in stw, the whole performance of gc will also be degraded. > >So a simple solution is to delete the scaling logic here. However, I think here lies some problems worth further exploring. > >There're two options to control a code cache sweeper, StartAggressiveSweepingAt and SweeperThreshold. StartAggressiveSweepingAt is a sweeper triggered for little space in codeCache and does little harm. However, SweeperThreshold, first introduced by [JDK-8244660](https://bugs.openjdk.org/browse/JDK-8244660), was designed for a regular sweep for codecache, when codeCache sweeper and heap collection are actually individual. After [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) and some patches related, the old mechanism of codeCache sweeper is merged into a concurrent heap collection. So the Code cache sweeper heuristics and the unloading behavior will be promised by the concurrent collection. There's no longer any "zombie" methods to be counted. 
Considering it will introduce lots of useless collection jobs, I think SweeperThreshold should be deleted now. I think the general concern expressed by the code > // After threshold is reached, scale it by free_ratio so that more aggressive > // GC is triggered as we approach code cache exhaustion is still valid. How this is implemented also makes some sense: changes are the trigger for collections, and the emptier the code cache is, the larger the changes allowed before trying to clean it out. It tries to limit code cache memory usage by doing increasingly frequent collections the more occupied the code cache becomes, i.e. some kind of backpressure on code cache usage. Your use case of limiting the code cache size (and setting initial == max) seems to be a relatively unusual one to me, and apparently does not fit that model, as it seems that you set the code cache size close to the actual maximum usage. Removing `SweeperThreshold` would affect the regular case as well in a significant way (allocate until bumping into `StartAggressiveSweepingAt`). I do not think removing this part of the heuristic is good (or desired at all). Maybe an alternative would be to skip only this part of the heuristic in your case; and even then I am not sure that waiting until hitting the `StartAggressiveSweepingAt` threshold is a good idea; it may be too late to avoid disabling the compiler, at least temporarily. And even then, as long as the memory usage stays above the threshold, this will result in continuous code cache sweeps (_every time_ _any_ memory is allocated in the code cache). From the [JDK-8244660](https://bugs.openjdk.org/browse/JDK-8244660) CR: > This is because users with different sized code caches might want different thresholds. (Otherwise there would be no way to control the sweeper's intensity). Which means that one could just take that suggestion literally and change not only the initial/max code cache size but also that threshold in your use case.
Stepping back a little, this situation very much resembles the issues with G1's `InitiatingHeapOccupancyPercent` before [JDK-8136677](https://bugs.openjdk.org/browse/JDK-8136677), where a one-size-fits-all value also did not work, and many, many people tuned `InitiatingHeapOccupancyPercent` manually in the past. Maybe a similar mechanism that at least takes the actual code cache allocation rate into account ("when will the current watermark be hit?") would be preferable as a replacement for both options (note that I'm not an expert on the code cache, so there may be other reasons to clean it out than just an occupancy threshold)? Thomas ------------- PR Comment: https://git.openjdk.org/jdk/pull/21084#issuecomment-2383475220 From robilad at openjdk.org Wed Feb 19 06:21:33 2025 From: robilad at openjdk.org (Dalibor Topic) Date: Wed, 19 Feb 2025 06:21:33 GMT Subject: RFR: 8340434: Excessive Young GCs Triggered by CodeCache GC Threshold In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 08:43:50 GMT, sli-x wrote: > The trigger of _codecache_GC_threshold in CodeCache::gc_on_allocation is the key to this problem. > > if (used_ratio > threshold) { > // After threshold is reached, scale it by free_ratio so that more aggressive > // GC is triggered as we approach code cache exhaustion > threshold *= free_ratio; > } > // If code cache has been allocated without any GC at all, let's make sure > // it is eventually invoked to avoid trouble. > if (allocated_since_last_ratio > threshold) { > // In case the GC is concurrent, we make sure only one thread requests the GC.
> if (Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true) == false) { > log_info(codecache)("Triggering threshold (%.3f%%) GC due to allocating %.3f%% since last unloading (%.3f%% used -> %.3f%% used)", > threshold * 100.0, allocated_since_last_ratio * 100.0, last_used_ratio * 100.0, used_ratio * 100.0); > Universe::heap()->collect(GCCause::_codecache_GC_threshold); > } > } > > Here with the limited codecache size, the free_ratio will get lower and lower (so as the threshold) if no methods can be swept and thus leads to a more and more frequent collection behavior. Since the collection happens in stw, the whole performance of gc will also be degraded. > > So a simple solution is to delete the scaling logic here. However, I think here lies some problems worth further exploring. > > There're two options to control a code cache sweeper, StartAggressiveSweepingAt and SweeperThreshold. StartAggressiveSweepingAt is a sweeper triggered for little space in codeCache and does little harm. However, SweeperThreshold, first introduced by [JDK-8244660](https://bugs.openjdk.org/browse/JDK-8244660), was designed for a regular sweep for codecache, when codeCache sweeper and heap collection are actually individual. After [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) and some patches related, the old mechanism of codeCache sweeper is merged into a concurrent heap collection. So the Code cache sweeper heuristics and the unloading behavior will be promised by the concurrent collection. There's no longer any "zombie" methods to be counted. Considering it will introduce lots of useless collection jobs, I think SweeperThreshold should be deleted now. Hi, please send an e-mail to dalibor.topic at oracle.com so that I can verify your account in Skara. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21084#issuecomment-2427338142 From dholmes at openjdk.org Wed Feb 19 06:31:53 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Feb 2025 06:31:53 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 18 Feb 2025 09:21:18 GMT, Albert Mingkun Yang wrote: > Then, maybe the logic is easier to read if the "atomic" access is visible directly from that context, instead of hiding it inside in_critical. Therefore, it probably makes more sense to introduce a new API. WDYT? Okay. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2667619218 From stuefe at openjdk.org Wed Feb 19 06:38:54 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 19 Feb 2025 06:38:54 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 10:07:30 GMT, Roberto Castañeda Lozano wrote: > > Hi Thomas, this looks very useful, thanks! I will run some Oracle-internal functional and performance testing and come back with the results next week. > > Functional test results (Oracle internal tier1-tier5) look good. > > I measured C2 execution time before and after the changeset using DaCapo 23 and did not find any statistically significant difference, except for a 2-3% regression on the jython benchmark (using large input size). This small regression is IMO acceptable, particularly given that these changes can be seen as an investment to improve compiler resource utilization in the long run. Hi @robcasloz, interesting, I did not expect this. What did you measure? With the compilation statistic enabled vs. disabled, or old vs. new with both enabled?
(best, give me both sets of command line args) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2667627899 From stuefe at openjdk.org Wed Feb 19 06:44:55 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 19 Feb 2025 06:44:55 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v2] In-Reply-To: References: Message-ID: <7qDiB3QXduIdnc6t-oTxHKT9kbUcwqzELGOSvWcwPAU=.21ade736-8c52-4bbb-b535-30540952a369@github.com> On Fri, 14 Feb 2025 09:34:18 GMT, Thomas Stuefe wrote: >> Greetings, >> >> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. >> >> Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. >> >> I wanted to track that information correctly and display it clearly in a way that is easy to understand. >> >> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). >> >> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase. 
>> >> The statistic gives us two new forms of output: >> >> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: >> >> >> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: >> Phase Total ra node comp type index reglive regsplit cienv other >> none 1205512 155104 982984 33712 0 0 0 0 0 33712 >> parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 >> optimizer 916584 0 556416 0 0 0 0 0 0 360168 >> escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 >> connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 >> macroEliminate 196448 0 196448 0 0 0 0 0 0 0 >> iterGVN 327440 0 196368 131072 0 0 0 0 0 0 >> incrementalInline 3992816 0 3043704 62... > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > revert unnecessary copyright change > > We also save a copy of the counters to a global table that contains the N most expensive compilations. That table will be printed when one uses jcmd Compiler.memory. We also print it into the hs-err file. > > This is a new tool for me, but I'd appreciate it if there was the equivalent of `PrintNMTStatistics` such that the table produced from the JCmd is also printed on shutdown. > > Edit: print_final_report doesn't support verbosity, that'd be nice to have. Let's do this in a future RFE, if anyone misses it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2667638346 From jpai at openjdk.org Wed Feb 19 06:53:03 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 19 Feb 2025 06:53:03 GMT Subject: RFR: 8349620: Add VMProps for static JDK [v3] In-Reply-To: References: Message-ID: <7Xbnn-2LkNv3Gsj6nFHXdrdvvPO7vXi3K3MWm33E-jw=.8341aa47-99de-4a67-8339-64b46fa7bb36@github.com> On Tue, 18 Feb 2025 19:29:10 GMT, Jiangli Zhou wrote: >> Please review this change that adds the `jdk.static` VMProps. It can be used to skip tests not for running on static JDK. 
This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which is used by VMProps to determine if it's static at runtime. >> >> `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on static JDK. This test uses `bin/jlink`, which is not provided on static JDK. There are other tests that require tools in `bin/`. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. > > Jiangli Zhou has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. Hello Jiangli, the change to introduce a `@requires` property for identifying a static JDK looks OK to me. > @requires !jdk.static is added in test/hotspot/jtreg/runtime/modules/ModulesSymLink.java to skip running the test on static JDK. This test uses bin/jlink, which is not provided on static JDK. There are other tests that require tools in bin/. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. This part, however, feels odd. Updating this test (and other tests in the future) to use `@requires !jdk.static` to identify the presence or absence of a specific tool in the JDK installation doesn't seem right. Perhaps they should instead rely on a tool-specific property (like maybe `@requires jdk.tool.jlink`)?
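Jaikiran's alternative could be sketched as a VMProps-style helper that derives one property per launcher found under `<java.home>/bin`. Everything here is hypothetical: `ToolProps`, its method, and the `jdk.tool.<name>` property names (taken from the "like maybe" example in the comment) are not existing jtreg or VMProps APIs. The sketch only checks that the launcher file exists; it does not verify that the tool actually runs.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of jtreg or VMProps: derive one
// "jdk.tool.<name>" property per launcher found under <javaHome>/bin.
final class ToolProps {
    static Map<String, String> toolProperties(Path javaHome, String... tools) {
        Map<String, String> props = new TreeMap<>();
        Path bin = javaHome.resolve("bin");
        for (String tool : tools) {
            // Only checks that the launcher file exists; Windows launchers use ".exe".
            boolean present = Files.isRegularFile(bin.resolve(tool))
                    || Files.isRegularFile(bin.resolve(tool + ".exe"));
            // "true"/"false" strings, in the style VMProps uses for boolean properties.
            props.put("jdk.tool." + tool, String.valueOf(present));
        }
        return props;
    }
}
```

Called as `ToolProps.toolProperties(Path.of(System.getProperty("java.home")), "jlink")`, this would report `jdk.tool.jlink=true` on an image that ships `bin/jlink` and `false` on one that does not, so a test could state the tool it needs rather than inferring it from `!jdk.static`.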
------------- PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2667657019 From jpai at openjdk.org Wed Feb 19 06:58:02 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Wed, 19 Feb 2025 06:58:02 GMT Subject: RFR: 8349620: Add VMProps for static JDK [v3] In-Reply-To: References: Message-ID: <38QCGfzFNUhE69hUlz5o4H_74wR0lw4sivYa-jGgHXg=.ec9d2a40-e71d-404e-8b8c-2cf284d5b876@github.com> On Tue, 18 Feb 2025 19:29:10 GMT, Jiangli Zhou wrote: >> Please review this change that adds the `jdk.static` VMProps. It can be used to skip tests not for running on static JDK. >> >> This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which is used by VMProps to determine if it's static at runtime. >> >> `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on static JDK. This test uses `bin/jlink`, which is not provided on static JDK. There are other tests that require tools in `bin/`. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. > > Jiangli Zhou has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. On a more general note, is it a goal to have the static JDK build run against all these tests that are part of the JDK repo? Would that mean that a lot of these will have to start using `@requires` to accommodate this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2667664308 From dchuyko at openjdk.org Wed Feb 19 07:19:54 2025 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Wed, 19 Feb 2025 07:19:54 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 00:46:58 GMT, Dean Long wrote: >> The location for rfp should be set in the register map.
In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. >> >> The COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers the INCLUDE_JVMCI case. > > src/hotspot/cpu/aarch64/frame_aarch64.cpp line 512: > >> 510: #if COMPILER1_OR_COMPILER2 >> 511: if (map->update_map()) { >> 512: update_map_with_saved_link(map, (intptr_t**) addr_at(link_offset)); > > Is it correct that this is only needed when PreserveFramePointer is false, and it's harmless to do when PreserveFramePointer is true? Yes. I'd say rfp always has a location, but it can only contain an oop if PreserveFramePointer is false. See MacroAssembler::build_frame()/remove_frame() https://github.com/openjdk/jdk/blob/57f4c30fb6be1da57c8fcc742b5c36d842eef397/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L5710 and frame::update_map_with_saved_link() https://github.com/openjdk/jdk/blob/57f4c30fb6be1da57c8fcc742b5c36d842eef397/src/hotspot/cpu/aarch64/frame_aarch64.inline.hpp#L451 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23682#discussion_r1961081199 From epeter at openjdk.org Wed Feb 19 07:19:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Feb 2025 07:19:56 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 19:18:34 GMT, Vladimir Kozlov wrote: >> Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below. >> >> **Background** >> >> With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer.
>> >> **Problem** >> >> So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code. >> >> >> MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1); >> MemorySegment nativeUnaligned = nativeAligned.asSlice(1); >> test3(nativeUnaligned); >> >> >> When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not! >> >> static void test3(MemorySegment ms) { >> for (int i = 0; i < RANGE; i++) { >> long adr = i * 4L; >> int v = ms.get(ELEMENT_LAYOUT, adr); >> ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1)); >> } >> } >> >> >> **Solution: Runtime Checks - Predicate and Multiversioning** >> >> Of course we could just forbid cases where we have a `native` base from vectorizing. But that would lead to regressions currently - in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine if the `base` is aligned, we need a runtime check. >> >> I came up with 2 options where to place the runtime checks: >> - A new "auto vectorization" Parse Predicate: >> - This only works when predicates are available. >> - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop. >> - Multiversion the loop: >> - Create 2 copies of the loop (fast and slow loops). >> - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if` which decides which loop we take >> - In the `slow_loop`, we make no assumption which means we can not vectorize, but we still compile - so even ... > > About actual probability value. 
I was thinking PROB_LIKELY_MAG(3). PROB_LIKELY_MAG(1) will only guarantee that vectorized loop will be first but it could be enough without moving other loop from hot path. Needs testing. @vnkozlov I suggest that I change the probability to something quite low now, just to make sure that the fast-loop is placed nicely. When I do the experiments for aliasing-analysis runtime-checks, then I will be able to benchmark much better for both cases, since it is much easier to create many different cases. At that point, I could still adapt the probabilities to a different constant. Or maybe I can somehow adjust the probabilities in the chain such that they are balanced. Like if there is 1 condition, give it `0.5`, if there are 2 give them each `sqrt(0.5)`, if there are `n` then `pow(0.5, 1/n)`, so that once you multiply them you get `pow(pow(0.5, 1/n),n) = 0.5`. We could also set another "target" probability than `0.5`. The issue is that experimenting now is a little difficult, because I only have the alignment-checks to play with, which are really really rare to fail in the "real world", I think. But aliasing-checks are more likely to fail, so there could be more interesting benchmark results there. Does that sound ok? > Can we profile alignment in Interpreter (and C1)? It would be nice if we could profile alignment or aliasing. Maybe that is possible. But I suppose there are always cases where profiling is not available (Xcomp ?), and we should have reasonable defaults there. We could investigate profiling in a second step, to improve things if we think that is worth it. Profiling these things would also be additional complexity - I'm not convinced yet it is worth it. What do you think? 
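The balancing idea above (one check gets `0.5`, two checks get `sqrt(0.5)` each, `n` checks get `pow(0.5, 1/n)`) is easy to sanity-check numerically. Below is a minimal Java sketch; the class and method names are hypothetical, and the code models only the arithmetic, not how C2 assigns branch probabilities.

```java
// Hypothetical helper illustrating the balancing idea: give each of n chained
// runtime checks probability target^(1/n), so the probability that all of
// them pass (the product) equals target.
final class BalancedCheckProbabilities {

    static double perCheckProbability(double target, int n) {
        if (n <= 0 || target <= 0.0 || target > 1.0) {
            throw new IllegalArgumentException("need n > 0 and 0 < target <= 1");
        }
        return Math.pow(target, 1.0 / n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            double p = perCheckProbability(0.5, n);
            // Math.pow(p, n) recovers the target up to rounding error.
            System.out.printf("n=%d  per-check=%.4f  all-pass=%.4f%n", n, p, Math.pow(p, n));
        }
    }
}
```

Passing a target other than `0.5` covers the suggested alternative "target" probability.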
------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2667703955 From alanb at openjdk.org Wed Feb 19 07:35:53 2025 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 19 Feb 2025 07:35:53 GMT Subject: RFR: 8349620: Add VMProps for static JDK [v3] In-Reply-To: <7Xbnn-2LkNv3Gsj6nFHXdrdvvPO7vXi3K3MWm33E-jw=.8341aa47-99de-4a67-8339-64b46fa7bb36@github.com> References: <7Xbnn-2LkNv3Gsj6nFHXdrdvvPO7vXi3K3MWm33E-jw=.8341aa47-99de-4a67-8339-64b46fa7bb36@github.com> Message-ID: On Wed, 19 Feb 2025 06:50:33 GMT, Jaikiran Pai wrote: > This part however feels odd. Updating this (and other tests in future) to use the `@requires !jdk.static` to identify the presence or absence of a specific tool in the JDK installation doesn't seem right. Perhaps they should instead rely on a tool-specific property (like maybe `@requires jdk.tool.jlink`)? The property will be useful to select the tests that can or cannot be selected by jtreg when the JDK under test is a static image. There are a number of tests that depend on layout or specific files in the modular run-time image so they will need to be skipped when the JDK is a static image. So nothing to do with whether specific tools are present or not. The specific test updated here is a bit strange because lib/modules should never be a sym link in the first place and the motivation for that is probably a different discussion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2667741172 From epeter at openjdk.org Wed Feb 19 07:42:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Feb 2025 07:42:52 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v2] In-Reply-To: References: Message-ID: > Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below. > > **Background** > > With `-XX:+AlignVector`, all vector loads/stores must be aligned.
We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer. > > **Problem** > > So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code. > > > MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1); > MemorySegment nativeUnaligned = nativeAligned.asSlice(1); > test3(nativeUnaligned); > > > When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not! > > static void test3(MemorySegment ms) { > for (int i = 0; i < RANGE; i++) { > long adr = i * 4L; > int v = ms.get(ELEMENT_LAYOUT, adr); > ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1)); > } > } > > > **Solution: Runtime Checks - Predicate and Multiversioning** > > Of course we could just forbid cases where we have a `native` base from vectorizing. But that would lead to regressions currently - in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine if the `base` is aligned, we need a runtime check. > > I came up with 2 options where to place the runtime checks: > - A new "auto vectorization" Parse Predicate: > - This only works when predicates are available. > - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop. > - Multiversion the loop: > - Create 2 copies of the loop (fast and slow loops). 
> - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if` which decides which loop we take > - In the `slow_loop`, we make no assumption which means we can not vectorize, but we still compile - so even unaligned `base`s would end up with reasonably fast code. > - We "stall" the `... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 63 commits: - Merge branch 'master' into JDK-8323582-SW-native-alignment - remove multiversion mark if we break the structure - register opaque with igvn - copyright and rm CFG check - IR rules for all cases - 3 test versions - test changed to unaligned ints - stub for slicing - add Verify/AlignVector runs to test - refactor verify - ... and 53 more: https://git.openjdk.org/jdk/compare/9042aa82...a98ffabf ------------- Changes: https://git.openjdk.org/jdk/pull/22016/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22016&range=01 Stats: 1074 lines in 27 files changed: 951 ins; 28 del; 95 mod Patch: https://git.openjdk.org/jdk/pull/22016.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22016/head:pull/22016 PR: https://git.openjdk.org/jdk/pull/22016 From dchuyko at openjdk.org Wed Feb 19 07:47:52 2025 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Wed, 19 Feb 2025 07:47:52 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 22:42:18 GMT, Dmitry Chuyko wrote: > The location for rfp should be set in the register map. In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. > > COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers INCLUDE_JVMCI case.
The original COMPILER2 guard comes from the 'Initial load' commit for x86 https://github.com/openjdk/jdk/blame/bb2c21a0252d12dc9edef3b676a12051caf7643e/hotspot/src/cpu/x86/vm/frame_x86.cpp#L407 In this patch it is just extended to cover C1 for aarch64 (and JVMCI is included only if either C1 or C2 is included). ------------- PR Comment: https://git.openjdk.org/jdk/pull/23682#issuecomment-2667797097 From sroy at openjdk.org Wed Feb 19 08:22:33 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Wed, 19 Feb 2025 08:22:33 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v24] In-Reply-To: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: > JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) > > Currently acceleration code for GHASH is missing for PPC64. > > The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result.
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: remove not needed variables ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20235/files - new: https://git.openjdk.org/jdk/pull/20235/files/5b94a7a4..b3fe9d6a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=22-23 Stats: 7 lines in 1 file changed: 0 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235 PR: https://git.openjdk.org/jdk/pull/20235 From sroy at openjdk.org Wed Feb 19 08:39:34 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Wed, 19 Feb 2025 08:39:34 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v25] In-Reply-To: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: > JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) > > Currently acceleration code for GHASH is missing for PPC64. > > The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result.
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: remove not needed variables ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20235/files - new: https://git.openjdk.org/jdk/pull/20235/files/b3fe9d6a..b37b09da Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=23-24 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235 PR: https://git.openjdk.org/jdk/pull/20235 From haosun at openjdk.org Wed Feb 19 08:55:13 2025 From: haosun at openjdk.org (Hao Sun) Date: Wed, 19 Feb 2025 08:55:13 GMT Subject: RFR: 8350303: ARM32: StubCodeGenerator::verify_stub(StubGenStubId) failed after JDK-8343767 Message-ID: We encountered the following runtime error on ARM32: assert(StubRoutines::stub_to_blob(stub_id) == blob_id()) failed: wrong blob initial for generation of stub atomic_add I suppose it might be a mistake in JDK-8343767. `atomic_add` stub belongs to **initial** stubs, but it is set as **compiler** stub in JDK-8343767. Note that only ARM32 is affected as only ARM32 defines this stub. Tests: cross-build for `arm32, ppc64, riscv64, s390x` passed. 
Tier1~3 passed on Linux/AArch64 and Linux/x86_64 ------------- Commit messages: - 8350303: ARM32: StubCodeGenerator::verify_stub(StubGenStubId) failed after JDK-8343767 Changes: https://git.openjdk.org/jdk/pull/23687/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23687&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350303 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23687.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23687/head:pull/23687 PR: https://git.openjdk.org/jdk/pull/23687 From rrich at openjdk.org Wed Feb 19 09:25:55 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 19 Feb 2025 09:25:55 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v2] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Mon, 17 Feb 2025 11:27:17 GMT, Richard Reingruber wrote: >>> I think you can make the assertion a little stricter like this [reinrich at 9c3c8a3](https://github.com/reinrich/jdk/commit/9c3c8a33a29b9ae6c4c703992b306dc0cbbcd2f0). >> >> Regarding this stricter version, why are you using is_bottom_frame instead of is_top_frame? The deoptimization code seems to name the most recent leaf frame "top". That sounds like what frame::top_ijava_frame_abi_size is for too. > >> > I think you can make the assertion a little stricter like this [reinrich at 9c3c8a3](https://github.com/reinrich/jdk/commit/9c3c8a33a29b9ae6c4c703992b306dc0cbbcd2f0). >> >> Regarding this stricter version, why are you using is_bottom_frame instead of is_top_frame? The deoptimization code seems to name the most recent leaf frame "top". That sounds like what frame::top_ijava_frame_abi_size is for too. 
> > Correct, the top frame has a frame::top_ijava_frame_abi but the assertion is about the abi section in the current frame's caller and the the bottom frame's caller also has a top_ijava_frame_abi because i2c doesn't modify it. > > Continue reading if you're interested in more details... > > As said the i2c adapter does *not* trimm the caller frame as the interpreter would, > replacing its large `top_ijava_frame_abi` with a smaller > `parent_ijava_frame_abi`. > > > > Example: compiled frame DEOPTEE is replaced with 3 interpreted frames > > Stack before deoptimization > > | | > | Interpreted CALLER | > | of DEOPTEE frame | > | | > +------------------------+ > | | > | top_ijava_frame_abi | > | | > +========================+ > | | > | Compiled | > | DEOPTEE | > | | > +------------------------+ > | java_abi | > +========================+ > > > Stack when assertion is checked > (i.e. after DEOPTEE was replaced by corresponding inter. frames) > > | | > | Interpreted CALLER | > | of DEOPTEE frame | > | | > +------------------------+ > | | > | top_ijava_frame_abi | <- i2c keeps large abi > | | > +========================+ > | | <- bottom frame > | Interpreted Frame 0 | > | corresp. to DEOPTEE | > | | > +------------------------+ > | parent_ijava_frame_abi | > +========================+ > | | > | Interpreted Frame 1 | > | (inlined by DEOPTEE) | > | | > +------------------------+ > | parent_ijava_frame_abi | > +========================+ > | | <- top frame > | Interpreted Frame 2 | > | (inlined by DEOPTEE) | > | | > +------------------------+ > | | > | top_ijava_frame_abi | > | | > +========================+ > > Notes: > (refering to the frame sections rather than the C++ types) > > - top_ijava_frame_abi comp... > @reinrich OK, got it! I pushed your change. Thanks. > Could you also comment on if we could use the value of sender_sp here instead? You mean for the calculation of `l2` at L135? sender_sp has room for `Method::max_stack()`. Using it would be less strict. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2668028520 From shade at openjdk.org Wed Feb 19 09:46:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Feb 2025 09:46:53 GMT Subject: RFR: 8350303: ARM32: StubCodeGenerator::verify_stub(StubGenStubId) failed after JDK-8343767 In-Reply-To: References: Message-ID: <3C1acl_UeKqTJWT_OVot1CziPxtesnixqG_TmC5HHdM=.ab8f7b47-ba5d-43a0-88fe-f212a28e749e@github.com> On Wed, 19 Feb 2025 08:50:20 GMT, Hao Sun wrote: > We encountered the following runtime error on ARM32: > > > assert(StubRoutines::stub_to_blob(stub_id) == blob_id()) failed: wrong blob initial for generation of stub atomic_add > > > I suppose it might be a mistake in JDK-8343767. `atomic_add` stub belongs to **initial** stubs, but it is set as **compiler** stub in JDK-8343767. > > Note that only ARM32 is affected as only ARM32 defines this stub. > > Tests: cross-build for `arm32, ppc64, riscv64, s390x` passed. Tier1~3 passed on Linux/AArch64 and Linux/x86_64 Looks fine to me, with nits. @adinn should take a look as well. src/hotspot/share/runtime/stubDeclarations.hpp line 556: > 554: do_entry(initial, fence, fence_entry, fence_entry) \ > 555: do_stub(initial, atomic_add) \ > 556: do_entry(initial, atomic_add, atomic_add_entry, atomic_add_entry) \ Indenting for trailing `` is not tidy. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23687#pullrequestreview-2626209535 PR Review Comment: https://git.openjdk.org/jdk/pull/23687#discussion_r1961319811 From haosun at openjdk.org Wed Feb 19 09:52:00 2025 From: haosun at openjdk.org (Hao Sun) Date: Wed, 19 Feb 2025 09:52:00 GMT Subject: RFR: 8350303: ARM32: StubCodeGenerator::verify_stub(StubGenStubId) failed after JDK-8343767 In-Reply-To: <3C1acl_UeKqTJWT_OVot1CziPxtesnixqG_TmC5HHdM=.ab8f7b47-ba5d-43a0-88fe-f212a28e749e@github.com> References: <3C1acl_UeKqTJWT_OVot1CziPxtesnixqG_TmC5HHdM=.ab8f7b47-ba5d-43a0-88fe-f212a28e749e@github.com> Message-ID: On Wed, 19 Feb 2025 09:43:50 GMT, Aleksey Shipilev wrote: >> We encountered the following runtime error on ARM32: >> >> >> assert(StubRoutines::stub_to_blob(stub_id) == blob_id()) failed: wrong blob initial for generation of stub atomic_add >> >> >> I suppose it might be a mistake in JDK-8343767. `atomic_add` stub belongs to **initial** stubs, but it is set as **compiler** stub in JDK-8343767. >> >> Note that only ARM32 is affected as only ARM32 defines this stub. >> >> Tests: cross-build for `arm32, ppc64, riscv64, s390x` passed. Tier1~3 passed on Linux/AArch64 and Linux/x86_64 > > src/hotspot/share/runtime/stubDeclarations.hpp line 556: > >> 554: do_entry(initial, fence, fence_entry, fence_entry) \ >> 555: do_stub(initial, atomic_add) \ >> 556: do_entry(initial, atomic_add, atomic_add_entry, atomic_add_entry) \ > > Indenting for trailing `` is not tidy. yes. will update soon. Thanks for pointing this out. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23687#discussion_r1961328494 From haosun at openjdk.org Wed Feb 19 10:04:55 2025 From: haosun at openjdk.org (Hao Sun) Date: Wed, 19 Feb 2025 10:04:55 GMT Subject: RFR: 8350303: ARM32: StubCodeGenerator::verify_stub(StubGenStubId) failed after JDK-8343767 [v2] In-Reply-To: References: Message-ID: > We encountered the following runtime error on ARM32: > > > assert(StubRoutines::stub_to_blob(stub_id) == blob_id()) failed: wrong blob initial for generation of stub atomic_add > > > I suppose it might be a mistake in JDK-8343767. `atomic_add` stub belongs to **initial** stubs, but it is set as **compiler** stub in JDK-8343767. > > Note that only ARM32 is affected as only ARM32 defines this stub. > > Tests: cross-build for `arm32, ppc64, riscv64, s390x` passed. Tier1~3 passed on Linux/AArch64 and Linux/x86_64 Hao Sun has updated the pull request incrementally with one additional commit since the last revision: fix code style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23687/files - new: https://git.openjdk.org/jdk/pull/23687/files/b4ed51d2..ca172ebf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23687&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23687&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23687.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23687/head:pull/23687 PR: https://git.openjdk.org/jdk/pull/23687 From haosun at openjdk.org Wed Feb 19 10:04:56 2025 From: haosun at openjdk.org (Hao Sun) Date: Wed, 19 Feb 2025 10:04:56 GMT Subject: RFR: 8350303: ARM32: StubCodeGenerator::verify_stub(StubGenStubId) failed after JDK-8343767 [v2] In-Reply-To: References: <3C1acl_UeKqTJWT_OVot1CziPxtesnixqG_TmC5HHdM=.ab8f7b47-ba5d-43a0-88fe-f212a28e749e@github.com> Message-ID: On Wed, 19 Feb 2025 09:48:54 GMT, Hao Sun wrote: >> src/hotspot/share/runtime/stubDeclarations.hpp line 556: >> >>> 554: 
do_entry(initial, fence, fence_entry, fence_entry) \ >>> 555: do_stub(initial, atomic_add) \ >>> 556: do_entry(initial, atomic_add, atomic_add_entry, atomic_add_entry) \ >> >> Indenting for trailing `` is not tidy. > > yes. will update soon. Thanks for pointing this out. updated in the latest commit. thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23687#discussion_r1961351792 From rcastanedalo at openjdk.org Wed Feb 19 09:52:58 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 19 Feb 2025 09:52:58 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 06:35:38 GMT, Thomas Stuefe wrote: > > > Hi Thomas, this looks very useful, thanks! I will run some Oracle-internal functional and performance testing and come back with the results next week. > > > > > > Functional test results (Oracle internal tier1-tier5) look good. > > I measured C2 execution time before and after the changeset using DaCapo 23 and did not find any statistically significant difference, except for a 2-3% regression on the jython benchmark (using large input size). This small regression is IMO acceptable, particularly given that these changes can be seen as an investment to improve compiler resource utilization in the long run. > > Hi @robcasloz, interesting, I did not expect this. What did you measure? With Compilation statistic vs without, or with old vs new, but both enabled? (best, give me both sets of command line args) I measured and compared C2 speed in bytecodes/s as reported by `-XX:+CITime` (averaged over a number of repetitions). I wanted to test that the feature does not affect C2's execution time when not used, so I simply compared C2 compilation speed for `jdk-25+10` vs. 
`jdk-25+10` with this changeset applied on top (both release builds) and `-XX:+CITime -Xbatch -XX:-TieredCompilation` on both builds (the last two flags for better stability across benchmark repetitions). I could observe the regression on both linux-x64 and macosx-aarch64 platforms. Let me know if you need more details. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2668094516 From shade at openjdk.org Wed Feb 19 10:15:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Feb 2025 10:15:52 GMT Subject: RFR: 8350303: ARM32: StubCodeGenerator::verify_stub(StubGenStubId) failed after JDK-8343767 [v2] In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 10:04:55 GMT, Hao Sun wrote: >> We encountered the following runtime error on ARM32: >> >> >> assert(StubRoutines::stub_to_blob(stub_id) == blob_id()) failed: wrong blob initial for generation of stub atomic_add >> >> >> I suppose it might be a mistake in JDK-8343767. `atomic_add` stub belongs to **initial** stubs, but it is set as **compiler** stub in JDK-8343767. >> >> Note that only ARM32 is affected as only ARM32 defines this stub. >> >> Tests: cross-build for `arm32, ppc64, riscv64, s390x` passed. Tier1~3 passed on Linux/AArch64 and Linux/x86_64 > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > fix code style Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23687#pullrequestreview-2626291093 From adinn at openjdk.org Wed Feb 19 10:44:55 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 19 Feb 2025 10:44:55 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v2] In-Reply-To: References: <7UgNYEuTu6rj7queOgM9xIy-6kQMdACrZiDLtlniMYw=.dff6f18b-1236-43b1-8280-2bce9160f32a@github.com> Message-ID: On Tue, 4 Feb 2025 18:57:28 GMT, Ferenc Rakoczi wrote: >>> @ferakocz I'm afraid you lucked out on getting your change committed before my reorganization of the stub generation code. If you are unsure of how to do the merge so your new stub is declared and generated following the new model (see the doc comments in stubDeclarations.hpp for details) let me know and I'll be happy to help you sort it out. >> >> @adinn I think I managed to figure it out. Please take a look at the PR and let me know if I should have done anything differently. > >> @ferakocz Yes, the stub declaration part of it looks to be correct. >> >> The rest of the patch will need at least two reviewers (@theRealAph? @martinuy? @franferrax) and may take some time to review, given that they will probably need to read up on the maths and algorithms. As an aid for reviewers and maintainers it would be good to insert a comment into the generator file linking the implementations to the relevant maths and algorithm. I found the FIPS-204 spec and the CRYSTALS-Dilithium Algorithm Speci?cations and Supporting Documentation paper, Shi Bai, L?o Ducas et al, 2021 - are they the best ones to look at? > > The Java implementation of ML-DSA is based on the FIPS-204 standard and the intrinsicss' implementations are based on the corresponding Java methods, except that the montMul() calls in them are inlined. 
The rest of the transformation from Java code to intrinsic code is pretty straightforward, so a reviewer need not necessarily understand the whole mathematics of the ML-DSA algorithms, just that the Java and the corresponding intrinsic code do the same thing. @ferakocz Apologies for the delays in reviewing and the limited feedback up to now. The code clearly does the job well but I think it would be made clearer and easier to maintain by tweaking/extending some of the generator methods and adding more detailed commenting. I am afraid I may take a few days to provide the relevant details because of other commitments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2668251335 From aph at openjdk.org Wed Feb 19 11:04:53 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 19 Feb 2025 11:04:53 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 02:52:40 GMT, Amit Kumar wrote: >> There is no good reason to use a macro here. If a function can be a function, and this one can, let it be one. However, there is no reason to change anything else. Leave that for a "macros to functions" patch some other day. > > I could only achieve this implementation: [macro to method](https://github.com/offamitkumar/jdk/commit/a4a3908288c8c0f518ad263062acd504cd6b7d3c), which looks dirty. Fair enough. It's not worth doing that, and what you do here does match PPC. OK. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23660#discussion_r1961471462 From bulasevich at openjdk.org Wed Feb 19 11:23:53 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 19 Feb 2025 11:23:53 GMT Subject: RFR: 8350303: ARM32: StubCodeGenerator::verify_stub(StubGenStubId) failed after JDK-8343767 [v2] In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 10:04:55 GMT, Hao Sun wrote: >> We encountered the following runtime error on ARM32: >> >> >> assert(StubRoutines::stub_to_blob(stub_id) == blob_id()) failed: wrong blob initial for generation of stub atomic_add >> >> >> I suppose it might be a mistake in JDK-8343767. `atomic_add` stub belongs to **initial** stubs, but it is set as **compiler** stub in JDK-8343767. >> >> Note that only ARM32 is affected as only ARM32 defines this stub. >> >> Tests: cross-build for `arm32, ppc64, riscv64, s390x` passed. Tier1~3 passed on Linux/AArch64 and Linux/x86_64 > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > fix code style I noticed this issue too. Thanks for fixing that. Yes, atomic_add blob is generated by generate_initial_stubs, hence atomic_add should be included in the STUBGEN_INITIAL_BLOBS_DO group. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23687#issuecomment-2668347135 From roland at openjdk.org Wed Feb 19 12:14:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 19 Feb 2025 12:14:56 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v2] In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> Message-ID: <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> On Tue, 18 Feb 2025 17:20:23 GMT, Emanuel Peter wrote: > Right. I suppose code size might be slightly affected. 
But I only multi-version if we are already going to pre-main-post the loop. And that means that the loop is already copied 3x, and doing 4x is not that noticable I would suspect. Wouldn't usual optimizations be applied to the slow loop as well (pre/main/post, unrolling)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2668476997 From ayang at openjdk.org Wed Feb 19 13:01:55 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 19 Feb 2025 13:01:55 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v4] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 18 Feb 2025 09:20:57 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). 
Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker All suggestions/comments are addressed. Tier1-8 pass. It's ready for another round of review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2668584881 From epeter at openjdk.org Wed Feb 19 13:08:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Feb 2025 13:08:56 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v2] In-Reply-To: <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> Message-ID: On Wed, 19 Feb 2025 12:12:27 GMT, Roland Westrelin wrote: > > Right. I suppose code size might be slightly affected. But I only multi-version if we are already going to pre-main-post the loop. And that means that the loop is already copied 3x, and doing 4x is not that noticeable I would suspect. > > Wouldn't usual optimizations be applied to the slow loop as well (pre/main/post, unrolling)?
That is what I'm avoiding by `stalling` the slow-loop ;) I only `un-stall` the slow-loop if we actually add a check to the multiversion-if, and at that point we do care about the slow-loop. Does that make sense? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2668601537 From roland at openjdk.org Wed Feb 19 13:20:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 19 Feb 2025 13:20:56 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v2] In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> Message-ID: On Wed, 19 Feb 2025 13:06:02 GMT, Emanuel Peter wrote: > That is what I'm avoiding by `stalling` the slow-loop ;) I only `un-stall` the slow-loop if we actually add a check to the multiversion-if, and at that point we do care about the slow-loop. So if the slow loop is kept, it's fully optimized (other than what misaligned accesses prevent)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2668625485 From epeter at openjdk.org Wed Feb 19 13:20:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Feb 2025 13:20:57 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v2] In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> Message-ID: On Wed, 19 Feb 2025 13:15:46 GMT, Roland Westrelin wrote: > > That is what I'm avoiding by `stalling` the slow-loop ;) I only `un-stall` the slow-loop if we actually add a check to the multiversion-if, and at that point we do care about the slow-loop.
> > So if the slow loop is kept, it's fully optimized (other than what misaligned accesses prevent)? Exactly. In a sense that would give you similar results as with unswitching, where we also possibly optimize both branches / loops. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2668632094 From roland at openjdk.org Wed Feb 19 13:28:55 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 19 Feb 2025 13:28:55 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v2] In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> Message-ID: On Wed, 19 Feb 2025 13:18:18 GMT, Emanuel Peter wrote: > > > That is what I'm avoiding by `stalling` the slow-loop ;) I only `un-stall` the slow-loop if we actually add a check to the multiversion-if, and at that point we do care about the slow-loop. > > > > > > So if the slow loop is kept, it's fully optimized (other than what misaligned accesses prevent)? > > Exactly. In a sense that would give you similar results as with unswitching, where we also possibly optimize both branches / loops. So the overhead in the final code is 2x: we can expect the fast and slow paths to be about the same size so the section of code for the loop would see its size grow by 2x.
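The roughly 2x figure follows from how multiversioning is shaped. A runtime check picks either a fast loop, compiled under a speculative assumption, or a slow fallback loop with the same semantics, and both bodies stay in the compiled code. The Java sketch below is a source-level analogy under stated assumptions. C2 performs this transformation on its IR, not on Java source. `MultiversionSketch`, `speculationHolds`, the raw `baseAddress` parameter and the 32-byte vector width are all hypothetical, chosen only to illustrate the idea.

```java
// Hypothetical source-level analogy of loop multiversioning; C2 does this on its IR.
final class MultiversionSketch {
    // Hypothetical speculative condition: is the base address aligned to the
    // vector width? vectorBytes is assumed to be a power of two.
    static boolean speculationHolds(long baseAddress, int vectorBytes) {
        return (baseAddress & (vectorBytes - 1)) == 0;
    }

    // Adds 1 to every element; the check selects one of two copies of the same loop.
    static void addOne(int[] a, long baseAddress) {
        if (speculationHolds(baseAddress, 32)) {
            // Fast version: the copy a compiler would pre/main/post-split and vectorize.
            for (int i = 0; i < a.length; i++) {
                a[i] += 1;
            }
        } else {
            // Slow version: same semantics, no alignment assumption.
            for (int i = 0; i < a.length; i++) {
                a[i] += 1;
            }
        }
    }
}
```

Both branches compute the same result, so correctness never depends on the check. The cost is the duplicated loop body, and that is where the 2x size comes from once the slow loop is kept and optimized too.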
------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2668653066 From coleenp at openjdk.org Wed Feb 19 13:54:55 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Feb 2025 13:54:55 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: <6EpQLprXKfUDUQ6UIl0Vo0M5OPmCJ4SjcnOeprbO40w=.7d6cd0d3-ec59-4935-adb9-484764f0235c@github.com> Message-ID: On Wed, 19 Feb 2025 02:54:36 GMT, Chen Liang wrote: >> I don't know if we have a style guide that covers this, but I believe the method and field could both be named `isPrimitive`. > > I would personally name such a boolean field `primitive`, but I don't have a strong preference on the field naming as long as its references in tests and other locations are correct. In addition, I believe this field may soon be widened to carry more hotspot-specific flags (such as hidden, etc.) so the name is bound to change. I like 'primitive'. 'hidden' is also a possibility to add to this and give it the same treatment. I didn't do that one here to limit the changes and I haven't seen all the calls to isHidden so would need to find out how to measure the effects of that change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1961722833 From adinn at openjdk.org Wed Feb 19 14:08:53 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 19 Feb 2025 14:08:53 GMT Subject: RFR: 8350303: ARM32: StubCodeGenerator::verify_stub(StubGenStubId) failed after JDK-8343767 [v2] In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 10:04:55 GMT, Hao Sun wrote: >> We encountered the following runtime error on ARM32: >> >> >> assert(StubRoutines::stub_to_blob(stub_id) == blob_id()) failed: wrong blob initial for generation of stub atomic_add >> >> >> I suppose it might be a mistake in JDK-8343767. `atomic_add` stub belongs to **initial** stubs, but it is set as **compiler** stub in JDK-8343767. 
>> >> Note that only ARM32 is affected as only ARM32 defines this stub. >> >> Tests: cross-build for `arm32, ppc64, riscv64, s390x` passed. Tier1~3 passed on Linux/AArch64 and Linux/x86_64 > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > fix code style Marked as reviewed by adinn (Reviewer). Yes, this fix is needed. atomic_add is only used in linux-arm-specific code and is needed during VM startup. It was moved into the compiler blob by mistake and should really be declared as an initial stub. ------------- PR Review: https://git.openjdk.org/jdk/pull/23687#pullrequestreview-2626918850 PR Comment: https://git.openjdk.org/jdk/pull/23687#issuecomment-2668758216 From tschatzl at openjdk.org Wed Feb 19 15:06:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Feb 2025 15:06:56 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v4] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 18 Feb 2025 09:20:57 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint.
This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2627110375 From rriggs at openjdk.org Wed Feb 19 15:12:59 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 19 Feb 2025 15:12:59 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 20:56:39 GMT, Coleen Phillimore wrote: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. Is the change to isInterface and isPrimitive performance neutral? As @IntrinsicCandidates, there would be some performance gain. 
src/hotspot/share/prims/jvm.cpp line 2284: > 2282: // Please, refer to the description in the jvmtiThreadState.hpp. > 2283: > 2284: JVM_ENTRY(jboolean, JVM_IsInterface(JNIEnv *env, jclass cls)) JVM_IsInterface is deleted in Class.c; what is the purpose of this? ------------- PR Review: https://git.openjdk.org/jdk/pull/23572#pullrequestreview-2627122068 PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1961858757 From mdoerr at openjdk.org Wed Feb 19 15:13:01 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 19 Feb 2025 15:13:01 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v25] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: <3SvY3kHhA0lVftrxTNQx0AwEJLj4U2Ad_0nNBtX4QAE=.e883027d-157e-4a60-b49a-d10382380f56@github.com> On Wed, 19 Feb 2025 08:39:34 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently acceleration code for GHASH is missing for PPC64. >> >> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > remove not needed variables src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 659: > 657: __ beq(CR0, L_trigger_assert); > 658: __ b(L_skip_assert); // Skip assertion if 'blocks' is nonzero > 659: __ bind(L_trigger_assert); The 3 lines above and the labels should be removed. `asm_assert_eq` already does that. src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 692: > 690: __ andi(temp1, data, 15); > 691: __ cmpwi(CR0, temp1, 0); > 692: __ beq(CR0, L_aligned_loop); I'd change it to something like `bne(CR0, L_prepare_unaligned_loop)` and move the next lines there. This will also avoid one extra branch.
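The quoted PR description says the PPC64 stub uses Karatsuba multiplication to obtain the final result. The Java sketch below illustrates that trick in general terms; it is not the stub's code. It multiplies two 128-bit polynomials over GF(2) with three 64x64-bit carry-less products instead of four. GHASH's bit reflection and the reduction modulo x^128 + x^7 + x^2 + x + 1 are deliberately left out. The class and method names are hypothetical.

```java
// Hypothetical illustration of Karatsuba carry-less multiplication (not the PPC64 stub).
final class ClmulKaratsuba {
    // 64x64 -> 128-bit carry-less product of a and b, returned as {low word, high word}.
    static long[] clmul64(long a, long b) {
        long lo = 0L;
        long hi = 0L;
        for (int i = 0; i < 64; i++) {
            if (((b >>> i) & 1L) != 0L) {
                lo ^= a << i;
                // Java masks shift distances to 6 bits, so a >>> 64 would equal a: skip i == 0.
                if (i != 0) {
                    hi ^= a >>> (64 - i);
                }
            }
        }
        return new long[] {lo, hi};
    }

    // Karatsuba: 128x128 -> 256-bit product from three 64x64 products.
    // a = a1*x^64 + a0, b = b1*x^64 + b0; result words are least significant first.
    static long[] karatsuba128(long a0, long a1, long b0, long b1) {
        long[] low = clmul64(a0, b0);            // a0*b0
        long[] high = clmul64(a1, b1);           // a1*b1
        long[] mid = clmul64(a0 ^ a1, b0 ^ b1);  // (a0+a1)*(b0+b1)
        // In GF(2), addition and subtraction are both XOR: a0*b1 + a1*b0 = mid + low + high.
        long m0 = mid[0] ^ low[0] ^ high[0];
        long m1 = mid[1] ^ low[1] ^ high[1];
        return new long[] {low[0], low[1] ^ m0, high[0] ^ m1, high[1]};
    }

    // Schoolbook reference with four 64x64 products, for comparison.
    static long[] schoolbook128(long a0, long a1, long b0, long b1) {
        long[] p00 = clmul64(a0, b0);
        long[] p01 = clmul64(a0, b1);
        long[] p10 = clmul64(a1, b0);
        long[] p11 = clmul64(a1, b1);
        return new long[] {
            p00[0],
            p00[1] ^ p01[0] ^ p10[0],
            p11[0] ^ p01[1] ^ p10[1],
            p11[1]
        };
    }
}
```

Carry-less multiplication is bilinear with respect to XOR. That is why the middle term can be recovered from the single extra product, which saves one of the four multiplications.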
src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 697: > 695: __ lvx(vHigh, temp1, data); > 696: __ b(L_unaligned_loop); > 697: __ bind(L_aligned_loop); I suggest adding an empty line before every `bind` statement to improve readability. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1961849682 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1961857833 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1961862843 From rriggs at openjdk.org Wed Feb 19 15:15:57 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 19 Feb 2025 15:15:57 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 20:56:39 GMT, Coleen Phillimore wrote: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. src/java.base/share/classes/java/lang/Class.java line 807: > 805: */ > 806: public boolean isArray() { > 807: return componentType != null; The componentType declaration should have a comment indicating that == null is the sole indication that the class is an interface. Perhaps there should be an assert somewhere validating/cross checking that requirement. 
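Once the VM initializes the relevant field, each of the three predicates discussed in this thread can be answered from that plain field. The Java model below illustrates this under that assumption. `MirrorSketch` is hypothetical and is not the `java.lang.Class` source; the only real JDK API it uses is `java.lang.reflect.Modifier`.

```java
import java.lang.reflect.Modifier;

// Hypothetical model of a class mirror; NOT the java.lang.Class source.
final class MirrorSketch {
    private final char modifiers;              // class-file access flags are 16-bit (u2)
    private final boolean primitive;           // set once, when a primitive mirror is created
    private final MirrorSketch componentType;  // non-null only for array mirrors

    MirrorSketch(int modifiers, boolean primitive, MirrorSketch componentType) {
        this.modifiers = (char) modifiers;
        this.primitive = primitive;
        this.componentType = componentType;
    }

    // Interface-ness is already encoded in the modifier flags.
    boolean isInterface() {
        return Modifier.isInterface(modifiers);
    }

    // Only array mirrors have a component type.
    boolean isArray() {
        return componentType != null;
    }

    // Primitive mirrors look like ordinary classes in the other fields, hence the extra boolean.
    boolean isPrimitive() {
        return primitive;
    }
}
```

The model copies only the field types under discussion (a `char` for modifiers next to a `boolean`). It says nothing about how HotSpot actually lays out `java.lang.Class`.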
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1961869286 From epeter at openjdk.org Wed Feb 19 15:25:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Feb 2025 15:25:56 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v2] In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> Message-ID: On Wed, 19 Feb 2025 13:26:37 GMT, Roland Westrelin wrote: > So the overhead in the final code is 2x: we can expect the fast and slow paths to be about the same size so the section of code for the loop would see its size grow by 2x. Yes, if you get to the point where you add a multi-version-if condition, i.e. where SuperWord has decided it needs a speculative assumption (here for alignment, later for aliasing), then we get the whole loop 2x. I suppose we could try to make the pre-main-post loop more complicated and just multi-version the main-loop, but that sounds much more complicated. Do you see any better way than having the 2x code size if we need both a slow and fast loop? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2668974247 From liach at openjdk.org Wed Feb 19 15:45:56 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 19 Feb 2025 15:45:56 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: <2sugnK5bK-SWGVluAWw-UNTKKkErTTNYTxCk7t0mOGo=.3734936f-7a10-48ec-8901-01ece733791f@github.com> On Wed, 19 Feb 2025 05:08:36 GMT, David Holmes wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. 
>> Tested with tier1-4 and performance tests. > > src/java.base/share/classes/java/lang/Class.java line 1009: > >> 1007: private transient Object classData; // Set by VM >> 1008: private transient Object[] signers; // Read by VM, mutable >> 1009: private final transient char modifiers; // Set by the VM > > Why the change of type here? This is to improve the layout so the introduction of a boolean field does not increase the size of a Class object. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1961925828 From kvn at openjdk.org Wed Feb 19 16:08:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 19 Feb 2025 16:08:56 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 07:17:30 GMT, Emanuel Peter wrote: > The issue is that experimenting now is a little difficult, because I only have the alignment-checks to play with, which are really really rare to fail in the "real world", I think. But aliasing-checks are more likely to fail, so there could be more interesting benchmark results there. > > Does that sound ok? Yes, it is a good plan. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2669094347 From kvn at openjdk.org Wed Feb 19 16:18:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 19 Feb 2025 16:18:57 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 07:17:30 GMT, Emanuel Peter wrote: > > Can we profile alignment in Interpreter (and C1)? > > It would be nice if we could profile alignment or aliasing. Maybe that is possible. But I suppose there are always cases where profiling is not available (Xcomp ?), and we should have reasonable defaults there. We could investigate profiling in a second step, to improve things if we think that is worth it.
Profiling these things would also be additional complexity - I'm not convinced yet it is worth it. > > What do you think? You should not worry about `-Xcomp`; it is a testing flag - we can use some default there. I am fine if you think profiling will not bring us much benefit. Note, I am not asking to create counters - just a bit to indicate if we had unaligned access to native memory in a method. In such a case we may skip the predicate and generate the multi-version loop during compilation. On the other hand, we may have unaligned access only during startup and not later when we compile the method. Anyway, it does not affect these changes. I will look at the changes more later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2669115673 From epeter at openjdk.org Wed Feb 19 16:18:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 19 Feb 2025 16:18:57 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 16:14:09 GMT, Vladimir Kozlov wrote: > I am fine if you think profiling will not bring us much benefit Yeah, I think it is a good assumption that we will always get aligned and non-aliasing inputs. And if that is not the case, then this is a rare case, and it should be ok to pay the price of recompilation, I think. > I will look at the changes more later.
Thank you :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2669122452 From liach at openjdk.org Wed Feb 19 16:21:57 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 19 Feb 2025 16:21:57 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: <-rVJ4riSt_UybCT4tvNKCBxGfrHr-xnGx0DNDZyGgsA=.11b43081-86f2-47db-b52c-5f74b8e27960@github.com> On Wed, 19 Feb 2025 03:30:04 GMT, Dean Long wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. > > src/java.base/share/classes/java/lang/Class.java line 1287: > >> 1285: */ >> 1286: public Class getComponentType() { >> 1287: // Only return for array types. Storage may be reused for Class for instance types. > > I don't see any changes to componentType related to reuse. So was this comment and the code below already obsolete? It was. Before the componentType field was reused for the class initialization monitor int array, and it caused problems with core reflection if a program reflectively accesses this field after a few hundred times. See [JDK-8337622](https://bugs.openjdk.org/browse/JDK-8337622).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1961989175 From liach at openjdk.org Wed Feb 19 16:25:55 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 19 Feb 2025 16:25:55 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 20:56:39 GMT, Coleen Phillimore wrote: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. Re Roger's IntrinsicCandidate remark: One behavior that might be affected would be C2's inlining preferences. Some inline-sensitive workloads like the FFM API might be affected if some Class attribute access cannot be inlined because the incoming Class object is not constant. See #23460 and #23628. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23572#issuecomment-2669138528 From coleenp at openjdk.org Wed Feb 19 17:16:02 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Feb 2025 17:16:02 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 20:56:39 GMT, Coleen Phillimore wrote: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. Thanks for looking at this change.
------------- PR Review: https://git.openjdk.org/jdk/pull/23572#pullrequestreview-2626906239 From coleenp at openjdk.org Wed Feb 19 17:16:04 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Feb 2025 17:16:04 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 05:01:53 GMT, David Holmes wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. > > src/hotspot/share/classfile/javaClasses.cpp line 1371: > >> 1369: #endif >> 1370: set_modifiers(java_class, JVM_ACC_ABSTRACT | JVM_ACC_FINAL | JVM_ACC_PUBLIC); >> 1371: set_is_primitive(java_class); > > Just wondering what the comments at the start of this method are alluding to now that we do have a field at the Java level. ??? I think this comment is talking about java.lang.Class.klass field is null. Which it still is since there's no Klass pointer for basic types. But no idea what the comment is in ClassFileParser and I don't think introducing a new Klass for primitive types is an improvement. There are comments elsewhere that the klass is null for primitive types, including the call to java_lang_Class::is_primitive(), so this whole comment is only confusing so I'll remove it. Or change it to: // Mirrors for basic types have a null klass field, which makes them special. > src/hotspot/share/prims/jvm.cpp line 1262: > >> 1260: JVM_END >> 1261: >> 1262: JVM_ENTRY(jboolean, JVM_IsArrayClass(JNIEnv *env, jclass cls)) > > Where are the changes to jvm.h? Good catch, I also removed getProtectionDomain. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1961739084 PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1961773882 From coleenp at openjdk.org Wed Feb 19 17:16:05 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Feb 2025 17:16:05 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: <_j9Wkg21aBltyVrbO4wxGFKmmLDy0T-eorRL4epfS4k=.5a453b6b-d673-4cc6-b29f-192fa74e290c@github.com> On Wed, 19 Feb 2025 02:54:05 GMT, Dean Long wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. > > src/hotspot/share/classfile/javaClasses.inline.hpp line 301: > >> 299: #ifdef ASSERT >> 300: // The heapwalker walks through Classes that have had their Klass pointers removed, so can't assert this. >> 301: // assert(is_primitive == java_class->bool_field(_is_primitive_offset), "must match what we told Java"); > > I don't understand this comment about the heapwalker. It sounds like we could have `is_primitive` set to true incorrectly. If so, what prevents the asserts below from failing? And why not use the value from _is_primitive_offset instead? This is a good question. The heapwalker walks through dead mirrors so I can't assert that a null klass field matches our boolean setting but I don't know why this never asserts (can't find any instances in the bug database) but it seems like it could. I'll use the bool field in the mirror in the assert though but not in the return since the caller likely will fetch the klass pointer next. > src/hotspot/share/prims/jvm.cpp line 2283: > >> 2281: // Otherwise it returns its argument value which is the _the_class Klass*. 
>> 2282: // Please, refer to the description in the jvmtiThreadState.hpp. >> 2283: > > Does this "RedefineClasses support" comment still belong here? I think so. The comment in jvmtiThreadState.hpp has details why this is. We do a mirror switch before verification apparently because of bug 6214132 it says. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1961770573 PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1962059680 From coleenp at openjdk.org Wed Feb 19 17:16:06 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Feb 2025 17:16:06 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: <2sugnK5bK-SWGVluAWw-UNTKKkErTTNYTxCk7t0mOGo=.3734936f-7a10-48ec-8901-01ece733791f@github.com> References: <2sugnK5bK-SWGVluAWw-UNTKKkErTTNYTxCk7t0mOGo=.3734936f-7a10-48ec-8901-01ece733791f@github.com> Message-ID: On Wed, 19 Feb 2025 15:42:54 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/Class.java line 1009: >> >>> 1007: private transient Object classData; // Set by VM >>> 1008: private transient Object[] signers; // Read by VM, mutable >>> 1009: private final transient char modifiers; // Set by the VM >> >> Why the change of type here? > > This is to improve the layout so the introduction of a boolean field does not increase the size of a Class object. I changed modifiers to u2 so that we won't have an alignment gap with the bool isPrimitiveType flag. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1962060783 From coleenp at openjdk.org Wed Feb 19 17:16:07 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Feb 2025 17:16:07 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: <-rVJ4riSt_UybCT4tvNKCBxGfrHr-xnGx0DNDZyGgsA=.11b43081-86f2-47db-b52c-5f74b8e27960@github.com> References: <-rVJ4riSt_UybCT4tvNKCBxGfrHr-xnGx0DNDZyGgsA=.11b43081-86f2-47db-b52c-5f74b8e27960@github.com> Message-ID: On Wed, 19 Feb 2025 16:19:22 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/Class.java line 1287: >> >>> 1285: */ >>> 1286: public Class getComponentType() { >>> 1287: // Only return for array types. Storage may be reused for Class for instance types. >> >> I don't see any changes to componentType related to reuse. So was this comment and the code below already obsolete? > > It was. Before the componentType field was reused for the class initialization monitor int array, and it caused problems with core reflection if a program reflectively accesses this field after a few hundred times. See [JDK-8337622](https://bugs.openjdk.org/browse/JDK-8337622). Yes, this comment is obsolete. We used to share the componentType mirror with an internal 'init-lock' but it caused a bug that was fixed. If it's not an array the componentType is now always null. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1962069719 From galder at openjdk.org Wed Feb 19 17:42:08 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 19 Feb 2025 17:42:08 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 7 Feb 2025 12:39:24 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the Java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g. >> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after.
Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 44 additional commits since the last revision: > > - Merge branch 'master' into topic.intrinsify-max-min-long > - Fix typo > - Renaming methods and variables and add docu on algorithms > - Fix copyright years > - Make sure it runs with cpus with either avx512 or asimd > - Test can only run with 256 bit registers or bigger > > * Remove platform dependant check > and use platform independent configuration instead. > - Fix license header > - Tests should also run on aarch64 asimd=true envs > - Added comment around the assertions > - Adjust min/max identity IR test expectations after changes > - ...
and 34 more: https://git.openjdk.org/jdk/compare/75abfbc2...a190ae68 Following our discussion, I've run `MinMaxVector.long` benchmarks with superword disabled and with/without `_maxL` intrinsic in both AVX-512 and AVX2 modes. The first thing I've observed is that lacking superword, the results with AVX-512 or AVX2 are identical, so I will just focus on AVX-512 results below. Benchmark (probability) (range) (seed) (size) Mode Cnt -maxL +maxLr Units MinMaxVector.longClippingRange N/A 90 0 1000 thrpt 4 1012.017 1011.8109 ops/ms MinMaxVector.longClippingRange N/A 100 0 1000 thrpt 4 1012.113 1011.9530 ops/ms MinMaxVector.longLoopMax 50 N/A N/A 2048 thrpt 4 463.946 473.9408 ops/ms MinMaxVector.longLoopMax 80 N/A N/A 2048 thrpt 4 465.391 473.8063 ops/ms MinMaxVector.longLoopMax 100 N/A N/A 2048 thrpt 4 510.992 471.6280 ops/ms (-8%) MinMaxVector.longLoopMin 50 N/A N/A 2048 thrpt 4 496.036 495.3142 ops/ms MinMaxVector.longLoopMin 80 N/A N/A 2048 thrpt 4 495.797 497.1214 ops/ms MinMaxVector.longLoopMin 100 N/A N/A 2048 thrpt 4 495.302 495.1535 ops/ms MinMaxVector.longReductionMultiplyMax 50 N/A N/A 2048 thrpt 4 405.495 405.3936 ops/ms MinMaxVector.longReductionMultiplyMax 80 N/A N/A 2048 thrpt 4 405.342 405.4505 ops/ms MinMaxVector.longReductionMultiplyMax 100 N/A N/A 2048 thrpt 4 846.492 405.4779 ops/ms (-52%) MinMaxVector.longReductionMultiplyMin 50 N/A N/A 2048 thrpt 4 414.755 414.7036 ops/ms MinMaxVector.longReductionMultiplyMin 80 N/A N/A 2048 thrpt 4 414.705 414.7093 ops/ms MinMaxVector.longReductionMultiplyMin 100 N/A N/A 2048 thrpt 4 414.761 414.7150 ops/ms MinMaxVector.longReductionSimpleMax 50 N/A N/A 2048 thrpt 4 460.435 460.3764 ops/ms MinMaxVector.longReductionSimpleMax 80 N/A N/A 2048 thrpt 4 460.438 460.4718 ops/ms MinMaxVector.longReductionSimpleMax 100 N/A N/A 2048 thrpt 4 1023.005 460.5417 ops/ms (-55%) MinMaxVector.longReductionSimpleMin 50 N/A N/A 2048 thrpt 4 459.184 459.1662 ops/ms MinMaxVector.longReductionSimpleMin 80 N/A N/A 2048 thrpt 4 459.265 
459.2588 ops/ms MinMaxVector.longReductionSimpleMin 100 N/A N/A 2048 thrpt 4 459.263 459.1304 ops/ms `longLoopMax at 100%`, `longReductionMultiplyMax at 100%` and `longReductionSimpleMax at 100%` are regressions with the `_maxL` intrinsic. The cause is familiar: without the intrinsic cmp+mov are emitted, while with the intrinsic and conditions above, `cmov` is emitted: # `longLoopMax` @ 100% -maxL: 4.18% ???? ??? ? 0x00007fb7580f84b2: cmpq %r13, %r11 ????? ??? ? 0x00007fb7580f84b5: jl 0x7fb7580f84ec ;*lreturn {reexecute=0 rethrow=0 return_oop=0} ????? ??? ? ; - java.lang.Math::max at 11 (line 2038) ????? ??? ? ; - org.openjdk.bench.java.lang.MinMaxVector::longLoopMax at 27 (line 256) ????? ??? ? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub at 19 (line 124) 4.23% ????? ???? ? 0x00007fb7580f84bb: movq %r11, 0x10(%rbp, %rsi, 8);*lastore {reexecute=0 rethrow=0 return_oop=0} ????? ???? ? ; - org.openjdk.bench.java.lang.MinMaxVector::longLoopMax at 30 (line 256) ????? ???? ? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub at 19 (line 124) +maxL: 1.06% ??? 0x00007fe1b40f5ed1: movq 0x20(%rbx, %r10, 8), %r14;*laload {reexecute=0 rethrow=0 return_oop=0} ??? ; - org.openjdk.bench.java.lang.MinMaxVector::longLoopMax at 26 (line 256) ??? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub at 19 (line 124) 1.34% ??? 0x00007fe1b40f5ed6: cmpq %r14, %r9 2.78% ??? 0x00007fe1b40f5ed9: cmovlq %r14, %r9 2.58% ??? 0x00007fe1b40f5edd: movq %r9, 0x20(%rax, %r10, 8);*lastore {reexecute=0 rethrow=0 return_oop=0} ??? ; - org.openjdk.bench.java.lang.MinMaxVector::longLoopMax at 30 (line 256) ??? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longLoopMax_jmhTest::longLoopMax_thrpt_jmhStub at 19 (line 124) # `longReductionMultiplyMax` @ 100% -maxL: 6.71% ?? ??? 
0x00007f8af40f6278: imulq $0xb, 0x18(%r14, %r8, 8), %rdx ?? ??? ;*lmul {reexecute=0 rethrow=0 return_oop=0} ?? ??? ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMultiplyMax at 24 (line 285) ?? ??? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMultiplyMax_jmhTest::longReductionMultiplyMax_thrpt_jmhStub at 19 (line 124) 5.28% ?? ??? 0x00007f8af40f627e: nop 10.23% ?? ??? 0x00007f8af40f6280: cmpq %rdx, %rdi ??? ??? 0x00007f8af40f6283: jge 0x7f8af40f62a7 ;*lreturn {reexecute=0 rethrow=0 return_oop=0} ??? ??? ; - java.lang.Math::max at 11 (line 2038) ??? ??? ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMultiplyMax at 30 (line 286) ??? ??? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMultiplyMax_jmhTest::longReductionMultiplyMax_thrpt_jmhStub at 19 (line 124) +maxL: 11.07% ?? 0x00007f47000f5c4d: imulq $0xb, 0x18(%r14, %r11, 8), %rax ?? ;*lmul {reexecute=0 rethrow=0 return_oop=0} ?? ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMultiplyMax at 24 (line 285) ?? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMultiplyMax_jmhTest::longReductionMultiplyMax_thrpt_jmhStub at 19 (line 124) 0.07% ?? 0x00007f47000f5c53: cmpq %rdx, %rax 11.87% ?? 0x00007f47000f5c56: cmovlq %rdx, %rax ;*invokestatic max {reexecute=0 rethrow=0 return_oop=0} ?? ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMultiplyMax at 30 (line 286) ?? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMultiplyMax_jmhTest::longReductionMultiplyMax_thrpt_jmhStub at 19 (line 124) # `longReductionSimpleMax` @ 100% -maxL: 5.71% ????? ???? ? 0x00007fc2380f75f9: movq 0x20(%r14, %r8, 8), %rdi;*laload {reexecute=0 rethrow=0 return_oop=0} ????? ???? ? ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionSimpleMax at 20 (line 295) ????? ???? ? 
; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionSimpleMax_jmhTest::longReductionSimpleMax_thrpt_jmhStub at 19 (line 124) 1.85% ????? ???? ? 0x00007fc2380f75fe: nop 4.52% ????? ???? ? 0x00007fc2380f7600: cmpq %rdi, %rdx ?????? ???? ? 0x00007fc2380f7603: jge 0x7fc2380f7667 ;*lreturn {reexecute=0 rethrow=0 return_oop=0} ?????? ???? ? ; - java.lang.Math::max at 11 (line 2038) ?????? ???? ? ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionSimpleMax at 26 (line 296) ?????? ???? ? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionSimpleMax_jmhTest::longReductionSimpleMax_thrpt_jmhStub at 19 (line 124) +maxL: 3.06% ?????? 0x00007fa6d00f6020: movq 0x70(%r14, %r11, 8), %r8;*laload {reexecute=0 rethrow=0 return_oop=0} ?????? ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionSimpleMax at 20 (line 295) ?????? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionSimpleMax_jmhTest::longReductionSimpleMax_thrpt_jmhStub at 19 (line 124) ?????? 0x00007fa6d00f6025: cmpq %r8, %r13 2.88% ?????? 0x00007fa6d00f6028: cmovlq %r8, %r13 ;*invokestatic max {reexecute=0 rethrow=0 return_oop=0} ?????? ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionSimpleMax at 26 (line 296) ?????? 
                ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionSimpleMax_jmhTest::longReductionSimpleMax_thrpt_jmhStub@19 (line 124)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2669329851

From galder at openjdk.org  Wed Feb 19 17:47:06 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Wed, 19 Feb 2025 17:47:06 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12]
In-Reply-To:
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID:

I will run a comparison next with the same batch of tests, but looking at `int`, to see whether there are any differences compared with `long`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2669342758

From lmesnik at openjdk.org  Wed Feb 19 17:49:57 2025
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Wed, 19 Feb 2025 17:49:57 GMT
Subject: RFR: 8350151: Support requires property to filter tests incompatible with --enable-preview [v2]
In-Reply-To:
References:
Message-ID:

> It might be useful to be able to run testing with --enable-preview for feature development. The tests incompatible with this mode must be filtered out.
>
> I chose the name 'java.enablePreview' because it is more of a Java property than a VM or JDK one, and 'enablePreview' to be similar to the jtreg tag.
>
> Tested by running all test suites and verifying that tests are correctly selected.
> There are more tests incompatible with --enable-preview; I will mark them in a follow-up bug.
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:

  change other test to exclude

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23653/files
  - new: https://git.openjdk.org/jdk/pull/23653/files/fafdff14..8019bec1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23653&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23653&range=00-01

  Stats: 4 lines in 2 files changed: 1 ins; 1 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/23653.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23653/head:pull/23653

PR: https://git.openjdk.org/jdk/pull/23653

From lmesnik at openjdk.org  Wed Feb 19 17:49:58 2025
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Wed, 19 Feb 2025 17:49:58 GMT
Subject: RFR: 8350151: Support requires property to filter tests incompatible with --enable-preview [v2]
In-Reply-To: <1iY92LjhRPbtuENrxBQlsCOKx2EHI6leLAfbkorEGzE=.e964726d-cf2c-4715-91fc-c76fc3e6668d@github.com>
References: <1iY92LjhRPbtuENrxBQlsCOKx2EHI6leLAfbkorEGzE=.e964726d-cf2c-4715-91fc-c76fc3e6668d@github.com>
Message-ID:

On Mon, 17 Feb 2025 08:28:05 GMT, Alan Bateman wrote:

>> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   change other test to exclude
>
> test/jdk/java/lang/System/SecurityManagerWarnings.java line 28:
>
>> 26:  * @bug 8266459 8268349 8269543 8270380
>> 27:  * @summary check various warnings
>> 28:  * @requires !java.enablePreview
>
> What is the reason that this test fails with --enable-preview?

Oh, this test puzzled me. It fails with --enable-preview, but I think I need more time to investigate the issue. So I've updated the PR to exclude a different test instead: one that explicitly says it shouldn't be executed with --enable-preview.
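For illustration only: a test annotated with `@requires !java.enablePreview` is skipped whenever that property evaluates to `true`. The sketch below shows one hypothetical way such a flag could be derived from the running JVM. The class and method names are invented, it assumes `--enable-preview` shows up verbatim in `RuntimeMXBean.getInputArguments()`, and the PR's actual VMProps code may compute the property differently.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical probe (not the PR's implementation): reports whether the
// current JVM appears to have been started with preview features enabled.
public class EnablePreviewProbe {

    // Pure helper so the decision can be checked without launching a JVM.
    // Assumption: the flag is reported verbatim as "--enable-preview".
    static boolean isPreviewEnabled(List<String> jvmArgs) {
        return jvmArgs.contains("--enable-preview");
    }

    public static void main(String[] args) {
        // JVM input arguments of this process (not the main class or its arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("java.enablePreview=" + isPreviewEnabled(jvmArgs));
    }
}
```

Keeping the decision in a pure helper makes it easy to unit-test, independently of how the JVM under test was launched.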
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23653#discussion_r1962116805

From jiangli at openjdk.org  Wed Feb 19 18:29:03 2025
From: jiangli at openjdk.org (Jiangli Zhou)
Date: Wed, 19 Feb 2025 18:29:03 GMT
Subject: RFR: 8349620: Add VMProps for static JDK [v3]
In-Reply-To: <38QCGfzFNUhE69hUlz5o4H_74wR0lw4sivYa-jGgHXg=.ec9d2a40-e71d-404e-8b8c-2cf284d5b876@github.com>
References: <38QCGfzFNUhE69hUlz5o4H_74wR0lw4sivYa-jGgHXg=.ec9d2a40-e71d-404e-8b8c-2cf284d5b876@github.com>
Message-ID:

On Wed, 19 Feb 2025 06:54:50 GMT, Jaikiran Pai wrote:

>> Jiangli Zhou has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
>
> On a more general note, is it a goal to have the static JDK build run against all these tests that are part of the JDK repo? Would that mean that a lot of these will have to start using `@requires` to accommodate this?

@jaikiran, thanks for taking a look!

> Hello Jiangli, the change to introduce a `@requires` property for identifying a static JDK looks OK to me.
>
>> @requires !jdk.static is added in test/hotspot/jtreg/runtime/modules/ModulesSymLink.java to skip running the test on static JDK. This test uses bin/jlink, which is not provided on static JDK. There are other tests that require tools in bin/. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved.
>
> This part however feels odd. Updating this (and other tests in future) to use the `@requires !jdk.static` to identify the presence or absence of a specific tool in the JDK installation doesn't seem right. Perhaps they should instead rely on a tool-specific property (like maybe `@requires jdk.tool.jlink`)?

Here is some additional context. I picked `ModulesSymLink.java` rather randomly from the hotspot tier1 test failures caused by missing `bin/` tools, and included it in the current change to test the `@requires !jdk.static` property. There are about 30 test failures in hotspot tier1 due to missing `bin/` tools; those tests execute `bin/jlink`, `bin/jcmd`, `bin/jstat`, `bin/javac`, etc. at runtime. `ModulesSymLink.constructTestJDK()` specifically runs the `jlink` tool to create a test JDK during test execution. Since the `static-jdk` binary only provides `bin/java` (and no other executables in `bin`), tests that run any other tool from `bin/` at runtime fail.

The current main issue with tools is that they require shared libraries from the JDK, for example `libjli.so`. Using a tool-specific property can be appropriate if we decide to support a specific set of tools for `static-jdk`, e.g. by creating fully statically linked executables for the supported tools, or by including the `.so` shared libraries those tools require in the `static-jdk` image. Those details need to be discussed and worked out; we can add more fine-grained properties once things are clear.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2669442221

From coleenp at openjdk.org  Wed Feb 19 18:40:36 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 19 Feb 2025 18:40:36 GMT
Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v2]
In-Reply-To:
References:
Message-ID:

> Class.isInterface() can check modifier flags, Class.isArray() can check whether the component mirror is non-null, and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes.
> Tested with tier1-4 and performance tests.

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Code review comments.
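The three checks in the summary above can be cross-checked from ordinary Java code with public reflection APIs: `Modifier.isInterface(getModifiers())` for interfaces, a non-null `getComponentType()` for arrays, and membership in the fixed set of nine primitive mirrors (eight primitive types plus `void`). The class below (hypothetical name `ClassPredicates`) only illustrates those invariants. It is not the code in the PR, which reads internal `java.lang.Class` state set up by the VM, including the new boolean field for primitives.

```java
import java.lang.reflect.Modifier;
import java.util.Set;

// Illustrative cross-check of the invariants behind the non-native
// Class.isInterface(), isArray() and isPrimitive(); not the JDK implementation.
public class ClassPredicates {

    // The nine Class objects for which isPrimitive() is true.
    static final Set<Class<?>> PRIMITIVES = Set.of(
            boolean.class, byte.class, char.class, short.class,
            int.class, long.class, float.class, double.class, void.class);

    // Interfaces (including annotation types) carry the INTERFACE modifier bit.
    static boolean isInterfaceViaModifiers(Class<?> c) {
        return Modifier.isInterface(c.getModifiers());
    }

    // Only array classes have a non-null component type.
    static boolean isArrayViaComponentType(Class<?> c) {
        return c.getComponentType() != null;
    }

    static boolean isPrimitiveViaSet(Class<?> c) {
        return PRIMITIVES.contains(c);
    }

    public static void main(String[] args) {
        Class<?>[] samples = { Object.class, Runnable.class, Override.class,
                               int[].class, String[][].class, int.class, void.class };
        for (Class<?> c : samples) {
            boolean consistent = isInterfaceViaModifiers(c) == c.isInterface()
                    && isArrayViaComponentType(c) == c.isArray()
                    && isPrimitiveViaSet(c) == c.isPrimitive();
            System.out.println(c.getName() + " consistent=" + consistent);
        }
    }
}
```

Each sample is expected to print `consistent=true`.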
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23572/files
  - new: https://git.openjdk.org/jdk/pull/23572/files/2d9b9ff5..3e731b9f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=00-01

  Stats: 17 lines in 3 files changed: 3 ins; 10 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/23572.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23572/head:pull/23572

PR: https://git.openjdk.org/jdk/pull/23572

From coleenp at openjdk.org  Wed Feb 19 18:40:37 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 19 Feb 2025 18:40:37 GMT
Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 19 Feb 2025 15:07:57 GMT, Roger Riggs wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Code review comments.
>
> src/hotspot/share/prims/jvm.cpp line 2284:
>
>> 2282: // Please, refer to the description in the jvmtiThreadState.hpp.
>> 2283:
>> 2284: JVM_ENTRY(jboolean, JVM_IsInterface(JNIEnv *env, jclass cls))
>
> JVM_IsInterface is deleted in Class.c; what purpose does this serve?

The old classfile verifier uses JVM_IsInterface.

> src/java.base/share/classes/java/lang/Class.java line 807:
>
>> 805:      */
>> 806:     public boolean isArray() {
>> 807:         return componentType != null;
>
> The componentType declaration should have a comment indicating that == null is the sole indication that the class is an interface.
> Perhaps there should be an assert somewhere validating/cross checking that requirement.

I added an assert for set_component_mirror() in the VM, but I don't see how to assert it in Java. Is a comment like this what you had in mind?

    // The componentType field's null value is the sole indication that the class is an array, see isArray()

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1962078501
PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1962186820

From coleenp at openjdk.org  Wed Feb 19 18:40:37 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 19 Feb 2025 18:40:37 GMT
Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v2]
In-Reply-To:
References: <-rVJ4riSt_UybCT4tvNKCBxGfrHr-xnGx0DNDZyGgsA=.11b43081-86f2-47db-b52c-5f74b8e27960@github.com>
Message-ID: <3orjlwIP5PIjb_UBpCUiIV7ZM1U_5BJfZws3PCleKhw=.55438aa0-1c98-476f-b1db-56672a1bbe4a@github.com>

On Wed, 19 Feb 2025 17:10:09 GMT, Coleen Phillimore wrote:

>> It was. Before, the componentType field was reused for the class initialization monitor int array, and it caused problems with core reflection once a program had reflectively accessed this field a few hundred times. See [JDK-8337622](https://bugs.openjdk.org/browse/JDK-8337622).
>
> Yes, this comment is obsolete. We used to share the componentType mirror with an internal 'init-lock', but it caused a bug that was fixed. If the class is not an array, the componentType is now always null.

So for JDK 8 and 21+, the init_lock and componentType are not shared. In JDK 11 and 17, HotSpot shares the fields, but it's not observable with the older implementation of reflection. See https://bugs.openjdk.org/browse/JDK-8337622.
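Because `java.lang.Class` receives its `componentType` from the VM, there is no Java constructor in which the invariant could be asserted. The hypothetical model below assigns the field from Java code instead. Its names (`MirrorModel`, `ofClass`, `arrayOf`) are invented for this sketch. It shows the invariant stated as "a null component type means the mirror is not an array", together with one place where a Java-side check could live. The check is only loosely analogous to the VM assert mentioned above, whose exact condition is not shown in the thread.

```java
import java.util.Objects;

// Hypothetical model, not java.lang.Class: illustrates the
// "componentType == null <=> not an array" invariant.
public final class MirrorModel {
    private final String name;

    // A null componentType is the sole indication that this mirror is not
    // an array; see isArray().
    private final MirrorModel componentType;

    private MirrorModel(String name, MirrorModel componentType) {
        this.name = Objects.requireNonNull(name);
        this.componentType = componentType;
    }

    // Non-array mirror: componentType stays null.
    static MirrorModel ofClass(String name) {
        return new MirrorModel(name, null);
    }

    // Array mirror: a component type is mandatory, so the invariant is
    // enforced at the single point where an array mirror is created.
    static MirrorModel arrayOf(MirrorModel component) {
        Objects.requireNonNull(component, "array mirror needs a component type");
        return new MirrorModel(component.name + "[]", component);
    }

    boolean isArray() {
        return componentType != null;
    }

    MirrorModel getComponentType() {
        return componentType;
    }

    String getName() {
        return name;
    }
}
```

Funnelling every array mirror through a single factory method is what makes a Java-side check possible here. `java.lang.Class` has no such Java entry point, which is why the check ends up on the VM side.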
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1962189932

From coleenp at openjdk.org  Wed Feb 19 18:42:56 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 19 Feb 2025 18:42:56 GMT
Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native
In-Reply-To:
References:
Message-ID:

I ran our standard set of benchmarks on this change, with no differences in performance.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23572#issuecomment-2669470645

From jiangli at openjdk.org  Wed Feb 19 19:20:54 2025
From: jiangli at openjdk.org (Jiangli Zhou)
Date: Wed, 19 Feb 2025 19:20:54 GMT
Subject: RFR: 8349620: Add VMProps for static JDK [v3]
In-Reply-To: <38QCGfzFNUhE69hUlz5o4H_74wR0lw4sivYa-jGgHXg=.ec9d2a40-e71d-404e-8b8c-2cf284d5b876@github.com>
References: <38QCGfzFNUhE69hUlz5o4H_74wR0lw4sivYa-jGgHXg=.ec9d2a40-e71d-404e-8b8c-2cf284d5b876@github.com>
Message-ID: <4_HIGqls8cQ_WdDbQuZi1vdkW0rrceR0edOzL-FkIhY=.95bd51ab-64ac-49eb-bb38-f0d14453d511@github.com>

On Wed, 19 Feb 2025 06:54:50 GMT, Jaikiran Pai wrote:

> On a more general note, is it a goal to have the static JDK build run against all these tests that are part of the JDK repo? Would that mean that a lot of these will have to start using `@requires` to accommodate this?

Based on what we have learned so far about static support for the JDK, running the static JDK against all (or most) jtreg tests looks practical. I think running against the tier1 tests is a good initial minimum goal for now. It would help build a more solid base for our work on hermetic support on top of the static JDK.

Following are the tier1 results from a run I did a couple of weeks back on the static JDK. Overall, most tests pass on the static JDK. The `116` failures in hotspot tier1 include gtest failures caused by the missing `--with-gtest` (I need to fix my test setup to get cleaner results). So overall I think the number of affected tests requiring `@requires` is small.

(linked from https://bugs.openjdk.org/browse/JDK-8348905?focusedId=14745728&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14745728)

    ==============================
    Test summary
    ==============================
       TEST                                            TOTAL  PASS  FAIL ERROR
    >> jtreg:test/hotspot/jtreg:tier1                   2709  2592   116     1 <<
    >> jtreg:test/jdk:tier1                             2454  2368    86     0 <<
    >> jtreg:test/langtools:tier1                       4602  4577    25     0 <<
       jtreg:test/jaxp:tier1                               0     0     0     0
    >> jtreg:test/lib-test:tier1                          35    33     2     0 <<
    ==============================
    TEST FAILURE

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2669547916

From rriggs at openjdk.org  Wed Feb 19 19:39:53 2025
From: rriggs at openjdk.org (Roger Riggs)
Date: Wed, 19 Feb 2025 19:39:53 GMT
Subject: RFR: 8350151: Support requires property to filter tests incompatible with --enable-preview [v2]
In-Reply-To:
References: <1iY92LjhRPbtuENrxBQlsCOKx2EHI6leLAfbkorEGzE=.e964726d-cf2c-4715-91fc-c76fc3e6668d@github.com>
Message-ID:

On Wed, 19 Feb 2025 17:43:10 GMT, Leonid Mesnik wrote:

>> test/jdk/java/lang/System/SecurityManagerWarnings.java line 28:
>>
>>> 26:  * @bug 8266459 8268349 8269543 8270380
>>> 27:  * @summary check various warnings
>>> 28:  * @requires !java.enablePreview
>>
>> What is the reason that this test fails with --enable-preview?
>
> Oh, this test puzzled me. It fails with --enable-preview, but I think I need more time to investigate the issue. So I've updated the PR to exclude a different test instead: one that explicitly says it shouldn't be executed with --enable-preview.

It ran OK for me once I got the command-line flags correct.
It ran OK if I added `@enablePreview`.
It also ran OK with an explicit @run command (the test does not currently have one):

     * @run main/othervm --enable-preview SecurityManagerWarnings

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23653#discussion_r1962264733

From alanb at openjdk.org  Wed Feb 19 19:45:53 2025
From: alanb at openjdk.org (Alan Bateman)
Date: Wed, 19 Feb 2025 19:45:53 GMT
Subject: RFR: 8350151: Support requires property to filter tests incompatible with --enable-preview [v2]
In-Reply-To:
References:
Message-ID:

Marked as reviewed by alanb (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/23653#pullrequestreview-2627805876

From eastigeevich at openjdk.org  Wed Feb 19 19:54:05 2025
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Wed, 19 Feb 2025 19:54:05 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12]
In-Reply-To:
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID:

On Wed, 19 Feb 2025 17:43:54 GMT, Galder Zamarreño wrote:

> I will run a comparison next with the same batch of tests, but looking at `int`, to see whether there are any differences compared with `long`.

Hi @galderz,

Here are the results from Graviton 3 (Neoverse-V1).

Without the patch:

    Benchmark                      (probability)  (range)  (seed)  (size)   Mode  Cnt      Score    Error  Units
    MinMaxVector.intClippingRange            N/A       90       0    1000  thrpt    8  12565.427 ± 37.538  ops/ms
    MinMaxVector.intClippingRange            N/A      100       0    1000  thrpt    8  12462.072 ± 84.067  ops/ms
    MinMaxVector.intLoopMax                   50      N/A     N/A    2048  thrpt    8   5113.090 ± 68.720  ops/ms
    MinMaxVector.intLoopMax                   80      N/A     N/A    2048  thrpt    8   5129.857 ± 35.005  ops/ms
    MinMaxVector.intLoopMax                  100      N/A     N/A    2048  thrpt    8   5116.081 ±  8.946  ops/ms
    MinMaxVector.intLoopMin                   50      N/A     N/A    2048  thrpt    8   6174.544 ± 52.573  ops/ms
    MinMaxVector.intLoopMin                   80      N/A     N/A    2048  thrpt    8   6110.884 ± 54.447  ops/ms
    MinMaxVector.intLoopMin                  100      N/A     N/A    2048  thrpt    8   6178.661 ± 48.450  ops/ms
    MinMaxVector.intReductionMax              50      N/A     N/A    2048  thrpt    8   5109.270 ± 10.525  ops/ms
    MinMaxVector.intReductionMax              80      N/A     N/A    2048  thrpt    8   5123.426 ± 28.229  ops/ms
    MinMaxVector.intReductionMax             100      N/A     N/A    2048  thrpt    8   5133.799 ±  7.693  ops/ms
    MinMaxVector.intReductionMin              50      N/A     N/A    2048  thrpt    8   5130.209 ± 15.491  ops/ms
    MinMaxVector.intReductionMin              80      N/A     N/A    2048  thrpt    8   5127.823 ± 27.767  ops/ms
    MinMaxVector.intReductionMin             100      N/A     N/A    2048  thrpt    8   5118.217 ± 22.186  ops/ms
    MinMaxVector.longClippingRange           N/A       90       0    1000  thrpt    8   1831.026 ± 15.502  ops/ms
    MinMaxVector.longClippingRange           N/A      100       0    1000  thrpt    8   1827.194 ± 22.076  ops/ms
    MinMaxVector.longLoopMax                  50      N/A     N/A    2048  thrpt    8   2643.383 ±  9.830  ops/ms
    MinMaxVector.longLoopMax                  80      N/A     N/A    2048  thrpt    8   2640.417 ±  7.797  ops/ms
    MinMaxVector.longLoopMax                 100      N/A     N/A    2048  thrpt    8   1244.321 ±  1.001  ops/ms
    MinMaxVector.longLoopMin                  50      N/A     N/A    2048  thrpt    8   3239.234 ±  8.813  ops/ms
    MinMaxVector.longLoopMin                  80      N/A     N/A    2048  thrpt    8   3252.713 ±  3.446  ops/ms
    MinMaxVector.longLoopMin                 100      N/A     N/A    2048  thrpt    8   1204.370 ± 10.537  ops/ms
    MinMaxVector.longReductionMax             50      N/A     N/A    2048  thrpt    8   2536.322 ±  0.127  ops/ms
    MinMaxVector.longReductionMax             80      N/A     N/A    2048  thrpt    8   2536.318 ±  0.277  ops/ms
    MinMaxVector.longReductionMax            100      N/A     N/A    2048  thrpt    8   1395.273 ± 13.862  ops/ms
    MinMaxVector.longReductionMin             50      N/A     N/A    2048  thrpt    8   2536.325 ±  0.146  ops/ms
    MinMaxVector.longReductionMin             80      N/A     N/A    2048  thrpt    8   2536.265 ±  0.272  ops/ms
    MinMaxVector.longReductionMin            100      N/A     N/A    2048  thrpt    8   1389.982 ±  5.345  ops/ms

With the patch:

    Benchmark                      (probability)  (range)  (seed)  (size)   Mode  Cnt      Score    Error  Units
    MinMaxVector.intClippingRange            N/A       90       0    1000  thrpt    8  12598.201 ± 52.631  ops/ms
    MinMaxVector.intClippingRange            N/A      100       0    1000  thrpt    8  12555.284 ± 62.472  ops/ms
    MinMaxVector.intLoopMax                   50      N/A     N/A    2048  thrpt    8   5079.499 ± 16.392  ops/ms
    MinMaxVector.intLoopMax                   80      N/A     N/A    2048  thrpt    8   5100.673 ± 30.376  ops/ms
    MinMaxVector.intLoopMax                  100      N/A     N/A    2048  thrpt    8   5082.544 ± 23.540  ops/ms
    MinMaxVector.intLoopMin                   50      N/A     N/A    2048  thrpt    8   6137.512 ± 30.198  ops/ms
    MinMaxVector.intLoopMin                   80      N/A     N/A    2048  thrpt    8   6136.233 ±  7.726  ops/ms
    MinMaxVector.intLoopMin                  100      N/A     N/A    2048  thrpt    8   6142.262 ± 96.510  ops/ms
    MinMaxVector.intReductionMax              50      N/A     N/A    2048  thrpt    8   5116.055 ± 23.270  ops/ms
    MinMaxVector.intReductionMax              80      N/A     N/A    2048  thrpt    8   5111.481 ± 12.236  ops/ms
    MinMaxVector.intReductionMax             100      N/A     N/A    2048  thrpt    8   5106.367 ±  9.035  ops/ms
    MinMaxVector.intReductionMin              50      N/A     N/A    2048  thrpt    8   5115.666 ± 15.539  ops/ms
    MinMaxVector.intReductionMin              80      N/A     N/A    2048  thrpt    8   5133.127 ±  4.918  ops/ms
    MinMaxVector.intReductionMin             100      N/A     N/A    2048  thrpt    8   5120.469 ± 24.355  ops/ms
    MinMaxVector.longClippingRange           N/A       90       0    1000  thrpt    8   5094.259 ± 14.092  ops/ms
    MinMaxVector.longClippingRange           N/A      100       0    1000  thrpt    8   5096.835 ± 16.517  ops/ms
    MinMaxVector.longLoopMax                  50      N/A     N/A    2048  thrpt    8   2636.438 ± 18.760  ops/ms
    MinMaxVector.longLoopMax                  80      N/A     N/A    2048  thrpt    8   2644.069 ±  3.933  ops/ms
    MinMaxVector.longLoopMax                 100      N/A     N/A    2048  thrpt    8   2646.250 ±  2.007  ops/ms
    MinMaxVector.longLoopMin                  50      N/A     N/A    2048  thrpt    8   2648.504 ± 18.294  ops/ms
    MinMaxVector.longLoopMin                  80      N/A     N/A    2048  thrpt    8   2658.082 ±  3.362  ops/ms
    MinMaxVector.longLoopMin                 100      N/A     N/A    2048  thrpt    8   2647.532 ±  5.600  ops/ms
    MinMaxVector.longReductionMax             50      N/A     N/A    2048  thrpt    8   2536.254 ±  0.086  ops/ms
    MinMaxVector.longReductionMax             80      N/A     N/A    2048  thrpt    8   2536.209 ±  0.129  ops/ms
    MinMaxVector.longReductionMax            100      N/A     N/A    2048  thrpt    8   2536.342 ±  0.068  ops/ms
    MinMaxVector.longReductionMin             50      N/A     N/A    2048  thrpt    8   2536.271 ±  0.203  ops/ms
    MinMaxVector.longReductionMin             80      N/A     N/A    2048  thrpt    8   2536.250 ±  0.343  ops/ms
    MinMaxVector.longReductionMin            100      N/A     N/A    2048  thrpt    8   2536.246 ±  0.179  ops/ms

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2669613497

From coleenp at openjdk.org  Wed Feb 19 20:30:34 2025
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 19 Feb 2025 20:30:34 GMT
Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v3]
In-Reply-To:
References:
Message-ID: <9ZTXNeE806c5EDt4Y6QFMqull0_SobjS7mOQGk2wE5s=.81291418-85a7-4826-9ecf-dcdd050ecaf1@github.com>

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Rename isPrimitiveType field to primitive.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23572/files
  - new: https://git.openjdk.org/jdk/pull/23572/files/3e731b9f..d08091ac

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=01-02

  Stats: 11 lines in 5 files changed: 2 ins; 0 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/23572.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23572/head:pull/23572

PR: https://git.openjdk.org/jdk/pull/23572

From lmesnik at openjdk.org  Wed Feb 19 20:35:53 2025
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Wed, 19 Feb 2025 20:35:53 GMT
Subject: RFR: 8350151: Support requires property to filter tests incompatible with --enable-preview [v2]
In-Reply-To:
References: <1iY92LjhRPbtuENrxBQlsCOKx2EHI6leLAfbkorEGzE=.e964726d-cf2c-4715-91fc-c76fc3e6668d@github.com>
Message-ID:

On Wed, 19 Feb 2025 19:36:59 GMT, Roger Riggs wrote:

>> Oh, this test puzzled me. It fails with --enable-preview, but I think I need more time to investigate the issue.
So I updated the PR to exclude the test that explicitly says that it shouldn't be executed with --enable-preview. > > It ran ok for me, once I got the command line flags correct. > It ran ok if I added `@enablePreview`. > > It also ran ok with an explicit @run command: (it does not currently have an @run command). > > * @run main/othervm --enable-preview SecurityManagerWarnings > ``` For me it fails with ----------System.err:(18/917)---------- stdout: []; stderr: [Error: Unable to initialize main class SecurityManagerWarnings Caused by: java.lang.NoClassDefFoundError: jdk/test/lib/process/OutputAnalyzer ] exitValue = 1 That seems pretty strange; might it be a test library issue? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23653#discussion_r1962334975 From jiangli at openjdk.org Wed Feb 19 21:08:52 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 19 Feb 2025 21:08:52 GMT Subject: RFR: 8349620: Add VMProps for static JDK [v3] In-Reply-To: References: <7Xbnn-2LkNv3Gsj6nFHXdrdvvPO7vXi3K3MWm33E-jw=.8341aa47-99de-4a67-8339-64b46fa7bb36@github.com> Message-ID: On Wed, 19 Feb 2025 07:33:02 GMT, Alan Bateman wrote: > > This part however feels odd. Updating this (and other tests in future) to use the `@requires !jdk.static` to identify the presence or absence of a specific tool in the JDK installation doesn't seem right. Perhaps they should instead rely on a tool-specific property (like maybe `@requires jdk.tool.jlink`)? > > The property will be useful to select the tests that can or cannot be selected by jtreg when the JDK under test is a static image. There are a number of tests that depend on layout or specific files in the modular run-time image so they will need to be skipped when the JDK is a static image. So nothing to do with whether specific tools are present or not. The specific test updated here is a bit strange because lib/modules should never be a sym link in the first place and motivation for that is probably a different discussion.
The discussion here made me realize that for the specific ModulesSymLink.java, there are multiple layered issues, including: - No `jlink` tool in `static-jdk` when running on static JDK. This is currently observable using the `static-jdk`. - No separate `lib/modules` file (and other JDK resource files) if we build a single hermetic Java image for the test. Those JDK files will be built into the single hermetic executable image for runtime access. It would be more practical to develop new tests specifically for the hermetic image, and not try to run all existing jtreg tests using the hermetic package and filtering using the @requires property. +1 on @AlanBateman's comment; the second layer is a separate discussion, which can involve java.home. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2669758714 From dlong at openjdk.org Wed Feb 19 21:19:58 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 19 Feb 2025 21:19:58 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v3] In-Reply-To: <_j9Wkg21aBltyVrbO4wxGFKmmLDy0T-eorRL4epfS4k=.5a453b6b-d673-4cc6-b29f-192fa74e290c@github.com> References: <_j9Wkg21aBltyVrbO4wxGFKmmLDy0T-eorRL4epfS4k=.5a453b6b-d673-4cc6-b29f-192fa74e290c@github.com> Message-ID: <3qpqR3PC8PFmdgaIoSYA3jDWdl-oon0-AcIzXcI76rY=.38635503-c067-4f6e-a4f1-92c1b6d991d1@github.com> On Wed, 19 Feb 2025 14:19:58 GMT, Coleen Phillimore wrote: > ... but not in the return since the caller likely will fetch the klass pointer next. I noticed that too. Callers are using is_primitive() to short-circuit calls to as_Klass(), which means they seem to be aware of this implementation detail when maybe they shouldn't be.
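The 8349860 description rests on three observable invariants of `java.lang.Class`: an array class is exactly one with a non-null component type, an interface is exactly one whose modifiers carry the `INTERFACE` flag, and a primitive `Class` object is one of the nine predefined ones (including `void`). A minimal Java sketch can check those invariants against the public API. The class name `ClassShapeChecks` and its methods are hypothetical; this illustrates the observable behavior only and is not how the JDK or HotSpot implements these methods.

```java
import java.lang.reflect.Modifier;
import java.util.Set;

/** Hypothetical illustration of the invariants; not the JDK implementation. */
public final class ClassShapeChecks {
    // The nine predefined Class objects for primitive types and void.
    private static final Set<Class<?>> PRIMITIVES = Set.of(
            boolean.class, byte.class, char.class, short.class,
            int.class, long.class, float.class, double.class, void.class);

    private ClassShapeChecks() {}

    // Only array classes have a non-null component type.
    static boolean isArrayViaComponent(Class<?> c) {
        return c.getComponentType() != null;
    }

    // Interfaces, including annotation interfaces, carry the INTERFACE flag;
    // arrays and primitive types never do.
    static boolean isInterfaceViaModifiers(Class<?> c) {
        return Modifier.isInterface(c.getModifiers());
    }

    static boolean isPrimitiveViaLookup(Class<?> c) {
        return PRIMITIVES.contains(c);
    }

    // True if all three derived answers match the public Class methods.
    static boolean agrees(Class<?> c) {
        return isArrayViaComponent(c) == c.isArray()
                && isInterfaceViaModifiers(c) == c.isInterface()
                && isPrimitiveViaLookup(c) == c.isPrimitive();
    }

    public static void main(String[] args) {
        Class<?>[] samples = { int.class, void.class, int[].class, String[][].class,
                Runnable.class, Override.class, String.class, Thread.State.class };
        for (Class<?> c : samples) {
            System.out.println(c.getName() + " agrees: " + agrees(c));
        }
    }
}
```

For these samples every printed line ends with `agrees: true`; annotation types such as `Override` are a useful edge case because they are interfaces too.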
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1962384926 From dlong at openjdk.org Wed Feb 19 22:10:52 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 19 Feb 2025 22:10:52 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 22:42:18 GMT, Dmitry Chuyko wrote: > The location for rfp should be set in the register map. In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. > > COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers INCLUDE_JVMCI case. LGTM ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23682#pullrequestreview-2628087307 From ccheung at openjdk.org Wed Feb 19 22:41:17 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 19 Feb 2025 22:41:17 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v3] In-Reply-To: References: Message-ID: > This changeset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. > > Passed tiers 1 - 5 testing.
Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: @ashu-mehra and @dholmes-ora comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23476/files - new: https://git.openjdk.org/jdk/pull/23476/files/01238742..84206edd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23476&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23476&range=01-02 Stats: 56 lines in 6 files changed: 40 ins; 1 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/23476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23476/head:pull/23476 PR: https://git.openjdk.org/jdk/pull/23476 From ccheung at openjdk.org Wed Feb 19 22:41:18 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 19 Feb 2025 22:41:18 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v2] In-Reply-To: <_9GFraN7--YUC8esAB28iHzdRC7eYJok355TpDH7Df8=.30548f08-454f-47ba-83d5-a4feabaee9ff@github.com> References: <_9GFraN7--YUC8esAB28iHzdRC7eYJok355TpDH7Df8=.30548f08-454f-47ba-83d5-a4feabaee9ff@github.com> Message-ID: On Tue, 18 Feb 2025 07:08:07 GMT, David Holmes wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> @iklam and @ashu-mehra comment > > src/hotspot/share/cds/aotCodeSource.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. > > If the code has been moved from other files, the current opinion/consensus is that the first copyright year should be the oldest first year from all the files from which the code was obtained. Since most of the code and logic are from filemap.[c|h]pp, I've updated the copyright year to 2003, 2025. 
> src/hotspot/share/cds/aotCodeSource.cpp line 106: > >> 104: for (const char* bootcp = Arguments::get_boot_class_path(); *bootcp != '\0'; ++bootcp) { >> 105: if (*bootcp == *os::path_separator()) { >> 106: ++ bootcp; > > Nit (possibly pre-existing) - no space before/after unary operators Fixed. > src/hotspot/share/cds/aotCodeSource.hpp line 125: > >> 123: // during AOTCache creation are the same as when the AOTCache is used during runtime. >> 124: // Non-existent entries are recorded during AOTCache creation. Those non-existent entries >> 125: // must not exist during runtime. > > Does this mean that if Foo.jar is on the classpath but does not in fact exist, then we record it was on the classpath and require it to be on the classpath at runtime, but also to still not exist? Actually, we don't require the non-existent entries to be on the classpath at runtime. The appcds/NonExistClasspath.java test has cases to cover that. So I've updated the comment as follows: // Non-existent entries are recorded during AOTCache creation. Those non-existent entries, // if they are specified at runtime, must not exist. Also fixed a bug so that the behavior is the same as before this refactoring. > src/hotspot/share/cds/aotCodeSource.hpp line 128: > >> 126: // >> 127: // Some details on validation: >> 128: // - the boot classpath could be appended during runtime if there's no app classpath and > > Suggestion: > > // - the boot classpath can be appended to at runtime if there's no app classpath and no Fixed. > src/hotspot/share/cds/aotCodeSource.hpp line 130: > >> 128: // - the boot classpath could be appended during runtime if there's no app classpath and >> 129: // module path specified when an AOTCache is created; >> 130: // - the app classpath could be appended during runtime; > > Suggestion: > > // - the app classpath can be appended to at runtime; Fixed. 
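The validation rules discussed in this review (recorded entries must match at runtime, the app classpath may be appended to, and an entry recorded as non-existent must still not exist if it is specified again) can be modeled in a few lines of Java. This is a deliberately simplified sketch, not `AOTCodeSource` logic: `ClasspathCheckSketch`, `RecordedEntry` and `isCompatible` are hypothetical names, and skipping runtime entries that do not exist is an assumption made only for this model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Simplified, hypothetical model of the app-classpath checks; not HotSpot code. */
public final class ClasspathCheckSketch {

    /** A classpath entry as recorded when the AOT cache was created. */
    record RecordedEntry(String path, boolean existed) {}

    static boolean isCompatible(List<RecordedEntry> recorded,
                                List<String> runtimePath,
                                Predicate<String> existsAtRuntime) {
        // Rule 1: an entry that was missing at dump time must still be missing
        // if the same entry is specified again at runtime.
        for (RecordedEntry e : recorded) {
            if (!e.existed() && runtimePath.contains(e.path())
                    && existsAtRuntime.test(e.path())) {
                return false;
            }
        }
        // Rule 2 (simplified): entries that existed at dump time must form a
        // prefix of the existing runtime entries; anything after is an append.
        List<String> dumpExisting = new ArrayList<>();
        for (RecordedEntry e : recorded) {
            if (e.existed()) {
                dumpExisting.add(e.path());
            }
        }
        List<String> runtimeExisting = runtimePath.stream().filter(existsAtRuntime).toList();
        return runtimeExisting.size() >= dumpExisting.size()
                && runtimeExisting.subList(0, dumpExisting.size()).equals(dumpExisting);
    }

    public static void main(String[] args) {
        List<RecordedEntry> recorded = List.of(
                new RecordedEntry("app.jar", true),
                new RecordedEntry("missing.jar", false));
        Set<String> onDisk = Set.of("app.jar", "extra.jar");
        // Appending extra.jar at runtime is allowed.
        System.out.println(isCompatible(recorded, List.of("app.jar", "extra.jar"), onDisk::contains));
        // Reordering the entries is not.
        System.out.println(isCompatible(recorded, List.of("extra.jar", "app.jar"), onDisk::contains));
    }
}
```

Running `main` prints `true` for the appended classpath and `false` for the reordered one. The boot classpath and module path rules (boot append only without an app classpath or module path, module path as a superset) are not modeled here.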
> src/hotspot/share/cds/aotCodeSource.hpp line 131: > >> 129: // module path specified when an AOTCache is created; >> 130: // - the app classpath could be appended during runtime; >> 131: // - the module path during runtime could be a superset of the one specified during AOTCache creation. > > Suggestion: > > // - the module path at runtime can be a superset of the one specified during AOTCache creation. Fixed. > test/hotspot/jtreg/runtime/cds/appcds/BootClassPathMismatch.java line 243: > >> 241: * No error - bootclasspath can be appended during runtime if no -cp is specified. >> 242: */ >> 243: public void testBootClassPathAppend() throws Exception { > > A refactoring should not be introducing new test cases. Did you refactor and enhance? This is a missing testcase and works the same before and after refactoring. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962467161 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962466958 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962467710 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962467840 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962467951 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962468046 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962468281 From ccheung at openjdk.org Wed Feb 19 22:41:19 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 19 Feb 2025 22:41:19 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v2] In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 05:21:09 GMT, Ashutosh Mehra wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> @iklam and @ashu-mehra comment > > src/hotspot/share/runtime/threads.cpp line 27: > >> 25: */ >> 26: >> 27: #include "cds/aotCodeSource.hpp" > > Why is this include needed? 
Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962466829 From ccheung at openjdk.org Wed Feb 19 22:41:19 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 19 Feb 2025 22:41:19 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v3] In-Reply-To: <_9GFraN7--YUC8esAB28iHzdRC7eYJok355TpDH7Df8=.30548f08-454f-47ba-83d5-a4feabaee9ff@github.com> References: <_9GFraN7--YUC8esAB28iHzdRC7eYJok355TpDH7Df8=.30548f08-454f-47ba-83d5-a4feabaee9ff@github.com> Message-ID: On Tue, 18 Feb 2025 07:22:00 GMT, David Holmes wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> @ashu-mehra and @dholmes-ora comments > > src/hotspot/share/runtime/threads.cpp line 809: > >> 807: vm_exit_during_initialization("ClassLoader::initialize_module_path() failed unexpectedly"); >> 808: } >> 809: #endif > > Not obvious where this functionality is now handled. It is now being handled in `ClassLoaderDataShared::ensure_module_entry_tables_exist()` and `AOTCodeSourceConfig::dumptime_init_helper()`. > test/hotspot/jtreg/runtime/cds/appcds/NonExistClasspath.java line 70: > >> 68: .assertNormalExit(); >> 69: >> 70: // Now make nonExistPath exist. CDS will fail to load. > > Not at all clear why these test cases have been removed? I've added back the removed test cases to ensure the handling of non-existent entries remains the same after refactoring.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962468145 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962468453 From kvn at openjdk.org Wed Feb 19 22:44:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 19 Feb 2025 22:44:51 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 22:42:18 GMT, Dmitry Chuyko wrote: > COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers INCLUDE_JVMCI case. Where did you get this? Client VM will have only C1 and your guard will pass. ------------- Changes requested by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23682#pullrequestreview-2628137848 From kvn at openjdk.org Wed Feb 19 22:48:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 19 Feb 2025 22:48:53 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 22:42:18 GMT, Dmitry Chuyko wrote: > The location for rfp should be set in the register map. In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. > > COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers INCLUDE_JVMCI case.
Is it from here?: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvm.cpp#L379 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23682#issuecomment-2669931783 From sviswanathan at openjdk.org Wed Feb 19 23:21:07 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 19 Feb 2025 23:21:07 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v18] In-Reply-To: References: Message-ID: <2OIYkOt8CJ-CqnQIK8sgMDtvLxJUyD5r_mKj5QT7_a8=.10b1d382-d9ae-40a1-b895-09086c80dee6@github.com> On Tue, 18 Feb 2025 02:36:13 GMT, Julian Waters wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > Is anyone else getting compile failures after this was integrated? This weirdly seems to only happen on Linux > > * For target hotspot_variant-server_libjvm_objs_mulnode.o: > /home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp: In member function ‘virtual const Type* FmaHFNode::Value(PhaseGVN*) const’: > /home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp:1944:37: error: call of overloaded ‘make(double)’ is ambiguous > 1944 | return TypeH::make(fma(f1, f2, f3)); > | ^ > In file included from /home/runner/work/jdk/jdk/src/hotspot/share/opto/node.hpp:31, > from /home/runner/work/jdk/jdk/src/hotspot/share/opto/addnode.hpp:28, > from /home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp:26: > /home/runner/work/jdk/jdk/src/hotspot/share/opto/type.hpp:544:23: note: candidate: ‘static const TypeH* TypeH::make(float)’ > 544 | static const TypeH* make(float f); > | ^~~~ > /home/runner/work/jdk/jdk/src/hotspot/share/opto/type.hpp:545:23: note: candidate: ‘static const TypeH* TypeH::make(short int)’ > 545 | static const TypeH* make(short f); > | ^~~~ @TheShermanTanker I don't see any compile failures on Linux. Both the fastdebug and release build successfully.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2669979058 From dlong at openjdk.org Wed Feb 19 23:59:52 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 19 Feb 2025 23:59:52 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 22:42:18 GMT, Dmitry Chuyko wrote: > The location for rfp should be set in in the register map. In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. > > COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers INCLUDE_JVMCI case. I think @vnkozlov is right. I don't see where COMPILER1_OR_COMPILER2 is true for JVMCI. Should we use COMPILER1 || COMPILER2_OR_JVMCI, or remove the #if and instead guard with !PreserveFramePointer? ------------- Changes requested by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23682#pullrequestreview-2628245608 From iklam at openjdk.org Thu Feb 20 00:15:22 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 20 Feb 2025 00:15:22 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v5] In-Reply-To: References: Message-ID: <7jSZeGLF9OZNBbrdPqbIBsYa8DM-prrzU-7hnL2mIik=.4f80111c-0916-46ef-ae72-cd4f96ae3887@github.com> > Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. > > With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). 
Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. > > To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): > >> the format of the configuration and cache files is not specified and is subject to change without notice. > > **Notes for reviewers:** > > - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. > - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. > - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. > - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. > - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. > > **Misc Note** > - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccba6dcce4a3) will be integrated separ...
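The reviewer note about using a distinct magic number can be illustrated with a small Java sketch that classifies a file by the first four bytes of its header and accepts only the cache magic. The constant values, the little-endian byte order, and the names `MagicCheckSketch`, `ARCHIVE_MAGIC` and `PREIMAGE_MAGIC` are placeholders invented for this sketch; the real constants and header layout live in HotSpot's CDS sources.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: tell an AOT configuration file apart from an AOT cache by its magic. */
public final class MagicCheckSketch {
    // Placeholder values invented for this sketch; not HotSpot's real constants.
    static final int ARCHIVE_MAGIC = 0x11110001;
    static final int PREIMAGE_MAGIC = 0x11110002;

    enum Kind { AOT_CACHE, AOT_CONFIGURATION, UNKNOWN }

    /** Reads the first four header bytes, assuming a little-endian layout. */
    static Kind classify(byte[] header) {
        if (header.length < Integer.BYTES) {
            return Kind.UNKNOWN;
        }
        int magic = ByteBuffer.wrap(header, 0, Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        if (magic == ARCHIVE_MAGIC) {
            return Kind.AOT_CACHE;
        }
        if (magic == PREIMAGE_MAGIC) {
            return Kind.AOT_CONFIGURATION;
        }
        return Kind.UNKNOWN;
    }

    /** Only a file carrying the cache magic may be mapped as an AOT cache. */
    static boolean usableAsCache(byte[] header) {
        return classify(header) == Kind.AOT_CACHE;
    }

    public static void main(String[] args) {
        byte[] config = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(PREIMAGE_MAGIC).array();
        System.out.println(classify(config) + ", usable as cache: " + usableAsCache(config));
    }
}
```

The point of the design is that the rejection happens before any other header field is trusted, so a configuration file passed where a cache is expected fails early rather than being partially mapped.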
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @calvinccheung comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23484/files - new: https://git.openjdk.org/jdk/pull/23484/files/9f78bb90..72a7d1b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23484&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23484&range=03-04 Stats: 16 lines in 7 files changed: 1 ins; 5 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23484/head:pull/23484 PR: https://git.openjdk.org/jdk/pull/23484 From iklam at openjdk.org Thu Feb 20 00:15:22 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 20 Feb 2025 00:15:22 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v5] In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 22:20:17 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @calvinccheung comments > > src/hotspot/share/cds/cdsConfig.cpp line 661: > >> 659: >> 660: bool CDSConfig::is_loading_heap() { >> 661: return ArchiveHeapLoader::is_in_use(); > > Blank line removed by accident? Fixed. > src/hotspot/share/cds/cdsConfig.hpp line 138: > >> 136: static void stop_using_full_module_graph(const char* reason = nullptr) NOT_CDS_JAVA_HEAP_RETURN; >> 137: >> 138: > > Blank line removed by accident? There were two empty lines, so I removed one of them.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962560201 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962560299 From iklam at openjdk.org Thu Feb 20 00:15:23 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 20 Feb 2025 00:15:23 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v4] In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 22:22:30 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: >> >> - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself >> - Improve error message when AOTMode=create has an incompatible classpath > > src/hotspot/share/cds/filemap.cpp line 1389: > >> 1387: const char* file_type = CDSConfig::type_of_archive_being_loaded(); >> 1388: if (_is_static) { >> 1389: if ((gen_header->_magic == CDS_ARCHIVE_MAGIC) || > > Probably don't need the extra set of parentheses. Since the next line has a more complex condition, I think putting both this line and the next line in parenthesis makes the code easier to read. > src/hotspot/share/cds/finalImageRecipes.hpp line 67: > >> 65: >> 66: public: >> 67: static void serialize(SerializeClosure* soc, bool is_static_archive); > > The only caller is from `MetaspaceShared::serialize(`): > `FinalImageRecipes::serialize(soc, true);` > Wondering if the `is_static_archive` arg is needed? I removed the `is_static_archive` flag. > src/hotspot/share/cds/metaspaceShared.cpp line 819: > >> 817: CDSConfig::DumperThreadMark dumper_thread_mark(THREAD); >> 818: ResourceMark rm(THREAD); >> 819: HandleMark hm(THREAD); > > Why do we need HandleMark? This is not necessary. I removed it. 
> src/hotspot/share/cds/metaspaceShared.cpp line 839: > >> 837: tty->print_cr("AOTConfiguration recorded: %s", AOTConfiguration); >> 838: vm_exit(0); >> 839: } else { > > Is it appropriate to add assert of `CDSConfig::is_dumping_final_static_archive()` in the `else` case? In the `else` case, we could be dumping either "final" or "classic" static archive. This function is invoked only when dumping static archives, so I think extra asserts aren't necessary here. > src/hotspot/share/cds/metaspaceShared.cpp line 958: > >> 956: >> 957: if (CDSConfig::is_dumping_preimage_static_archive()) { >> 958: log_info(cds)("Reading lambda form invokers of in JDK default classlist ..."); > > Suggestion: > "Reading lambda form invokers from JDK default classlist ...." Fixed. > src/hotspot/share/classfile/systemDictionaryShared.cpp line 995: > >> 993: >> 994: int length = record->num_verifier_constraints(); >> 995: if (length > 0 || klass->name()->equals("HelloWorld")) { > > Is the "HelloWorld" check leftover from debugging? Fixed. > src/hotspot/share/classfile/systemDictionaryShared.cpp line 1031: > >> 1029: >> 1030: int length = rt_info->num_verifier_constraints(); >> 1031: if (length > 0 || klass->name()->equals("HelloWorld")) { > > Is the "HelloWorld" check leftover from debugging? Fixed. > src/hotspot/share/classfile/systemDictionaryShared.cpp line 1164: > >> 1162: JavaThread* current = JavaThread::current(); >> 1163: if (klass->is_shared_platform_class() || klass->is_shared_app_class()) { >> 1164: DumpTimeClassInfo* dt_info = get_info(klass); > > `dt_info` seems unused. Removed. 
> test/hotspot/jtreg/runtime/cds/appcds/aotClassLinking/AOTLoaderConstraintsTest.java line 80: > >> 78: public void checkExecution(OutputAnalyzer out, RunMode runMode) throws Exception { >> 79: switch (runMode) { >> 80: case RunMode.ASSEMBLY: // JEP 485 + binary AOTConfiguration -- should load AppClass from preimage > > s/485/483 Fixed > test/hotspot/jtreg/runtime/cds/appcds/aotClassLinking/AOTLoaderConstraintsTest.java line 101: > >> 99: // AppClass is loaded by the app loader. To make sure that you cannot use >> 100: // type masquerade attacks, we need to add a loader constraint that says: >> 101: // app and loo loaders must resolve the symbol "java/lang/String" to the same type. > > Suggestion: > > // app and _boot_ loaders ... Fixed. > test/lib/jdk/test/lib/cds/CDSAppTester.java line 365: > >> 363: } >> 364: >> 365: // See JEP 485 > > s/485/483 Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962560521 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962560973 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962560625 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962560688 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962560741 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962560792 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962560857 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962560924 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962561090 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962561140 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962561025 From ccheung at openjdk.org Thu Feb 20 00:18:36 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 20 Feb 2025 00:18:36 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v4] In-Reply-To: 
References: Message-ID: > This changeset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. > > Passed tiers 1 - 5 testing. Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: new test case in BootClassPathMismatch.java is not applicable to dynamic archive ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23476/files - new: https://git.openjdk.org/jdk/pull/23476/files/84206edd..c2039929 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23476&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23476&range=02-03 Stats: 6 lines in 1 file changed: 5 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23476/head:pull/23476 PR: https://git.openjdk.org/jdk/pull/23476 From ccheung at openjdk.org Thu Feb 20 00:23:55 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 20 Feb 2025 00:23:55 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v2] In-Reply-To: References: <_9GFraN7--YUC8esAB28iHzdRC7eYJok355TpDH7Df8=.30548f08-454f-47ba-83d5-a4feabaee9ff@github.com> Message-ID: On Wed, 19 Feb 2025 22:37:57 GMT, Calvin Cheung wrote: >> test/hotspot/jtreg/runtime/cds/appcds/BootClassPathMismatch.java line 243: >> >>> 241: * No error - bootclasspath can be appended during runtime if no -cp is specified. >>> 242: */ >>> 243: public void testBootClassPathAppend() throws Exception { >> >> A refactoring should not be introducing new test cases. Did you refactor and enhance? > > This is a missing testcase and works the same before and after refactoring.
I needed to exclude the new test from running in the dynamic archive mode because there's no -cp specified during dumping. Otherwise, the test fails with `java.lang.RuntimeException: test.dynamic.dump is not supported with an empty classpath while the classlist is not empty`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962568623 From kvn at openjdk.org Thu Feb 20 00:38:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 20 Feb 2025 00:38:54 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v4] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 00:18:36 GMT, Calvin Cheung wrote: >> This changeset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. >> >> Passed tiers 1 - 5 testing. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > new test case in BootClassPathMismatch.java is not applicable to dynamic archive Passing by comment. We touched on it at a recent Leyden meeting. The name "AOTCodeSource" is very confusing, especially when we start caching AOT compiled code. Can we rename it to avoid confusion?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23476#issuecomment-2670095758 From cslucas at openjdk.org Thu Feb 20 01:34:56 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 20 Feb 2025 01:34:56 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v3] In-Reply-To: <6_AoWQhldJttOIEOL1T7HSapPzE4Qn2j4WN7E-bI3rM=.2685d3d8-e47c-42a6-845b-b68f50cc568e@github.com> References: <6_AoWQhldJttOIEOL1T7HSapPzE4Qn2j4WN7E-bI3rM=.2685d3d8-e47c-42a6-845b-b68f50cc568e@github.com> Message-ID: On Thu, 23 Jan 2025 05:45:43 GMT, Cesar Soares Lucas wrote: >> In the current Generational Shenandoah implementation, the pointers to the read and write card tables are established at JVM launch time and fixed during the whole of the application execution. Because they are considered constants, they are embedded as such in JIT-compiled code. >> >> The cleaning of dirty cards in the read card table is performed during the `init-mark` pause, and our experiments show that it represents a sizable portion of that phase's duration. This pull request makes the addresses of the read and write card tables dynamic, with the end goal of reducing the duration of the `init-mark` pause by moving the cleaning of the dirty cards in the read card table to the `reset` concurrent phase. >> >> The idea is quite simple. Instead of using distinct read and write card tables for the entire duration of the JVM execution, we alternate which card table serves as the read/write table during each GC cycle. In the `reset` phase we concurrently clean the cards in the current _read_ table so that when the cycle reaches the next `init-mark` phase we have a version of the card table totally clear. In the next `init-mark` pause we swap the pointers to the base of the read and write tables.
When the `init-mark` finishes, the mutator threads will operate on the table just cleaned in the `reset` phase; the GC will operate on the table that just became the new _read_ table. >> >> Most of the changes in the patch account for the fact that the write card table is no longer at a fixed address. >> >> The primary benefit of this change is that it eliminates the need to copy and zero the remembered set during the init-mark Safepoint. A secondary benefit is that it allows us to replace the init-mark Safepoint with an `init-mark` handshake, something we plan to work on after this PR is merged. >> >> Our internal performance testing showed a significant reduction in the duration of `init-mark` pauses and no statistically significant regression due to the dynamic loading of the card table address in JIT-compiled code. >> >> Functional testing was performed on Linux, macOS, Windows running on x64, AArch64, and their respective 32-bit versions. I'd appreciate it if someone with access to RISC-V (@luhenry ?) and PowerPC (@TheRealMDoerr ?) platforms could review and test the changes for those platforms, as I have limited access to running tests on them. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge master > - Addressing PR comments: some refactorings, ppc fix, off-by-one fix. > - Relocation of Card Tables src/hotspot/share/gc/shared/cardTable.hpp line 205: > 203: virtual CardValue* byte_map_base() const { return _byte_map_base; } > 204: > 205: virtual CardValue* byte_map() const { return _byte_map; } @shipilev - can you please confirm that this is the part that you didn't like?
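The read/write card-table alternation described in this PR amounts to double buffering. Mutators dirty cards in the current write table, the `reset` phase cleans the read table concurrently, and the `init-mark` pause only swaps the two references. A minimal Java sketch of that idea follows. It is an illustrative model, not Shenandoah code: the class and method names and the `CLEAN`/`DIRTY` byte values are placeholders, and real barriers, memory ordering and card-to-address mapping are omitted.

```java
import java.util.Arrays;

/** Illustrative double-buffered card tables; not the Shenandoah implementation. */
public final class SwappingCardTables {
    static final byte CLEAN = 0;   // placeholder encoding
    static final byte DIRTY = 1;   // placeholder encoding

    private byte[] readTable;            // scanned by the GC during a cycle
    private volatile byte[] writeTable;  // dirtied by mutators; re-read on every barrier,
                                         // mirroring a table base that is no longer a constant

    SwappingCardTables(int cards) {
        readTable = new byte[cards];
        writeTable = new byte[cards];
        Arrays.fill(readTable, CLEAN);
        Arrays.fill(writeTable, CLEAN);
    }

    /** Post-write barrier: mark the card covering a store as dirty. */
    void dirtyCard(int card) {
        writeTable[card] = DIRTY;
    }

    /** Concurrent reset phase: clean the read table outside any pause. */
    void resetPhase() {
        Arrays.fill(readTable, CLEAN);
    }

    /** init-mark pause: swap the references instead of copying and zeroing. */
    void initMarkPause() {
        byte[] previousWrite = writeTable;
        writeTable = readTable;     // mutators continue on the table cleaned in reset
        readTable = previousWrite;  // GC scans the cards dirtied since the last swap
    }

    boolean isDirtyForGc(int card) {
        return readTable[card] == DIRTY;
    }

    public static void main(String[] args) {
        SwappingCardTables tables = new SwappingCardTables(4);
        tables.dirtyCard(1);      // mutator store during the previous cycle
        tables.resetPhase();      // concurrent: clean the read table
        tables.initMarkPause();   // pause: swap only
        System.out.println("card 1 dirty for GC: " + tables.isDirtyForGc(1));
    }
}
```

The pause cost becomes a constant-time swap while the clearing cost moves to the concurrent phase; the price is that the write-table base must be loaded at run time rather than embedded in compiled code.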
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1962702320 From dholmes at openjdk.org Thu Feb 20 02:17:57 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Feb 2025 02:17:57 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v4] In-Reply-To: References: <_9GFraN7--YUC8esAB28iHzdRC7eYJok355TpDH7Df8=.30548f08-454f-47ba-83d5-a4feabaee9ff@github.com> Message-ID: On Wed, 19 Feb 2025 22:37:48 GMT, Calvin Cheung wrote: >> src/hotspot/share/runtime/threads.cpp line 809: >> >>> 807: vm_exit_during_initialization("ClassLoader::initialize_module_path() failed unexpectedly"); >>> 808: } >>> 809: #endif >> >> Not obvious where this functionality is now handled. > > It is now being handled in `ClassLoaderDataShared::ensure_module_entry_tables_exist()` and `AOTCodeSourceConfig::dumptime_init_helper()`. I don't see anything there that does a vm_exit if something has gone wrong. ?? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962759081 From dholmes at openjdk.org Thu Feb 20 02:52:58 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Feb 2025 02:52:58 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v3] In-Reply-To: <9ZTXNeE806c5EDt4Y6QFMqull0_SobjS7mOQGk2wE5s=.81291418-85a7-4826-9ecf-dcdd050ecaf1@github.com> References: <9ZTXNeE806c5EDt4Y6QFMqull0_SobjS7mOQGk2wE5s=.81291418-85a7-4826-9ecf-dcdd050ecaf1@github.com> Message-ID: On Wed, 19 Feb 2025 20:30:34 GMT, Coleen Phillimore wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Rename isPrimitiveType field to primitive. 
src/java.base/share/classes/java/lang/Class.java line 1296: > 1294: > 1295: // The componentType field's null value is the sole indication that the class is an array, > 1296: // see isArray(). Suggestion: // The componentType field's null value is the sole indication that the class // is an array - see isArray(). src/java.base/share/classes/java/lang/Class.java line 1297: > 1295: // The componentType field's null value is the sole indication that the class is an array, > 1296: // see isArray(). > 1297: private transient final Class componentType; Why the `transient` and how does this impact serialization?? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1962781718 PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1962782083 From amitkumar at openjdk.org Thu Feb 20 04:13:55 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 20 Feb 2025 04:13:55 GMT Subject: RFR: 8345285: [s390x] test failures: foreign/normalize/TestNormalize.java with C2 [v2] In-Reply-To: <5mzVC2Orm-UK6GVDMIaL94k6ne7ojjE5PBRKIOE7-UQ=.4cc7aa73-e265-4a92-b646-8d17e8c147e3@github.com> References: <5mzVC2Orm-UK6GVDMIaL94k6ne7ojjE5PBRKIOE7-UQ=.4cc7aa73-e265-4a92-b646-8d17e8c147e3@github.com> Message-ID: On Wed, 22 Jan 2025 03:37:10 GMT, Amit Kumar wrote: >> Fixes `foreign/normalize/TestNormalize.java` failure on s390x. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > take ppc route no. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23197#issuecomment-2670417275 From liach at openjdk.org Thu Feb 20 04:31:55 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 20 Feb 2025 04:31:55 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v3] In-Reply-To: References: <9ZTXNeE806c5EDt4Y6QFMqull0_SobjS7mOQGk2wE5s=.81291418-85a7-4826-9ecf-dcdd050ecaf1@github.com> Message-ID: On Thu, 20 Feb 2025 02:50:17 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename isPrimitiveType field to primitive. > > src/java.base/share/classes/java/lang/Class.java line 1297: > >> 1295: // The componentType field's null value is the sole indication that the class is an array, >> 1296: // see isArray(). >> 1297: private transient final Class componentType; > > Why the `transient` and how does this impact serialization?? The fields in `Class` are just inconsistently transient or not. `Class` has special treatment in the serialization specification, so the presence or absence of the `transient` modifier has no effect. 
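Chen's point can be checked from user code. This is a minimal sketch using only `java.io` (the class name `ClassRoundTrip` is made up). Serializing a `Class` object writes a class descriptor, and deserializing it resolves back to the already-loaded `Class`, so no instance field of `Class`, `transient` or not, travels through the stream. `Class` also declares an empty `serialPersistentFields` array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

final class ClassRoundTrip {
    /** Serializes and deserializes a value with the standard object streams. */
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] { String.class, int[].class }) {
            // The stream carries only a class descriptor; reading it back
            // resolves to the same loaded Class object, not a field-by-field copy.
            Object copy = roundTrip(c);
            System.out.println(c.getName() + " same instance: " + (copy == c));
        }
    }
}
```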
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1962841415 From iklam at openjdk.org Thu Feb 20 04:43:00 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 20 Feb 2025 04:43:00 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v4] In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 03:42:12 GMT, Ashutosh Mehra wrote: >> Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: >> >> - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself >> - Improve error message when AOTMode=create has an incompatible classpath > > src/hotspot/share/cds/archiveBuilder.cpp line 510: > >> 508: } else if (klass->is_objArray_klass()) { >> 509: Klass* bottom = ObjArrayKlass::cast(klass)->bottom_klass(); >> 510: if (CDSConfig::is_dumping_dynamic_archive() && MetaspaceShared::is_shared_static(bottom)) { > > Why do we have the check for `CDSConfig::is_dumping_dynamic_archive()` here? Before this PR, `CDSConfig::is_dumping_dynamic_archive()` was the only reason we will see `MetaspaceShared::is_shared_static(bottom)`, and clearly it wasn't excluded (it was included in the static archive). After this PR, `bottom` can come from the mapped shared archive (the preimage). It's logically the same as a regular class we've loaded in a classic static dump, so it should go through the `SystemDictionaryShared::is_excluded_class()` check. > src/hotspot/share/cds/archiveUtils.inline.hpp line 70: > >> 68: // Returns the address of an Array that's allocated in the ArchiveBuilder "buffer" space. >> 69: template >> 70: Array* ArchiveUtils::archive_ptr_array(GrowableArray* tmp_array) { > > If I am reading this code correctly it requires that the elements in `tmp_array` have already been archived. Can we add a comment and/or an assert to that effect. I added comments about what the elements can be. 
// All pointers in tmp_array must point to: // - a buffered object; or // - a source object that has been archived; or // - (only when dumping dynamic archive) an object in the static archive. If it's not one of these types, it will be caught by the assert inside `builder->get_buffered_addr(ptr)` > src/hotspot/share/cds/cdsConfig.cpp line 550: > >> 548: >> 549: bool CDSConfig::is_dumping_preimage_static_archive() { >> 550: return _is_dumping_static_archive && _is_dumping_preimage_static_archive; > > Is the check for `_is_dumping_static_archive` really needed? I removed it. `_is_dumping_static_archive` is always set to true when `_is_dumping_preimage_static_archive` is set to true. > src/hotspot/share/cds/cdsConfig.cpp line 705: > >> 703: bool CDSConfig::is_dumping_aot_linked_classes() { >> 704: if (is_dumping_preimage_static_archive()) { >> 705: return false; > > In leyden-premain branch it returns `AOTClassLinking`, but here it is returning false. So we are not doing pre-linking in the preimage, is that right? Right, in this PR prelinking is not done in preimage. When we load the preimage to dump the final image, `FinalImageRecipes::load_all_classes()` is responsible for loading all classes. The Leyden repo enables prelinking in the preimage. It relies on AOTLinkedClassBulkLoader to load all the classes when creating the final image. However, this is unsafe because we don't have the archived module graph there. I plan to fix the Leyden code when this PR is merged into Leyden. > src/hotspot/share/cds/filemap.cpp line 1529: > >> 1527: // allow processes that have it open continued access to the file. >> 1528: remove(_full_path); >> 1529: int mode = CDSConfig::is_dumping_preimage_static_archive() ? 0666 : 0444; > > Why do we need to give different access permission for preimage file compared to other dumping modes? It's to keep the current behavior: -XX:AOTConfiguration creates a file with read/write permission but -XX:AOTCache creates a read-only file. 
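For reference, here is what the two octal modes in the `filemap.cpp` snippet mean. The Java sketch below is illustrative only: `ArchiveFileMode` and `modeFor` are hypothetical names that mirror the `is_dumping_preimage_static_archive() ? 0666 : 0444` choice. It converts the octal mode into Java's `PosixFilePermission` set. On a real system, the process umask further restricts the mode passed to `open`.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.Set;

final class ArchiveFileMode {
    /** Hypothetical mirror of the mode choice in the filemap.cpp snippet. */
    static int modeFor(boolean dumpingPreimage) {
        return dumpingPreimage ? 0666 : 0444;   // rw for all vs. read-only for all
    }

    /** Converts an octal POSIX mode (e.g. 0644) into Java's permission set. */
    static Set<PosixFilePermission> toPermissions(int mode) {
        Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
        // Enum order is OWNER_READ, OWNER_WRITE, ..., OTHERS_EXECUTE,
        // i.e. bits 0400 down to 0001.
        PosixFilePermission[] all = PosixFilePermission.values();
        for (int i = 0; i < all.length; i++) {
            int bit = 1 << (all.length - 1 - i);
            if ((mode & bit) != 0) {
                perms.add(all[i]);
            }
        }
        return perms;
    }
}
```

For example, `PosixFilePermissions.toString(toPermissions(modeFor(true)))` gives the familiar `rw-rw-rw-` form.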
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962846991 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962846956 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962847058 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962847101 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1962847143 From iklam at openjdk.org Thu Feb 20 04:42:58 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 20 Feb 2025 04:42:58 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v6] In-Reply-To: References: Message-ID: > Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. > > With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. > > To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). 
Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): > >> the format of the configuration and cache files is not specified and is subject to change without notice. > > **Notes for reviewers:** > > - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. > - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. > - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. > - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. > - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc.) when JEP 483 options are specified. > > **Misc Note** > - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccba6dcce4a3) will be integrated separ...
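The "different magic number" note describes a common defensive pattern: two file kinds share a layout but carry distinct header tags, so a loader can reject the wrong kind early with a clear message. A minimal Java sketch follows. Both magic values and all names are made up for illustration and are not HotSpot's actual constants or code.

```java
import java.nio.ByteBuffer;

/** Toy header check; both magic values are hypothetical, not HotSpot's constants. */
final class AotFileHeader {
    static final int STATIC_ARCHIVE_MAGIC = 0x5A5A0001; // hypothetical
    static final int AOT_CONFIG_MAGIC     = 0x5A5A0002; // hypothetical

    enum Kind { STATIC_ARCHIVE, AOT_CONFIG }

    /** Writes the 4-byte magic that would start a file of the given kind. */
    static byte[] header(Kind kind) {
        int magic = (kind == Kind.STATIC_ARCHIVE) ? STATIC_ARCHIVE_MAGIC : AOT_CONFIG_MAGIC;
        return ByteBuffer.allocate(4).putInt(magic).array();
    }

    /** Accepts only files that were written as a static archive. */
    static void checkUsableAsCache(byte[] header) {
        int magic = ByteBuffer.wrap(header).getInt();
        if (magic == AOT_CONFIG_MAGIC) {
            throw new IllegalArgumentException(
                "AOT configuration file cannot be used as an AOT cache");
        }
        if (magic != STATIC_ARCHIVE_MAGIC) {
            throw new IllegalArgumentException(
                "not an archive: bad magic 0x" + Integer.toHexString(magic));
        }
    }
}
```

Because the two layouts are otherwise the same, the magic number is what keeps a mix-up from being silently accepted.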
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @ashu-mehra comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23484/files - new: https://git.openjdk.org/jdk/pull/23484/files/72a7d1b1..39c53cc1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23484&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23484&range=04-05 Stats: 21 lines in 3 files changed: 15 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23484/head:pull/23484 PR: https://git.openjdk.org/jdk/pull/23484 From iklam at openjdk.org Thu Feb 20 05:10:44 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 20 Feb 2025 05:10:44 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v7] In-Reply-To: References: Message-ID: > Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. > > With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. > > To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). 
This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): > >> the format of the configuration and cache files is not specified and is subject to change without notice. > > **Notes for reviewers:** > > - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. > - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. > - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. > - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. > - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc.) when JEP 483 options are specified. > > **Misc Note** > - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccba6dcce4a3) will be integrated separ... Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 12 commits: - Merge branch 'master' into 8348426-binary-aot-config-file - @ashu-mehra comments - @calvinccheung comments - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself - Improve error message when AOTMode=create has an incompatible classpath - Fixed test cases @vnkozlov - Update "make test JTREG_AOT_JDK=true ..." to work with binary AOT configuration - Fixed test failures - Added comments; fixed FIXMEs - Added more test cases - ... and 2 more: https://git.openjdk.org/jdk/compare/00d4e4a9...21f140e7 ------------- Changes: https://git.openjdk.org/jdk/pull/23484/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23484&range=06 Stats: 1226 lines in 40 files changed: 1014 ins; 46 del; 166 mod Patch: https://git.openjdk.org/jdk/pull/23484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23484/head:pull/23484 PR: https://git.openjdk.org/jdk/pull/23484 From dchuyko at openjdk.org Thu Feb 20 05:55:52 2025 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Thu, 20 Feb 2025 05:55:52 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 22:46:02 GMT, Vladimir Kozlov wrote: > Is it from here?: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvm.cpp#L379 > > Yes, I mean this check. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23682#issuecomment-2670531640 From galder at openjdk.org Thu Feb 20 06:27:57 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 20 Feb 2025 06:27:57 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Wed, 19 Feb 2025 19:50:50 GMT, Evgeny Astigeevich wrote: >> I will run a comparison next with the same batch of tests but looking at `int` and see if there are any differences compared with `long` or not. > > Hi @galderz, > Results from Graviton 3 (Neoverse-V1). > Without the patch:
>
> Benchmark                       (probability) (range) (seed) (size)  Mode Cnt     Score    Error Units
> MinMaxVector.intClippingRange             N/A      90      0   1000 thrpt   8 12565.427 ± 37.538 ops/ms
> MinMaxVector.intClippingRange             N/A     100      0   1000 thrpt   8 12462.072 ± 84.067 ops/ms
> MinMaxVector.intLoopMax                    50     N/A    N/A   2048 thrpt   8  5113.090 ± 68.720 ops/ms
> MinMaxVector.intLoopMax                    80     N/A    N/A   2048 thrpt   8  5129.857 ± 35.005 ops/ms
> MinMaxVector.intLoopMax                   100     N/A    N/A   2048 thrpt   8  5116.081 ±  8.946 ops/ms
> MinMaxVector.intLoopMin                    50     N/A    N/A   2048 thrpt   8  6174.544 ± 52.573 ops/ms
> MinMaxVector.intLoopMin                    80     N/A    N/A   2048 thrpt   8  6110.884 ± 54.447 ops/ms
> MinMaxVector.intLoopMin                   100     N/A    N/A   2048 thrpt   8  6178.661 ± 48.450 ops/ms
> MinMaxVector.intReductionMax               50     N/A    N/A   2048 thrpt   8  5109.270 ± 10.525 ops/ms
> MinMaxVector.intReductionMax               80     N/A    N/A   2048 thrpt   8  5123.426 ± 28.229 ops/ms
> MinMaxVector.intReductionMax              100     N/A    N/A   2048 thrpt   8  5133.799 ±  7.693 ops/ms
> MinMaxVector.intReductionMin               50     N/A    N/A   2048 thrpt   8  5130.209 ± 15.491 ops/ms
> MinMaxVector.intReductionMin               80     N/A    N/A   2048 thrpt   8  5127.823 ± 27.767 ops/ms
> MinMaxVector.intReductionMin              100     N/A    N/A   2048 thrpt   8  5118.217 ± 22.186 ops/ms
> MinMaxVector.longClippingRange            N/A      90      0   1000 thrpt   8  1831.026 ± 15.502 ops/ms
> MinMaxVector.longClippingRange            N/A     100      0   1000 thrpt   8  1827.194 ± 22.076 ops/ms
> MinMaxVector.longLoopMax                   50     N/A    N/A   2048 thrpt   8  2643.383 ±  9.830 ops/ms
> MinMaxVector.longLoopMax                   80     N/A    N/A   2048 thrpt   8  2640.417 ±  7.797 ops/ms
> MinMaxVector.longLoopMax                  100     N/A    N/A   2048 thrpt   8  1244.321 ±  1.001 ops/ms
> MinMaxVector.longLoopMin                   50     N/A    N/A   2048 thrpt   8  3239.234 ±  8.813 ops/ms
> MinMaxVector.longLoopMin                   80     N/A    N/A   2048 thrpt   8  3252.713 ± 3...

Thanks @eastig for the results on Graviton 3. I'm summarising them here:

Benchmark                       (probability) (range) (seed) (size)  Mode Cnt     Base    Patch Units
MinMaxVector.longClippingRange            N/A      90      0   1000 thrpt   8 1831.026 5094.259 ops/ms (+178%)
MinMaxVector.longClippingRange            N/A     100      0   1000 thrpt   8 1827.194 5096.835 ops/ms (+180%)
MinMaxVector.longLoopMax                   50     N/A    N/A   2048 thrpt   8 2643.383 2636.438 ops/ms
MinMaxVector.longLoopMax                   80     N/A    N/A   2048 thrpt   8 2640.417 2644.069 ops/ms
MinMaxVector.longLoopMax                  100     N/A    N/A   2048 thrpt   8 1244.321 2646.250 ops/ms (+112%)
MinMaxVector.longLoopMin                   50     N/A    N/A   2048 thrpt   8 3239.234 2648.504 ops/ms (-18%)
MinMaxVector.longLoopMin                   80     N/A    N/A   2048 thrpt   8 3252.713 2658.082 ops/ms (-18%)
MinMaxVector.longLoopMin                  100     N/A    N/A   2048 thrpt   8 1204.370 2647.532 ops/ms (+119%)
MinMaxVector.longReductionMax              50     N/A    N/A   2048 thrpt   8 2536.322 2536.254 ops/ms
MinMaxVector.longReductionMax              80     N/A    N/A   2048 thrpt   8 2536.318 2536.209 ops/ms
MinMaxVector.longReductionMax             100     N/A    N/A   2048 thrpt   8 1395.273 2536.342 ops/ms (+81%)
MinMaxVector.longReductionMin              50     N/A    N/A   2048 thrpt   8 2536.325 2536.271 ops/ms
MinMaxVector.longReductionMin              80     N/A    N/A   2048 thrpt   8 2536.265 2536.250 ops/ms
MinMaxVector.longReductionMin             100     N/A    N/A   2048 thrpt   8 1389.982 2536.246 ops/ms (+82%)

On Graviton 3 there are wide enough registers for vectorization to kick in, so we see similar improvements to x64 AVX-512 in https://github.com/openjdk/jdk/pull/20098#issuecomment-2642788364.
There is some variance in the 50/80% probability range; this was also observed slightly there, but on the aarch64 system it looks more pronounced. Interestingly, it happened with min but not max, but that could be variance. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2670574593 From galder at openjdk.org Thu Feb 20 06:53:04 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 20 Feb 2025 06:53:04 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 7 Feb 2025 12:39:24 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the Java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g. >> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after.
Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA...
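To make the loop shapes in this thread concrete, here is a hypothetical set of Java kernels in the spirit of the `MinMaxVector` benchmarks. It is a sketch, not the actual JMH source, and it does not model the `probability` parameter. `maxBranchy` has the same ternary shape as the `Math.max(long, long)` body quoted in the PR. According to the PR description, that shape is what SuperWord rejects as "control flow in loop". With the intrinsic, the `Math.max`/`Math.min` calls below become MaxL/MinL nodes instead.

```java
/** Hypothetical kernels resembling the MinMaxVector benchmark shapes. */
final class MinMaxKernels {
    /** Same shape as Math.max(long, long): a compare plus a branch. */
    static long maxBranchy(long a, long b) {
        return (a >= b) ? a : b;
    }

    /** Element-wise max ("loop max"): each iteration is independent. */
    static void loopMax(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
    }

    /** Reduction: the accumulator carries a dependency across iterations. */
    static long reductionMax(long[] a) {
        long acc = Long.MIN_VALUE;
        for (long v : a) {
            acc = Math.max(acc, v);
        }
        return acc;
    }

    /** Clipping: clamp every element into [lo, hi] with a max followed by a min. */
    static void clip(long[] a, long lo, long hi, long[] out) {
        for (int i = 0; i < a.length; i++) {
            out[i] = Math.min(Math.max(a[i], lo), hi);
        }
    }
}
```

The three kernels differ in data dependencies. This is one plausible reason the element-wise, reduction and clipping variants respond differently to the intrinsic in the numbers above.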
and 34 more: https://git.openjdk.org/jdk/compare/af7645e5...a190ae68 To follow up https://github.com/openjdk/jdk/pull/20098#issuecomment-2669329851, I've run `MinMaxVector.int` benchmarks with **superword disabled** and with/without `_max`/`_min` intrinsics in both AVX-512 and AVX2 modes. # AVX-512 Benchmark (probability) (range) (seed) (size) Mode Cnt -min/-max +min/+max Units MinMaxVector.intClippingRange N/A 90 0 1000 thrpt 4 1067.050 1038.640 ops/ms MinMaxVector.intClippingRange N/A 100 0 1000 thrpt 4 1041.922 1039.004 ops/ms MinMaxVector.intLoopMax 50 N/A N/A 2048 thrpt 4 605.173 604.337 ops/ms MinMaxVector.intLoopMax 80 N/A N/A 2048 thrpt 4 605.106 604.309 ops/ms MinMaxVector.intLoopMax 100 N/A N/A 2048 thrpt 4 604.547 604.432 ops/ms MinMaxVector.intLoopMin 50 N/A N/A 2048 thrpt 4 495.042 605.216 ops/ms (+22%) MinMaxVector.intLoopMin 80 N/A N/A 2048 thrpt 4 495.105 495.217 ops/ms MinMaxVector.intLoopMin 100 N/A N/A 2048 thrpt 4 495.040 495.176 ops/ms MinMaxVector.intReductionMultiplyMax 50 N/A N/A 2048 thrpt 4 407.920 407.984 ops/ms MinMaxVector.intReductionMultiplyMax 80 N/A N/A 2048 thrpt 4 407.710 407.965 ops/ms MinMaxVector.intReductionMultiplyMax 100 N/A N/A 2048 thrpt 4 874.881 407.922 ops/ms (-53%) MinMaxVector.intReductionMultiplyMin 50 N/A N/A 2048 thrpt 4 407.911 407.947 ops/ms MinMaxVector.intReductionMultiplyMin 80 N/A N/A 2048 thrpt 4 408.015 408.024 ops/ms MinMaxVector.intReductionMultiplyMin 100 N/A N/A 2048 thrpt 4 407.978 407.994 ops/ms MinMaxVector.intReductionSimpleMax 50 N/A N/A 2048 thrpt 4 460.538 460.439 ops/ms MinMaxVector.intReductionSimpleMax 80 N/A N/A 2048 thrpt 4 460.579 460.542 ops/ms MinMaxVector.intReductionSimpleMax 100 N/A N/A 2048 thrpt 4 998.211 460.404 ops/ms (-53%) MinMaxVector.intReductionSimpleMin 50 N/A N/A 2048 thrpt 4 460.570 460.447 ops/ms MinMaxVector.intReductionSimpleMin 80 N/A N/A 2048 thrpt 4 460.552 460.493 ops/ms MinMaxVector.intReductionSimpleMin 100 N/A N/A 2048 thrpt 4 460.455 460.485 ops/ms There is some 
improvement in `intLoopMin` @ 50% but this didn't materialize in the `perfasm` run, so I don't think it can be strictly be correlated with the use/not-use of the intrinsic. `intReductionMultiplyMax` and `intReductionSimpleMax` @ 100% regressions with the `max` intrinsic activated are consistent with what the saw with long. ### `intReductionMultiplyMin` and `intReductionSimpleMin` @ 100% same performance There is something very intriguing happening here, which I don't know it's due to min itself or int vs long. Basically, with or without the `min` intrinsic the performance of these 2 benchmarks is same at 100% branch probability. What is going on? Let's look at one of them: -min # VM options: -Djava.library.path=/home/vagrant/1/jdk-intrinsify-max-min-long/build/release-linux-x86_64/images/test/micro/native -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_max,_min -XX:-UseSuperWord ... 3.04% ???? ? 0x00007f49280f76e9: cmpl %edi, %r10d 3.14% ???? ? 0x00007f49280f76ec: cmovgl %edi, %r10d ;*ireturn {reexecute=0 rethrow=0 return_oop=0} ???? ? ; - java.lang.Math::min at 10 (line 2119) ???? ? ; - org.openjdk.bench.java.lang.MinMaxVector::intReductionSimpleMin at 23 (line 212) ???? ? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_intReductionSimpleMin_jmhTest::intReductionSimpleMin_thrpt_jmhStub at 19 (line 124) +min # VM options: -Djava.library.path=/home/vagrant/1/jdk-intrinsify-max-min-long/build/release-linux-x86_64/images/test/micro/native -XX:-UseSuperWord ... 3.10% ?? ? 0x00007fbf340f6b97: cmpl %edi, %r10d 3.08% ?? ? 0x00007fbf340f6b9a: cmovgl %edi, %r10d ;*invokestatic min {reexecute=0 rethrow=0 return_oop=0} ?? ? ; - org.openjdk.bench.java.lang.MinMaxVector::intReductionSimpleMin at 23 (line 212) ?? ? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_intReductionSimpleMin_jmhTest::intReductionSimpleMin_thrpt_jmhStub at 19 (line 124) Both are `cmov`. 
You can see that without the intrinsic, the `Math::min` method is inlined and its branch is transformed into a `cmov`, and the same happens with the intrinsic. I will verify this with long shortly to see whether this behaviour is specific to the `min` operation or has something to do with int vs long.

# AVX2

Here are the AVX2 numbers:

Benchmark                             (probability) (range) (seed) (size)  Mode Cnt -min/-max +min/+max Units
MinMaxVector.intClippingRange                   N/A      90      0   1000 thrpt   4  1068.265  1039.087 ops/ms
MinMaxVector.intClippingRange                   N/A     100      0   1000 thrpt   4  1067.705  1038.760 ops/ms
MinMaxVector.intLoopMax                          50     N/A    N/A   2048 thrpt   4   605.015   604.364 ops/ms
MinMaxVector.intLoopMax                          80     N/A    N/A   2048 thrpt   4   605.169   604.366 ops/ms
MinMaxVector.intLoopMax                         100     N/A    N/A   2048 thrpt   4   604.527   604.494 ops/ms
MinMaxVector.intLoopMin                          50     N/A    N/A   2048 thrpt   4   605.099   605.057 ops/ms
MinMaxVector.intLoopMin                          80     N/A    N/A   2048 thrpt   4   495.071   605.080 ops/ms (+22%)
MinMaxVector.intLoopMin                         100     N/A    N/A   2048 thrpt   4   495.134   495.047 ops/ms
MinMaxVector.intReductionMultiplyMax             50     N/A    N/A   2048 thrpt   4   407.953   407.987 ops/ms
MinMaxVector.intReductionMultiplyMax             80     N/A    N/A   2048 thrpt   4   407.861   408.005 ops/ms
MinMaxVector.intReductionMultiplyMax            100     N/A    N/A   2048 thrpt   4   873.915   407.995 ops/ms (-53%)
MinMaxVector.intReductionMultiplyMin             50     N/A    N/A   2048 thrpt   4   408.019   407.987 ops/ms
MinMaxVector.intReductionMultiplyMin             80     N/A    N/A   2048 thrpt   4   407.971   408.009 ops/ms
MinMaxVector.intReductionMultiplyMin            100     N/A    N/A   2048 thrpt   4   407.970   407.956 ops/ms
MinMaxVector.intReductionSimpleMax               50     N/A    N/A   2048 thrpt   4   460.443   460.514 ops/ms
MinMaxVector.intReductionSimpleMax               80     N/A    N/A   2048 thrpt   4   460.484   460.581 ops/ms
MinMaxVector.intReductionSimpleMax              100     N/A    N/A   2048 thrpt   4  1015.601   460.446 ops/ms (-54%)
MinMaxVector.intReductionSimpleMin               50     N/A    N/A   2048 thrpt   4   460.494   460.532 ops/ms
MinMaxVector.intReductionSimpleMin               80     N/A    N/A   2048 thrpt   4   460.489   460.451 ops/ms
MinMaxVector.intReductionSimpleMin              100     N/A    N/A   2048 thrpt   4  1021.420   460.435 ops/ms (-55%)
This time we see an improvement in `intLoopMin` @ 80% but again it was not observable in the `perfasm` run. `intReductionMultiplyMax` and `intReductionSimpleMax` @ 100% have regressions, the familiar one of cmp+mov vs cmov. `intReductionMultiplyMin` @ 100% does not have a regression for the same reasons above, both use cmov. The interesting thing is `intReductionSimpleMin` @ 100%. We see a regression there but I didn't observe it with the `perfasm` run. So, this could be due to variance in the application of `cmov` or not? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2670609470 From ccheung at openjdk.org Thu Feb 20 07:13:55 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 20 Feb 2025 07:13:55 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v4] In-Reply-To: References: <_9GFraN7--YUC8esAB28iHzdRC7eYJok355TpDH7Df8=.30548f08-454f-47ba-83d5-a4feabaee9ff@github.com> Message-ID: On Thu, 20 Feb 2025 02:15:01 GMT, David Holmes wrote: >> It is now being handled in `ClassLoaderDataShared::ensure_module_entry_tables_exist()` and `AOTCodeSourceConfig::dumptime_init_helper()`. > > I don't see anything there that does a vm_exit if something has gone wrong. ?? How about adding the vm_exit in `ClassLoaderDataShared::ensure_module_entry_table_exist()` instead of assert? 
void ClassLoaderDataShared::ensure_module_entry_table_exist(oop class_loader) {
  Handle h_loader(JavaThread::current(), class_loader);
  ModuleEntryTable* met = Modules::get_module_entry_table(h_loader);
  if (met == nullptr) {
    vm_exit_during_initialization("ClassLoaderDataShared::ensure_module_entry_table_exist() failed unexpectedly");
  }
}

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962972927 From alanb at openjdk.org Thu Feb 20 07:17:57 2025 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 20 Feb 2025 07:17:57 GMT Subject: RFR: 8349620: Add VMProps for static JDK [v3] In-Reply-To: References: <7Xbnn-2LkNv3Gsj6nFHXdrdvvPO7vXi3K3MWm33E-jw=.8341aa47-99de-4a67-8339-64b46fa7bb36@github.com> Message-ID: On Wed, 19 Feb 2025 21:05:47 GMT, Jiangli Zhou wrote: > * No `jlink` tool in `static-jdk` when running on static JDK. This is currently observable using the `static-jdk`. > * No separate `lib/modules` file (and other JDK resource files) if we build a single hermetic Java image for the test. Once the efforts are further along, it will be necessary to ensure that tests that rely on bin/ have the appropriate `@modules jdk.jartool` (or some other tool module) so that jtreg selects the appropriate set of tests to run. ToolProvider should work, so tests that use it to run tools "in process" can execute. The `jdk.static` requires property is slightly different: it is what will be used to select/not-select tests that can only run with a modular run-time image or a static image. This includes tests that might rely on files in the JDK conf directory (user-editable configuration files; these are not the same thing as JDK resource files, as those will just work). It's a bit premature to get into jlink here.
A static image could include the jdk.jlink module but would only be useful when invoked with ToolProvider and with a module path that contains packaged modules that it can consume, and only if those packaged modules were created from the same src bits. There's a lot more to this for later phases of this effort. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2670648319 From epeter at openjdk.org Thu Feb 20 07:21:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Feb 2025 07:21:45 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v3] In-Reply-To: References: Message-ID: > Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below. > > **Background** > > With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer. > > **Problem** > > So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code.
>
>     MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1);
>     MemorySegment nativeUnaligned = nativeAligned.asSlice(1);
>     test3(nativeUnaligned);
>
> When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not!
>
>     static void test3(MemorySegment ms) {
>         for (int i = 0; i < RANGE; i++) {
>             long adr = i * 4L;
>             int v = ms.get(ELEMENT_LAYOUT, adr);
>             ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1));
>         }
>     }
>
> **Solution: Runtime Checks - Predicate and Multiversioning**
>
> Of course we could just forbid cases where we have a `native` base from vectorizing. But that would lead to regressions currently - in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine whether the `base` is aligned, so we need a runtime check.
>
> I came up with 2 options for where to place the runtime checks:
> - A new "auto vectorization" Parse Predicate:
>   - This only works when predicates are available.
>   - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop.
> - Multiversion the loop:
>   - Create 2 copies of the loop (fast and slow loops).
>   - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if` which decides which loop we take.
>   - In the `slow_loop`, we make no assumption, which means we cannot vectorize, but we still compile - so even unaligned `base`s would end up with reasonably fast code.
>   - We "stall" the `...
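A source-level analogy of the multiversioning option may help. The sketch below is hypothetical C++, not C2 code: C2 performs this transformation on its IR, and `kAssumedVectorAlign`, `is_aligned_to` and `increment_ints` are names invented for illustration. A runtime alignment check selects a fast loop that may assume an aligned base, or a slow loop that assumes nothing; both produce the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical analogy of loop multiversioning; the vector width is an
// assumption for this sketch, not a value taken from C2.
constexpr std::size_t kAssumedVectorAlign = 16;

inline bool is_aligned_to(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Adds 1 to each of `count` 32-bit ints stored at `base`, which may have
// any alignment (like a native MemorySegment slice). Returns true if the
// fast loop was taken.
bool increment_ints(unsigned char* base, std::size_t count) {
  assert(base != nullptr || count == 0);
  if (is_aligned_to(base, kAssumedVectorAlign)) {
    // Fast loop: the runtime check above justifies the alignment
    // assumption, so aligned vector loads/stores would be legal here.
    // (Assumes `base` really points at int32_t objects.)
    auto* p = reinterpret_cast<std::int32_t*>(base);
    for (std::size_t i = 0; i < count; i++) {
      p[i] += 1;
    }
    return true;
  }
  // Slow loop: no alignment assumption; memcpy tolerates any address.
  for (std::size_t i = 0; i < count; i++) {
    std::int32_t v;
    std::memcpy(&v, base + i * sizeof v, sizeof v);
    v += 1;
    std::memcpy(base + i * sizeof v, &v, sizeof v);
  }
  return false;
}
```

The point of the shape is that the slow loop is always correct, so failing the check only costs speed, not a deoptimization.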
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: adjust selector if probability ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22016/files - new: https://git.openjdk.org/jdk/pull/22016/files/a98ffabf..b3044bc5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22016&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22016&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22016.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22016/head:pull/22016 PR: https://git.openjdk.org/jdk/pull/22016 From dholmes at openjdk.org Thu Feb 20 07:31:57 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Feb 2025 07:31:57 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v4] In-Reply-To: References: <_9GFraN7--YUC8esAB28iHzdRC7eYJok355TpDH7Df8=.30548f08-454f-47ba-83d5-a4feabaee9ff@github.com> Message-ID: On Thu, 20 Feb 2025 07:11:19 GMT, Calvin Cheung wrote: >> I don't see anything there that does a vm_exit if something has gone wrong. ?? > > How about adding the vm_exit in `ClassLoaderDataShared::ensure_module_entry_table_exist()` instead of assert? > > > void ClassLoaderDataShared::ensure_module_entry_table_exist(oop class_loader) { > Handle h_loader(JavaThread::current(), class_loader); > ModuleEntryTable* met = Modules::get_module_entry_table(h_loader); > if (met == nullptr) { > vm_exit_during_initialization("ClassLoaderDataShared::ensure_module_entry_table_exist() failed unexpectedly"); > } > } I can't answer that. As a refactoring I expect to see the current behaviour preserved. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1962992658 From eosterlund at openjdk.org Thu Feb 20 08:09:06 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 20 Feb 2025 08:09:06 GMT Subject: RFR: 8347335: ZGC: Use limitless mark stack memory [v2] In-Reply-To: <9h8RYyi02b9Hz6EGoef3tCHmAHYpB8bdgyUiXkZeC0s=.25738b1a-f580-4c1a-a9cc-f76dbc03bc1e@github.com> References: <9h8RYyi02b9Hz6EGoef3tCHmAHYpB8bdgyUiXkZeC0s=.25738b1a-f580-4c1a-a9cc-f76dbc03bc1e@github.com> Message-ID: > When ZGC performs marking, a lock-free data structure is used to keep track of objects that still need to be traced in the object traversal. This lock-free data structure uses versioned pointers as a technique to avoid ABA problems, which are prevalent when writing lock-free data structures. This required partitioning pointers in the structure to embed both a version and a location. > > Due to the reduced addressability of locations with only a portion of the pointer bits, a special memory space was created to manage the data structure such that offsets could be encoded, instead of addresses. > > Since the memory area needs to be contiguous, the JVM needs to know what the expected maximum size of this space will ever be, within some limiting bounds. That is what `-XX:ZMarkStackSpaceLimit` controls. > > While this strategy has worked well in practice, the design does limit the scalability of ZGC, due to limits in how much contiguous memory can be encoded with a subset of the pointer bits. Not to mention that users have no idea what number to put into this JVM option. > > The `-XX:ZMarkStackSpaceLimit` JVM option is needed due to using a contiguous allocator to solve an ABA problem in a lock-free data structure. By selecting another solution for the ABA problem, the need for the special contiguous memory allocator and hence the JVM option can be removed.
> > This PR proposes a new solution for that original ABA problem in the lock-free data structure, which renders the entire machinery behind the `-XX:ZMarkStackSpaceLimit` JVM option redundant. The proposed technique is to use hazard pointers instead. > > The use of hazard pointers is a well-established safe memory reclamation (SMR) technique for writing lock-free data structures, which we also use in the Threads list. The main idea is to publish what pointer has been read with a hazard pointer, so that concurrent threads know not to free memory that is being concurrently used. Freeing of such racingly accessed memory is deferred until it is safe, hence solving the ABA problem. This also allows using plain malloc/free instead of a custom contiguous memory allocator for these structures. > > Only popping nodes from the mark stacks requires hazard pointers, and only GC workers pop entries from the mark stacks. Therefore, hazard pointers may be stored in a per-worker variable. > > I have measured throughput, latency, marking times and memory usage across a number of programs and platforms, and not seen any interesting changes in the behavior, ot... Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
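A minimal sketch of the hazard-pointer idea described in the PR, under stated assumptions: this is hypothetical C++, not ZGC's mark stack code; `HazardStack`, `kMaxWorkers` and the retire policy are invented for illustration, and every atomic uses the default sequentially consistent ordering for simplicity. Only `pop()` publishes a hazard pointer, one slot per worker, mirroring the observation that only GC workers pop.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical Treiber stack protected by per-worker hazard pointers.
struct Node {
  int value;
  Node* next;
};

class HazardStack {
 public:
  static constexpr int kMaxWorkers = 8;  // assumed worker count

  HazardStack() {
    for (auto& h : _hazard) h.store(nullptr);
  }

  ~HazardStack() {
    // Assumes no concurrent users remain at destruction time.
    for (Node* n = _top.load(); n != nullptr;) {
      Node* next = n->next;
      delete n;
      n = next;
    }
    for (auto& list : _retired) {
      for (Node* n : list) delete n;
    }
  }

  void push(int value) {
    Node* n = new Node{value, _top.load()};
    while (!_top.compare_exchange_weak(n->next, n)) {
      // n->next now holds the current top; retry.
    }
  }

  bool pop(int worker, int* out) {
    assert(worker >= 0 && worker < kMaxWorkers);
    std::atomic<Node*>& hazard = _hazard[worker];
    for (;;) {
      Node* top = _top.load();
      if (top == nullptr) {
        hazard.store(nullptr);
        return false;
      }
      hazard.store(top);                 // publish before dereferencing
      if (_top.load() != top) continue;  // top may already have been retired
      Node* next = top->next;            // safe: top cannot be freed now
      if (_top.compare_exchange_strong(top, next)) {
        hazard.store(nullptr);
        *out = top->value;
        retire(worker, top);
        return true;
      }
    }
  }

 private:
  bool is_protected(Node* n) const {
    for (const auto& h : _hazard) {
      if (h.load() == n) return true;
    }
    return false;
  }

  // Defers freeing until no worker's hazard pointer refers to the node.
  void retire(int worker, Node* n) {
    std::vector<Node*>& list = _retired[worker];
    list.push_back(n);
    auto first_free = std::partition(list.begin(), list.end(),
                                     [this](Node* r) { return is_protected(r); });
    for (auto it = first_free; it != list.end(); ++it) delete *it;
    list.erase(first_free, list.end());
  }

  std::atomic<Node*> _top{nullptr};
  std::atomic<Node*> _hazard[kMaxWorkers];
  std::vector<Node*> _retired[kMaxWorkers];
};
```

Deferring the `delete` until no hazard pointer matches is what closes the ABA window: a node's address cannot be recycled by the allocator while a racing `pop()` may still compare against it, which is why plain `new`/`delete` (or malloc/free) suffices.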
The pull request contains five additional commits since the last revision: - Preexisting: Missing Include - Spelling and const - Use ZAttachedArray - Merge branch 'master' into zgc_hazard_mark_stack - 8347335: ZGC: Use limitless mark stack memory ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23571/files - new: https://git.openjdk.org/jdk/pull/23571/files/7f8ec394..7962b9a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23571&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23571&range=00-01 Stats: 27369 lines in 1207 files changed: 16311 ins; 5986 del; 5072 mod Patch: https://git.openjdk.org/jdk/pull/23571.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23571/head:pull/23571 PR: https://git.openjdk.org/jdk/pull/23571 From aph at openjdk.org Thu Feb 20 08:50:56 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 20 Feb 2025 08:50:56 GMT Subject: RFR: 8345285: [s390x] test failures: foreign/normalize/TestNormalize.java with C2 [v2] In-Reply-To: <5mzVC2Orm-UK6GVDMIaL94k6ne7ojjE5PBRKIOE7-UQ=.4cc7aa73-e265-4a92-b646-8d17e8c147e3@github.com> References: <5mzVC2Orm-UK6GVDMIaL94k6ne7ojjE5PBRKIOE7-UQ=.4cc7aa73-e265-4a92-b646-8d17e8c147e3@github.com> Message-ID: On Wed, 22 Jan 2025 03:37:10 GMT, Amit Kumar wrote: >> Fixes `foreign/normalize/TestNormalize.java` failure on s390x. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > take ppc route Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23197#pullrequestreview-2629104908 From dchuyko at openjdk.org Thu Feb 20 08:54:59 2025 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Thu, 20 Feb 2025 08:54:59 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 23:57:34 GMT, Dean Long wrote: > remove the #if and instead guard with !PreserveFramePointer? 
It doesn't seem necessary to change the current behavior for Int->C2, especially only for a single platform. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23682#issuecomment-2670837344 From amitkumar at openjdk.org Thu Feb 20 08:56:03 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 20 Feb 2025 08:56:03 GMT Subject: RFR: 8345285: [s390x] test failures: foreign/normalize/TestNormalize.java with C2 [v2] In-Reply-To: <5mzVC2Orm-UK6GVDMIaL94k6ne7ojjE5PBRKIOE7-UQ=.4cc7aa73-e265-4a92-b646-8d17e8c147e3@github.com> References: <5mzVC2Orm-UK6GVDMIaL94k6ne7ojjE5PBRKIOE7-UQ=.4cc7aa73-e265-4a92-b646-8d17e8c147e3@github.com> Message-ID: On Wed, 22 Jan 2025 03:37:10 GMT, Amit Kumar wrote: >> Fixes `foreign/normalize/TestNormalize.java` failure on s390x. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > take ppc route Thanks Andrew, Martin for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23197#issuecomment-2670838117 From amitkumar at openjdk.org Thu Feb 20 08:56:04 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 20 Feb 2025 08:56:04 GMT Subject: Integrated: 8345285: [s390x] test failures: foreign/normalize/TestNormalize.java with C2 In-Reply-To: References: Message-ID: On Mon, 20 Jan 2025 11:24:39 GMT, Amit Kumar wrote: > Fixes `foreign/normalize/TestNormalize.java` failure on s390x. This pull request has now been integrated. 
Changeset: c5c91a82 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/c5c91a82931d8bd3aa4dc1568162097ef4b66ce0 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8345285: [s390x] test failures: foreign/normalize/TestNormalize.java with C2 Reviewed-by: mdoerr, aph ------------- PR: https://git.openjdk.org/jdk/pull/23197 From aboldtch at openjdk.org Thu Feb 20 09:04:53 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 20 Feb 2025 09:04:53 GMT Subject: RFR: 8347335: ZGC: Use limitless mark stack memory [v2] In-Reply-To: References: <9h8RYyi02b9Hz6EGoef3tCHmAHYpB8bdgyUiXkZeC0s=.25738b1a-f580-4c1a-a9cc-f76dbc03bc1e@github.com> Message-ID: On Thu, 20 Feb 2025 08:09:06 GMT, Erik Österlund wrote: >> When ZGC performs marking, a lock-free data structure is used to keep track of objects that still need to be traced in the object traversal. This lock-free data structure uses versioned pointer as a technique to avoid ABA problems, prevalent when writing lock-free data structures. This required partitioning pointers in the structure to embed both a version and a location. >> >> Due to the reduced addressability of locations with only a portion of the pointer bits, a special memory space was created to manage the data structure such that offsets could be encoded, instead of addresses. >> >> Since the memory area needs to be contiguous, the JVM needs to know what the expected maximum size of this space will ever be, within some limiting bounds. That is what `-XX:ZMarkStackSpaceLimit` controls. >> >> While this strategy has worked well in practice, the design does limit the scalability of ZGC, due to limits in how much contiguous memory can be encoded with a subset of the pointer bits. Not to mention that users have no idea what number to put in to this JVM option. >> >> The `-XX:ZMarkStackSpaceLimit` JVM option is needed due to using a contiguous allocator to solve an ABA problem in a lock-free data structure.
By selecting another solution for the ABA problem, the need for the special contiguous memory allocator and hence the JVM option can be removed. >> >> This PR proposes a new solution for that original ABA problem in the lock-free data structure, which renders the entire machinery behind the `-XX:ZMarkStackSpaceLimit` JVM option redundant. The proposed technique is to use hazard pointers instead. >> >> The use of hazard pointers is a well established safe memory reclamation (SMR) technique for writing lock-free data structures, that we also use in the Threads list. The main idea is to publish what pointer has been read with a hazard pointer, so that concurrent threads know not to free memory that is being concurrently used. Freeing of such racingly accessed memory is deferred until it is safe, hence solving the ABA problem. This also allows using plain malloc/free instead of a custom contiguous memory allocator for these structures. >> >> Only popping nodes from the mark stacks requires hazard pointers, and only GC workers pop entries from the mark stacks. Therefore, hazard pointers may be stored in a per-worker variable. >> >> I have measured throughput, latency, marking times and memory usage across a number of programs and platforms, and not seen any inter... > > Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Preexisting: Missing Include > - Spelling and const > - Use ZAttachedArray > - Merge branch 'master' into zgc_hazard_mark_stack > - 8347335: ZGC: Use limitless mark stack memory As the old saying goes `malloc's the limit`. Or maybe it was the sky. lgtm. Good work! ------------- Marked as reviewed by aboldtch (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23571#pullrequestreview-2629139541 From haosun at openjdk.org Thu Feb 20 09:06:59 2025 From: haosun at openjdk.org (Hao Sun) Date: Thu, 20 Feb 2025 09:06:59 GMT Subject: RFR: 8350303: ARM32: StubCodeGenerator::verify_stub(StubGenStubId) failed after JDK-8343767 [v2] In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 10:04:55 GMT, Hao Sun wrote: >> We encountered the following runtime error on ARM32: >> >> >> assert(StubRoutines::stub_to_blob(stub_id) == blob_id()) failed: wrong blob initial for generation of stub atomic_add >> >> >> I suppose it might be a mistake in JDK-8343767. `atomic_add` stub belongs to **initial** stubs, but it is set as **compiler** stub in JDK-8343767. >> >> Note that only ARM32 is affected as only ARM32 defines this stub. >> >> Tests: cross-build for `arm32, ppc64, riscv64, s390x` passed. Tier1~3 passed on Linux/AArch64 and Linux/x86_64 > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > fix code style Thanks for your reviews. GHA tests are all green. Let me integrate it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23687#issuecomment-2670862433 From haosun at openjdk.org Thu Feb 20 09:07:00 2025 From: haosun at openjdk.org (Hao Sun) Date: Thu, 20 Feb 2025 09:07:00 GMT Subject: Integrated: 8350303: ARM32: StubCodeGenerator::verify_stub(StubGenStubId) failed after JDK-8343767 In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 08:50:20 GMT, Hao Sun wrote: > We encountered the following runtime error on ARM32: > > > assert(StubRoutines::stub_to_blob(stub_id) == blob_id()) failed: wrong blob initial for generation of stub atomic_add > > > I suppose it might be a mistake in JDK-8343767. `atomic_add` stub belongs to **initial** stubs, but it is set as **compiler** stub in JDK-8343767. > > Note that only ARM32 is affected as only ARM32 defines this stub. > > Tests: cross-build for `arm32, ppc64, riscv64, s390x` passed. 
Tier1~3 passed on Linux/AArch64 and Linux/x86_64 This pull request has now been integrated. Changeset: 86d06162 Author: Hao Sun URL: https://git.openjdk.org/jdk/commit/86d0616276c0a8d60c3b7ff79ade6c83ff0c72a2 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod 8350303: ARM32: StubCodeGenerator::verify_stub(StubGenStubId) failed after JDK-8343767 Reviewed-by: shade, adinn ------------- PR: https://git.openjdk.org/jdk/pull/23687 From aph at openjdk.org Thu Feb 20 09:45:56 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 20 Feb 2025 09:45:56 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v8] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 14:57:25 GMT, Amit Kumar wrote: >> s390x implementation for Class.isInstance intrinsic. >> >> Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. >> >> Benchmark results will be updated soon. 
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > remove frame requirement src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3674: > 3672: Register r_temp2, > 3673: Register r_temp3) { > 3674: assert_different_registers(r_sub_klass, r_super_klass, r_result, r_temp1, r_temp2, r_temp3, Z_R0_scratch); Suggestion: assert_different_registers(r_sub_klass, r_super_klass, r_result, r_temp1, r_temp2, r_temp3); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1963218531 From roland at openjdk.org Thu Feb 20 09:46:58 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 20 Feb 2025 09:46:58 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v3] In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> Message-ID: On Wed, 19 Feb 2025 15:23:13 GMT, Emanuel Peter wrote: > Do you see any better way than having the 2x code size if we need both a slow and fast loop? No but I was confused by your comment about 3x and 4x which is why I asked for clarification. Compiled code size affects inlining decisions: if a callee has compiled code and it's larger than some threshold, then the callee is considered too expensive to inline. With your change, some method that was considered ok to inline could now be considered too big. I think that's what Vladimir is concerned by. I don't see what you can do about it, this said. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2670957288 From roland at openjdk.org Thu Feb 20 09:46:58 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 20 Feb 2025 09:46:58 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v3] In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> Message-ID: On Thu, 20 Feb 2025 09:39:59 GMT, Roland Westrelin wrote: >>> So the overhead in the final code is 2x: we can expect the fast and slow paths to be about the same size so the section of code for the loop would see its size grow by 2x. >> >> Yes, if you get to the point where you add a multi-version-if condition, i.e. where SuperWord has decided it needs a speculative assumption (here for alignment, later for aliasing), then we get the whole loop 2x. I suppose we could try to make the pre-main-post loop more complicated and just multi-version the main-loop, but that sounds much more complicated. >> >> Do you see any better way than having the 2x code size if we need both a slow and fast loop? > >> Do you see any better way than having the 2x code size if we need both a slow and fast loop? > > No but I was confused by your comment about 3x and 4x which is why I asked for clarification. > Compiled code size affects inlining decisions: if a callee has compiled code and it's larger than some threshold, then the callee is considered too expensive to inline. With your change, some method that was considered ok to inline could now be considered too big. I think that's what Vladimir is concerned by. I don't see what you can do about it, this said. > @rwestrel I think I had tried some verifications above, but I could not even get it to work in all cases in `SuperWord`. 
> > In `VLoop::check_preconditions_helper`, I try to find either the predicate or the multiversioning if. But I cannot always find it, and I think that one reason was that the pre-loop can be lost. At least that is what I remember from 4+ weeks ago. Do you understand when that happens? It doesn't feel right that the pre loop can be lost. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2670971210 From roland at openjdk.org Thu Feb 20 09:47:01 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 20 Feb 2025 09:47:01 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v3] In-Reply-To: <47tXBG3sQGZVEE5Ya2wr46CopmDjy8OClbpqagIsjgA=.6d07b495-4777-4c7e-a3b7-820f100ec2c0@github.com> References: <47tXBG3sQGZVEE5Ya2wr46CopmDjy8OClbpqagIsjgA=.6d07b495-4777-4c7e-a3b7-820f100ec2c0@github.com> Message-ID: On Tue, 18 Feb 2025 09:42:17 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopUnswitch.cpp line 513: >> >>> 511: >>> 512: // Create new Region. >>> 513: RegionNode* region = new RegionNode(1); >> >> So we create a new `Region` every time a new condition is added? > > Yes. Are you ok with that? Or would you prefer if we extended an existing region (is that possible?) and then we'd have 2 cases, one where there is none yet, and one where we'd extend. I think adding one each time is easier, and it would get commoned anyway, right? That sounds ok to me. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1963217281 From roland at openjdk.org Thu Feb 20 09:47:03 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 20 Feb 2025 09:47:03 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v3] In-Reply-To: References: <-h_j1wlUqiWpk7lHDe2qqLlTPUdRLJ2NBaid6KJURCQ=.e1ef0bfa-4043-42b0-be58-ac130373c788@github.com> Message-ID: On Tue, 18 Feb 2025 10:26:37 GMT, Roland Westrelin wrote: >> @rwestrel do you consider that a blocking issue for this PR here? > > No I filed: https://bugs.openjdk.org/browse/JDK-8350330 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1963215126 From epeter at openjdk.org Thu Feb 20 10:35:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 20 Feb 2025 10:35:06 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v3] In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> Message-ID: On Thu, 20 Feb 2025 09:44:16 GMT, Roland Westrelin wrote: > > @rwestrel I think I had tried some verifications above, but I could not even get it to work in all cases in `SuperWord`. > > In `VLoop::check_preconditions_helper`, I try to find either the predicate or the multiversioning if. But I cannot always find it, and I think that one reason was that the pre-loop can be lost. At least that is what I remember from 4+ weeks ago. > > Do you understand when that happens? It doesn't feel right that the pre loop can be lost. `VLoop::check_preconditions_helper` has a check like this: // To align vector memory accesses in the main-loop, we will have to adjust // the pre-loop limit. 
if (_cl->is_main_loop()) {
  CountedLoopEndNode* pre_end = _cl->find_pre_loop_end();
  if (pre_end == nullptr) {
    return VStatus::make_failure(VLoop::FAILURE_PRE_LOOP_LIMIT);
  }
  Node* pre_opaq1 = pre_end->limit();
  if (pre_opaq1->Opcode() != Op_Opaque1) {
    return VStatus::make_failure(VLoop::FAILURE_PRE_LOOP_LIMIT);
  }
  _pre_loop_end = pre_end;
}

I don't remember exactly why the pre-loop disappears. They are rare cases. The pre-loop somehow folds away, maybe because it only has a single iteration, or just so few that it would never take the backedge. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2671093141 From amitkumar at openjdk.org Thu Feb 20 10:50:57 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 20 Feb 2025 10:50:57 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v8] In-Reply-To: References: Message-ID: <2bCkUjohdIFIpLdAWHLQQbbfsbyGBJ2xQy78GB5cZ2s=.a1f992a4-f8e8-40b6-bf82-72c733583fba@github.com> On Thu, 20 Feb 2025 09:43:24 GMT, Andrew Haley wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> remove frame requirement > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3674: > >> 3672: Register r_temp2, >> 3673: Register r_temp3) { >> 3674: assert_different_registers(r_sub_klass, r_super_klass, r_result, r_temp1, r_temp2, r_temp3, Z_R0_scratch); > > Suggestion: > > assert_different_registers(r_sub_klass, r_super_klass, r_result, r_temp1, r_temp2, r_temp3); But we are still using Z_R0, to resize the frame, at line 3708 and 3718. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1963327400 From cnorrbin at openjdk.org Thu Feb 20 10:53:26 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 20 Feb 2025 10:53:26 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow Message-ID: Hi everyone, The `align_up` function can potentially overflow, resulting in undefined behavior.
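To make the overflow concrete, here is a hypothetical sketch (not HotSpot's `align.hpp`; `align_up_wrapping` and `align_up_checked` are illustrative names). It is restricted to unsigned types, where wrap-around is well defined, so an overflow shows up as a result smaller than the input and can be detected by checking `aligned >= value`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical sketch; power-of-two alignment for unsigned integers only.
template <typename T>
constexpr bool is_power_of_2(T x) {
  return x != 0 && (x & (x - 1)) == 0;
}

template <typename T>
T align_up_wrapping(T value, T alignment) {
  static_assert(std::is_unsigned<T>::value, "sketch covers unsigned types only");
  assert(is_power_of_2(alignment));
  const T mask = static_cast<T>(alignment - 1);
  // May wrap past the top of T's range and yield a value below `value`.
  return static_cast<T>((value + mask) & ~mask);
}

// Stores the aligned value in *out and returns true, or returns false if
// aligning up would overflow T (i.e. aligned_result >= original fails).
template <typename T>
bool align_up_checked(T value, T alignment, T* out) {
  const T aligned = align_up_wrapping(value, alignment);
  if (aligned < value) {
    return false;  // wrapped around
  }
  *out = aligned;
  return true;
}
```

For example, `align_up_wrapping<std::uint32_t>(0xFFFFFFF1u, 16u)` wraps to 0, which the checked variant reports as a failure instead of returning a bogus small value.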
Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. Additionally, I've created `align_up_or_min`, which behaves similarly to the original align_up but handles overflows predictably across all integer types. This new function is used in the locations where overflow checks already exist, providing a safer alternative. ------------- Commit messages: - align_up assert Changes: https://git.openjdk.org/jdk/pull/23711/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346916 Stats: 87 lines in 6 files changed: 58 ins; 21 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23711.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23711/head:pull/23711 PR: https://git.openjdk.org/jdk/pull/23711 From galder at openjdk.org Thu Feb 20 10:56:58 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 20 Feb 2025 10:56:58 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Thu, 20 Feb 2025 06:50:07 GMT, Galder Zamarreño wrote: > There is something very intriguing happening here, and I don't know whether it's due to min itself or to int vs long.
Benchmark (probability) (size) Mode Cnt -min/-max +min/+max Units MinMaxVector.intReductionMultiplyMax 100 2048 thrpt 4 876.867 407.905 ops/ms (-53%) MinMaxVector.intReductionMultiplyMin 100 2048 thrpt 4 407.963 407.956 ops/ms (1) MinMaxVector.longReductionMultiplyMax 100 2048 thrpt 4 838.845 405.371 ops/ms (-51%) MinMaxVector.longReductionMultiplyMin 100 2048 thrpt 4 825.602 414.757 ops/ms (-49%) MinMaxVector.intReductionSimpleMax 100 2048 thrpt 4 1032.561 460.486 ops/ms (-55%) MinMaxVector.intReductionSimpleMin 100 2048 thrpt 4 460.530 460.490 ops/ms (2) MinMaxVector.longReductionSimpleMax 100 2048 thrpt 4 1017.560 460.436 ops/ms (-54%) MinMaxVector.longReductionSimpleMin 100 2048 thrpt 4 959.507 459.197 ops/ms (-52%) (1) (2) It seems it's a combination of both int AND min reduction operations and disabling the intrinsic. The rest of reduction operations seems to use cmp+mov in that situation but not int+min, which uses cmov. Maybe this is intentional or maybe it's a bug, but it's interesting to notice. `intReductionMultiplyMin` -min: # VM options: -Djava.library.path=/home/vagrant/1/jdk-intrinsify-max-min-long/build/release-linux-x86_64/images/test/micro/native -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_min -XX:-UseSuperWord # Benchmark: org.openjdk.bench.java.lang.MinMaxVector.intReductionMultiplyMin # Parameters: (probability = 100, size = 2048) ... 2.29% ??? ? 0x00007f4aa40f5835: cmpl %edi, %r10d 4.25% ??? ? 0x00007f4aa40f5838: cmovgl %edi, %r10d ;*ireturn {reexecute=0 rethrow=0 return_oop=0} ??? ? ; - java.lang.Math::min at 10 (line 2119) ??? ? ; - org.openjdk.bench.java.lang.MinMaxVector::intReductionMultiplyMin at 26 (line 202) ??? ? 
; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_intReductionMultiplyMin_jmhTest::intReductionMultiplyMin_thrpt_jmhStub at 19 (line 124) `intReductionMultiplyMin` +min: # VM options: -Djava.library.path=/home/vagrant/1/jdk-intrinsify-max-min-long/build/release-linux-x86_64/images/test/micro/native -XX:-UseSuperWord # Benchmark: org.openjdk.bench.java.lang.MinMaxVector.intReductionMultiplyMin # Parameters: (probability = 100, size = 2048) ... 2.06% ??? ? 0x00007ff8ec0f4c35: cmpl %edi, %r10d 4.31% ??? ? 0x00007ff8ec0f4c38: cmovgl %edi, %r10d ;*invokestatic min {reexecute=0 rethrow=0 return_oop=0} ??? ? ; - org.openjdk.bench.java.lang.MinMaxVector::intReductionMultiplyMin at 26 (line 202) ??? ? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_intReductionMultiplyMin_jmhTest::intReductionMultiplyMin_thrpt_jmhStub at 19 (line 124) `longReductionMultiplyMin` -min: # VM options: -Djava.library.path=/home/vagrant/1/jdk-intrinsify-max-min-long/build/release-linux-x86_64/images/test/micro/native -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_minL -XX:-UseSuperWord # Benchmark: org.openjdk.bench.java.lang.MinMaxVector.longReductionMultiplyMin # Parameters: (probability = 100, size = 2048) ... 0.01% ? ? ?? ? ?? 0x00007ff9d80f7609: imulq $0xb, 0x10(%r12, %r10, 8), %rbp ? ? ?? ? ?? ;*lmul {reexecute=0 rethrow=0 return_oop=0} ? ? ?? ? ?? ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMultiplyMin at 24 (line 265) ? ? ?? ? ?? ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMultiplyMin_jmhTest::longReductionMultiplyMin_thrpt_jmhStub at 19 (line 124) ? ? ?? ? ?? 0x00007ff9d80f760f: testq %rbp, %rbp ? ? ???? ?? 0x00007ff9d80f7612: jge 0x7ff9d80f7646 ;*lreturn {reexecute=0 rethrow=0 return_oop=0} ? ? ???? ?? ; - java.lang.Math::min at 11 (line 2134) ? ? ???? ?? ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMultiplyMin at 30 (line 266) ? ? ???? ?? 
                                                     ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMultiplyMin_jmhTest::longReductionMultiplyMin_thrpt_jmhStub@19 (line 124)

`longReductionMultiplyMin` +min:

# VM options: -Djava.library.path=/home/vagrant/1/jdk-intrinsify-max-min-long/build/release-linux-x86_64/images/test/micro/native -XX:-UseSuperWord
# Benchmark: org.openjdk.bench.java.lang.MinMaxVector.longReductionMultiplyMin
# Parameters: (probability = 100, size = 2048)
...
  0.01%  0x00007f83400f7d76:   cmpq   %r13, %rdx
  0.12%  0x00007f83400f7d79:   cmovlq %rdx, %r13     ;*invokestatic min {reexecute=0 rethrow=0 return_oop=0}
                                                     ; - org.openjdk.bench.java.lang.MinMaxVector::longReductionMultiplyMin@30 (line 266)
                                                     ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxVector_longReductionMultiplyMin_jmhTest::longReductionMultiplyMin_thrpt_jmhStub@19 (line 124)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2671144644

From: galder at openjdk.org (Galder Zamarreño)
Date: Thu, 20 Feb 2025 11:03:58 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11]

On Tue, 18 Feb 2025 08:43:38 GMT, Emanuel Peter wrote:

>> To make it more explicit: implementing long min/max in ad files as cmp will likely remove all the 100% regressions that are observed here. I'm going to repeat the same MinMaxVector int min/max reduction test above with the ad changes @rwestrel suggested to see what effect they have.
>
> @galderz I think we will have the same issue with both `int` and `long`: as far as I know, it is really a difficult problem to decide at compile time whether a `cmove` or a `branch` is the better choice. I'm not sure there is any heuristic for which you will not find a micro-benchmark where the heuristic makes the wrong choice.
>
> To my understanding, these are the factors that impact performance:
> - `cmove` requires all inputs to complete before it can execute, and it has an inherent latency of a cycle or so itself. But there are no branch mispredictions, and hence no branch misprediction penalties (i.e. when the CPU has to flush the ops from the wrong branch and restart at the branch).
> - `branch` can hide some latencies, because we can already continue down the speculated path. We do not need to wait for the inputs of the comparison to arrive, and we can already continue with the speculated resulting value. But whenever the speculation is wrong, we have to pay the misprediction penalty.
>
> In my understanding, there are roughly 3 scenarios:
> - The branch probability is so extreme that the branch predictor is correct almost always, so it is profitable to use branching code.
> - The branch probability is somewhere in the middle, and the branch is not predictable. Branch mispredictions are very expensive, so it is better to use `cmove`.
> - The branch probability is somewhere in the middle, but the branch is predictable (e.g. it swaps back and forth). The branch predictor will have almost no mispredictions, and it is faster to use branching code.
>
> Modeling this precisely is actually a little complex. You would have to know the cost of the `cmove` and the `branching` version of the code. That depends on the latency of the inputs and the outputs: does the `cmove` dramatically increase the latency on the critical path, where `branching` could hide some of that latency? And you would have to know how good the branch predictor is, which you cannot derive from the branch probability in our profiling (at least not when the probabilities are in the middle, and you don't know whether the pattern is random or predictable).
>
> If we can find a perfect heuristic - that would be fantastic ;)
>
> If we cannot find a perfect heuristic, then we should think about what the most "common" or "relevant" scenarios are, I think.
>
> But let's discuss all of this in a call / offline :)

FYI @eme64 @chhagedorn @rwestrel

Since we know that vectorization does not always kick in, there was a worry that scalar fallbacks would suffer heavily from the work in this PR to add a long intrinsic for min/max. Looking at the same scenarios with int (read my comments https://github.com/openjdk/jdk/pull/20098#issuecomment-2669329851 and https://github.com/openjdk/jdk/pull/20098#issuecomment-2671144644), it is clear that the same kind of regressions are also present there. So, if those int scalar regressions were not a problem when the int min/max intrinsic was added, I would expect the same to apply to long.

Re: https://github.com/openjdk/jdk/pull/20098#issuecomment-2671144644 - I was trying to think what could be causing this. I thought maybe it's due to the int min/max backend, which is implemented in a platform-specific way, vs the long min/max backend, which relies on platform-independent macro expansion. But if that theory were true, I would expect the same behaviour with int max vs long max, and that's not the case. It seems odd to only see this difference with min.
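The two code shapes in this trade-off can be sketched in C++. This is a hypothetical illustration, not HotSpot or JMH code, and the function names are made up: both functions compute the same min reduction, one through a data-dependent branch and one through a branch-free mask select, which has the same data flow as a `cmov`. C++ source does not dictate the emitted instruction, so a compiler may still choose either form for either function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Form 1: a data-dependent branch. If the compiler keeps it as compare-and-jump,
// its cost depends on how often the predictor guesses `x < acc` correctly.
int64_t min_reduce_branchy(const std::vector<int64_t>& values) {
    int64_t acc = INT64_MAX;
    for (int64_t x : values) {
        if (x < acc) {
            acc = x;
        }
    }
    return acc;
}

// Form 2: branch-free selection through a mask, the same data flow as a cmov.
// There is no branch to mispredict, but each iteration has to wait for the
// previous `acc` before it can select, so the loop-carried latency is exposed.
int64_t min_reduce_branchless(const std::vector<int64_t>& values) {
    int64_t acc = INT64_MAX;
    for (int64_t x : values) {
        // All bits set when x < acc, all bits clear otherwise.
        int64_t mask = -static_cast<int64_t>(x < acc);
        acc = (x & mask) | (acc & ~mask);
    }
    return acc;
}
```

Reading the generated assembly shows which form a particular compiler picked for each function; that choice is the same kind of compile-time decision discussed above, just made by a C++ compiler instead of C2.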
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2671163220

From: iwalulya at openjdk.org (Ivan Walulya)
Date: Thu, 20 Feb 2025 11:10:00 GMT
Subject: RFR: 8347335: ZGC: Use limitless mark stack memory [v2]

On Thu, 20 Feb 2025 08:09:06 GMT, Erik Österlund wrote:

>> When ZGC performs marking, a lock-free data structure is used to keep track of objects that still need to be traced in the object traversal. This lock-free data structure uses versioned pointers as a technique to avoid ABA problems, which are prevalent when writing lock-free data structures. This required partitioning pointers in the structure to embed both a version and a location.
>>
>> Due to the reduced addressability of locations with only a portion of the pointer bits, a special memory space was created to manage the data structure such that offsets could be encoded, instead of addresses.
>>
>> Since the memory area needs to be contiguous, the JVM needs to know what the expected maximum size of this space will ever be, within some limiting bounds. That is what `-XX:ZMarkStackSpaceLimit` controls.
>>
>> While this strategy has worked well in practice, the design does limit the scalability of ZGC, due to limits in how much contiguous memory can be encoded with a subset of the pointer bits. Not to mention that users have no idea what number to put into this JVM option.
>>
>> The `-XX:ZMarkStackSpaceLimit` JVM option is needed due to using a contiguous allocator to solve an ABA problem in a lock-free data structure. By selecting another solution for the ABA problem, the need for the special contiguous memory allocator, and hence the JVM option, can be removed.
>>
>> This PR proposes a new solution for that original ABA problem in the lock-free data structure, which renders the entire machinery behind the `-XX:ZMarkStackSpaceLimit` JVM option redundant. The proposed technique is to use hazard pointers instead.
>>
>> The use of hazard pointers is a well-established safe memory reclamation (SMR) technique for writing lock-free data structures, which we also use in the Threads list. The main idea is to publish the pointer that has been read in a hazard pointer, so that concurrent threads know not to free memory that is being concurrently used. Freeing of such racily accessed memory is deferred until it is safe, hence solving the ABA problem. This also allows using plain malloc/free instead of a custom contiguous memory allocator for these structures.
>>
>> Only popping nodes from the mark stacks requires hazard pointers, and only GC workers pop entries from the mark stacks. Therefore, hazard pointers may be stored in a per-worker variable.
>>
>> I have measured throughput, latency, marking times and memory usage across a number of programs and platforms, and not seen any inter...
>
> Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
>  - Preexisting: Missing Include
>  - Spelling and const
>  - Use ZAttachedArray
>  - Merge branch 'master' into zgc_hazard_mark_stack
>  - 8347335: ZGC: Use limitless mark stack memory

Looks good!

Driveby review

-------------

Marked as reviewed by iwalulya (Reviewer).
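The core of the hazard-pointer scheme described in this PR can be sketched in a few dozen lines. What follows is a hypothetical, simplified C++ example, not the ZGC implementation: a Treiber stack in which each popping thread (identified by a small index, mirroring the per-worker slots mentioned above) publishes the node it is about to dereference, re-validates that the node is still on top, and only then reads `next`. Popped nodes are retired to a per-thread list and freed with plain `delete` once no hazard slot refers to them, so a node's address cannot be recycled while another thread may still compare against it, which is what rules out ABA here. The class name, slot count, scan threshold, and the seq_cst-everywhere ordering are assumptions chosen for clarity.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of hazard-pointer reclamation for a lock-free (Treiber)
// stack; not the ZGC mark stack. Every atomic uses the default seq_cst order.
class HazardStack {
    struct Node {
        int value;
        Node* next;
    };

    static constexpr int kMaxThreads = 8;         // popping threads, by index
    static constexpr size_t kScanThreshold = 16;  // retired nodes before a scan

    std::atomic<Node*> top_{nullptr};
    std::atomic<Node*> hazard_[kMaxThreads] = {}; // one slot per popping thread
    std::vector<Node*> retired_[kMaxThreads];     // each thread uses only its own

    // Delete retired nodes that no hazard slot currently points to.
    void scan(int tid) {
        std::vector<Node*> keep;
        for (Node* r : retired_[tid]) {
            bool protected_by_someone = false;
            for (const auto& h : hazard_) {
                if (h.load() == r) {
                    protected_by_someone = true;
                    break;
                }
            }
            if (protected_by_someone) {
                keep.push_back(r);
            } else {
                delete r;
            }
        }
        retired_[tid].swap(keep);
    }

    void retire(int tid, Node* n) {
        retired_[tid].push_back(n);
        if (retired_[tid].size() >= kScanThreshold) scan(tid);
    }

public:
    HazardStack() = default;
    HazardStack(const HazardStack&) = delete;
    HazardStack& operator=(const HazardStack&) = delete;

    ~HazardStack() {  // assumes no thread is still using the stack
        for (Node* n = top_.load(); n != nullptr;) {
            Node* next = n->next;
            delete n;
            n = next;
        }
        for (auto& list : retired_) {
            for (Node* r : list) delete r;
        }
    }

    // Pushing never dereferences shared nodes, so it needs no hazard pointer.
    void push(int v) {
        Node* n = new Node{v, top_.load()};
        // On failure, compare_exchange_weak stores the current top in n->next.
        while (!top_.compare_exchange_weak(n->next, n)) {
        }
    }

    bool pop(int tid, int* out) {
        std::atomic<Node*>& hp = hazard_[tid];
        for (;;) {
            Node* n = top_.load();
            if (n == nullptr) {
                hp.store(nullptr);
                return false;
            }
            hp.store(n);                     // publish: "I may dereference n"
            if (top_.load() != n) continue;  // n may already be retired: retry
            Node* next = n->next;            // safe: n is not freed while published
            if (top_.compare_exchange_strong(n, next)) {
                hp.store(nullptr);
                *out = n->value;             // we unlinked n, so only we retire it
                retire(tid, n);
                return true;
            }
        }
    }
};
```

The re-validation after publishing is the essential step: without it, a thread could publish a pointer to a node that another thread had already unlinked and freed. Keeping `push` free of hazard pointers matches the point above that only popping needs them.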
PR Review: https://git.openjdk.org/jdk/pull/23571#pullrequestreview-2629522459

From: aph at openjdk.org (Andrew Haley)
Date: Thu, 20 Feb 2025 11:32:54 GMT
Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v8]

On Thu, 20 Feb 2025 10:48:45 GMT, Amit Kumar wrote:

>> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3674:
>>
>>> 3672: Register r_temp2,
>>> 3673: Register r_temp3) {
>>> 3674: assert_different_registers(r_sub_klass, r_super_klass, r_result, r_temp1, r_temp2, r_temp3, Z_R0_scratch);
>>
>> Suggestion:
>>
>> assert_different_registers(r_sub_klass, r_super_klass, r_result, r_temp1, r_temp2, r_temp3);
>
> But we are still using Z_R0 to resize the frame, at lines 3708 and 3718.

I missed that because you were using different names for the same register in this function. I searched for `Z_R0_scratch`. It would be better if you decided what to call `Z_R0`, and used the same name consistently for all usages, including `assert_different_registers()`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1963400828

From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 20 Feb 2025 11:37:08 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v18]

On Tue, 18 Feb 2025 02:36:13 GMT, Julian Waters wrote:

> Is anyone else getting compile failures after this was integrated? This weirdly seems to only happen on Linux
>
> ```
> * For target hotspot_variant-server_libjvm_objs_mulnode.o:
> /home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp: In member function ‘virtual const Type* FmaHFNode::Value(PhaseGVN*) const’:
> /home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp:1944:37: error: call of overloaded ‘make(double)’ is ambiguous
>  1944 |   return TypeH::make(fma(f1, f2, f3));
>       |                                     ^
> In file included from /home/runner/work/jdk/jdk/src/hotspot/share/opto/node.hpp:31,
>                  from /home/runner/work/jdk/jdk/src/hotspot/share/opto/addnode.hpp:28,
>                  from /home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp:26:
> /home/runner/work/jdk/jdk/src/hotspot/share/opto/type.hpp:544:23: note: candidate: ‘static const TypeH* TypeH::make(float)’
>   544 |   static const TypeH* make(float f);
>       |                       ^~~~
> /home/runner/work/jdk/jdk/src/hotspot/share/opto/type.hpp:545:23: note: candidate: ‘static const TypeH* TypeH::make(short int)’
>   545 |   static const TypeH* make(short f);
>       |                       ^~~~
> ```

Hi @TheShermanTanker,

Please file a separate JBS issue for the errors you are observing with non-standard build options. I am also seeing some other build issues with the following configuration: --with-extra-cxxflags=-D__CORRECT_ISO_CPP11_MATH_H_PROTO_FP

Best Regards,
Jatin

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2671231948

From: amitkumar at openjdk.org (Amit Kumar)
Date: Thu, 20 Feb 2025 11:43:08 GMT
Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v9]

> s390x implementation for Class.isInstance intrinsic.
>
> Tier1 tests on release & fastdebug VMs are clean with flags: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`.
>
> Benchmark results will be updated soon.
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:

  no need of Z_R0_scratch

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23535/files
  - new: https://git.openjdk.org/jdk/pull/23535/files/180e9f33..ffdb1342

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=08
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=07-08

Stats: 10 lines in 1 file changed: 0 ins; 1 del; 9 mod
Patch: https://git.openjdk.org/jdk/pull/23535.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23535/head:pull/23535

PR: https://git.openjdk.org/jdk/pull/23535

From: amitkumar at openjdk.org (Amit Kumar)
Date: Thu, 20 Feb 2025 11:45:52 GMT
Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v8]

On Thu, 20 Feb 2025 11:30:34 GMT, Andrew Haley wrote:

>> But we are still using Z_R0 to resize the frame, at lines 3708 and 3718.
>
> I missed that because you were using different names for the same register in this function. I searched for `Z_R0_scratch`. It would be better if you decided what to call `Z_R0`, and used the same name consistently for all usages, including `assert_different_registers()`.

I have moved the result into volatile float registers instead of using the stack for the shuffling. With that, we can get rid of `Z_R0_scratch` in `assert_different_registers`, as per your suggestion.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1963417361

From: amitkumar at openjdk.org (Amit Kumar)
Date: Thu, 20 Feb 2025 11:50:53 GMT
Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v8]

On Thu, 20 Feb 2025 11:42:57 GMT, Amit Kumar wrote:

>> I missed that because you were using different names for the same register in this function. I searched for `Z_R0_scratch`. It would be better if you decided what to call `Z_R0`, and used the same name consistently for all usages, including `assert_different_registers()`.
>
> I have moved the result into volatile float registers instead of using the stack for the shuffling. With that, we can get rid of `Z_R0_scratch` in `assert_different_registers`, as per your suggestion.

For verification, I made it crash manually, and we see only correct values:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/amit/isInstanceIntrinsic/jdk/src/hotspot/share/oops/klass.cpp:1330), pid=3145509, tid=3145513
#  fatal error: mismatch: java.nio.HeapByteBuffer implements sun.nio.ch.DirectBuffer: linear_search: 1; table_lookup: 1
#
# JRE version: OpenJDK Runtime Environment (25.0) (fastdebug build 25-internal-adhoc.amit.jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-adhoc.amit.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-s390x)

All of the registers got their correct values back.
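The review point about using two names for one register can be illustrated with a small model. This is a minimal sketch loosely based on the idea behind HotSpot's `assert_different_registers`; the `Reg` type and the `all_registers_distinct` helper are hypothetical, not HotSpot code, and the sketch assumes, as the review comment indicates, that `Z_R0_scratch` names the same register as `Z_R0`. Because the check compares register encodings rather than names, the alias is reported as a clash, whereas searching the source for one name misses the other.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical model, not HotSpot's Register class: a register is identified by
// its hardware encoding, and several symbolic names can map to one encoding.
enum class Reg : int { R0 = 0, R1 = 1, R2 = 2, R3 = 3 };

constexpr Reg Z_R0 = Reg::R0;
constexpr Reg Z_R0_scratch = Reg::R0;  // assumed alias: same register, other name

// Returns true if no two entries denote the same physical register. Comparing
// encodings (not names) is what makes an alias count as a clash.
bool all_registers_distinct(std::initializer_list<Reg> regs) {
    for (const Reg* a = regs.begin(); a != regs.end(); ++a) {
        for (const Reg* b = a + 1; b != regs.end(); ++b) {
            if (*a == *b) return false;
        }
    }
    return true;
}
```

For example, `all_registers_distinct({Reg::R1, Z_R0, Z_R0_scratch})` returns false, so listing both names in one check would trip it even though the names differ.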
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1963424683

From: eosterlund at openjdk.org (Erik Österlund)
Date: Thu, 20 Feb 2025 12:08:52 GMT
Subject: RFR: 8347335: ZGC: Use limitless mark stack memory [v2]

On Thu, 20 Feb 2025 11:07:41 GMT, Ivan Walulya wrote:

> Looks good!
>
> Driveby review

Thanks Ivan!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23571#issuecomment-2671307000

From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 20 Feb 2025 12:41:40 GMT
Subject: RFR: 8344009: Improve compiler memory statistics [v3]

> Greetings,
>
> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues.
>
> Tracking Arena memory movement is not trivial, since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than it allocated, when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends.
>
> I wanted to track that information correctly and display it clearly in a way that is easy to understand.
>
> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea).
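The per-phase bookkeeping described in this PR (allocations charged to the innermost active phase of a nested phase stack, and negative net figures when a phase releases arenas that an earlier phase allocated) can be illustrated with a small C++ model that also snapshots the per-phase totals whenever overall usage reaches a new peak. This is a hypothetical sketch, not the HotSpot implementation: `PhaseMemStat` and `PhaseScope` are made-up names, byte counts are passed in by hand instead of coming from arena chunk allocation, and phases are identified by strings.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of per-phase arena accounting (not the HotSpot code).
class PhaseMemStat {
    size_t current_ = 0;                        // bytes currently held
    size_t peak_ = 0;                           // highest value of current_
    std::vector<std::string> phases_{"none"};   // stack of active phases
    std::map<std::string, long> net_;           // allocated minus freed, per phase
    std::map<std::string, long> net_at_peak_;   // copy of net_ taken at the peak

    static long lookup(const std::map<std::string, long>& m, const std::string& k) {
        auto it = m.find(k);
        return it == m.end() ? 0 : it->second;
    }

public:
    void push_phase(const std::string& name) { phases_.push_back(name); }
    void pop_phase() { phases_.pop_back(); }

    void on_alloc(size_t bytes) {
        current_ += bytes;
        net_[phases_.back()] += static_cast<long>(bytes);
        if (current_ > peak_) {   // new peak: remember the per-phase breakdown
            peak_ = current_;
            net_at_peak_ = net_;
        }
    }

    // A phase may free memory that an earlier phase allocated, so its net
    // figure can become negative.
    void on_free(size_t bytes) {
        current_ -= bytes;
        net_[phases_.back()] -= static_cast<long>(bytes);
    }

    size_t peak() const { return peak_; }
    long net(const std::string& phase) const { return lookup(net_, phase); }
    long net_at_peak(const std::string& phase) const { return lookup(net_at_peak_, phase); }
};

// RAII scope in the spirit of TracePhase: entering on construction and leaving
// on destruction, so nested scopes form the phase stack.
class PhaseScope {
    PhaseMemStat& stat_;

public:
    PhaseScope(PhaseMemStat& stat, const std::string& name) : stat_(stat) { stat_.push_phase(name); }
    ~PhaseScope() { stat_.pop_phase(); }
    PhaseScope(const PhaseScope&) = delete;
    PhaseScope& operator=(const PhaseScope&) = delete;
};
```

Dropping a `PhaseScope` around any block, as the description suggests doing with `TracePhase` inside a suspected hot loop, is enough to get that block's allocations reported under its own name.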
>
> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (e.g. the inside of a loop): drop a TracePhase in there with a descriptive name, and you can see the allocations inside that phase.
>
> The statistic gives us two new forms of output:
>
> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase:
>
>
> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
> Phase                 Total      ra     node     comp  type  index  reglive  regsplit    cienv   other
> none                1205512  155104   982984    33712     0      0        0         0        0   33712
> parse              11685376  720016  6578728  1899064     0      0        0         0  1832888  654680
> optimizer            916584       0   556416        0     0      0        0         0        0  360168
> escapeAnalysis      1983400       0  1276392   707008     0      0        0         0        0       0
> connectionGraph      720016       0        0   621832     0      0        0         0    98184       0
> macroEliminate       196448       0   196448        0     0      0        0         0        0       0
> iterGVN              327440       0   196368   131072     0      0        0         0        0       0
> incrementalInline   3992816       0  3043704   621832     0      0        0         0   261824...

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision:

 - avoid Thread::current in high traffic chunk alloc path
 - move TracePhase counter initialization into conditional path
 - Merge branch 'master' into JDK-8344009-Improve-Compiler-memstat
 - revert unnecessary copyright change
 - more code grooming and comments
 - grooming
 - wip
 - Improved compiler memory statistics

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23530/files
  - new: https://git.openjdk.org/jdk/pull/23530/files/4f426160..08308e9e

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=02
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=01-02

Stats: 10793 lines in 456 files changed: 7069 ins; 1840 del; 1884 mod
Patch: https://git.openjdk.org/jdk/pull/23530.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23530/head:pull/23530

PR: https://git.openjdk.org/jdk/pull/23530

From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 20 Feb 2025 13:00:03 GMT
Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v3]

On Thu, 20 Feb 2025 04:29:04 GMT, Chen Liang wrote:

>> src/java.base/share/classes/java/lang/Class.java line 1297:
>>
>>> 1295: // The componentType field's null value is the sole indication that the class is an array,
>>> 1296: // see isArray().
>>> 1297: private transient final Class componentType;
>>
>> Why the `transient` and how does this impact serialization??
>
> The fields in `Class` are just inconsistently transient or not. `Class` has special treatment in the serialization specification, so the presence or absence of the `transient` modifier has no effect.

Thanks Chen.
I was wondering why the other JVM-installed fields were transient and this one wasn't, so I added it to see if someone noticed and could verify whether it's right or not.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1963520059

From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 20 Feb 2025 13:14:34 GMT
Subject: RFR: 8344009: Improve compiler memory statistics [v4]

> Greetings,
>
> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues.
>
> Tracking Arena memory movement is not trivial, since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than it allocated, when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends.
>
> I wanted to track that information correctly and display it clearly in a way that is easy to understand.
>
> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea).
>
> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (e.g. the inside of a loop): drop a TracePhase in there with a descriptive name, and you can see the allocations inside that phase.
>
> The statistic gives us two new forms of output:
>
> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase:
>
>
> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
> Phase                 Total      ra     node     comp  type  index  reglive  regsplit    cienv   other
> none                1205512  155104   982984    33712     0      0        0         0        0   33712
> parse              11685376  720016  6578728  1899064     0      0        0         0  1832888  654680
> optimizer            916584       0   556416        0     0      0        0         0        0  360168
> escapeAnalysis      1983400       0  1276392   707008     0      0        0         0        0       0
> connectionGraph      720016       0        0   621832     0      0        0         0    98184       0
> macroEliminate       196448       0   196448        0     0      0        0         0        0       0
> iterGVN              327440       0   196368   131072     0      0        0         0        0       0
> incrementalInline   3992816       0  3043704   621832     0      0        0         0   261824...

Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  avoid Thread::current in high traffic chunk alloc path

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23530/files
  - new: https://git.openjdk.org/jdk/pull/23530/files/08308e9e..dd7a06ad

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=03
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=02-03

Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/23530.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23530/head:pull/23530

PR: https://git.openjdk.org/jdk/pull/23530

From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 20 Feb 2025 13:49:54 GMT
Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree

On Tue, 4 Feb 2025 10:39:52 GMT, Thomas Stuefe wrote:

>> Hi everyone,
>>
>> The recently integrated red-black tree can be made more flexible by adding support of
>> intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally.
>>
>> Two key changes enable this feature:
>> 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree.
>> 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values.
>>
>>
>> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
>>
>> An example of how you could use the intrusive tree is found below:
>>
>> ```c++
>> struct MyIntrusiveStructure {
>>   Node node; // The tree node is part of an external structure
>>   int data;
>>
>>   MyIntrusiveStructure(int data, Node node) : node(node), data(data) {}
>>   Node* get_node() { return &node; }
>>   static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; }
>> };
>>
>> Tree my_intrusive_tree;
>>
>> Cursor insert_cursor = my_intrusive_tree.cursor_find(0);
>> Node insert_node = Node(0);
>>
>> // Custom allocation here is just malloc
>> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest);
>> new (place) MyIntrusiveStructure(0, insert_node);
>>
>> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor);
>>
>> Cursor find_cursor = my_intrusive_tree.cursor_find(0);
>> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data;
>> ```
>>
>> Please let me know any feedback or concerns!
>
> @caspernorrbin could you massage this patch a bit to reduce the delta to the last version?
> That is a good idea in general (I usually do a manual minimize-delta sweep before I undraft a PR for review). A lot seems to be code movement at first glance.

Hi @tstuefe,

I'll give some background before I discuss this implementation. I think a lot of the design here has been taken from, or is a consequence of, our work in ZGC, where we are rewriting parts of our internal memory management. The initial work used an early version of @caspernorrbin's non-intrusive RB-tree. We wanted to reduce the amount of work required under our allocation lock, so we had the idea of making the RB-tree intrusive and permitting 'updates', where we can modify a node and/or replace it as long as we do not break the normal strict node ordering invariant. So I built ZIntrusiveRBTree as an experiment, and it worked out very well from a performance perspective.

The success of our intrusive tree, along with your wish to also have an intrusive tree, made @jdksjolen ask us whether we could use Casper's tree if it supported intrusive nodes. We would very much appreciate using a shared implementation, so I stopped iterating on `ZIntrusiveRBTree`. But here is the implementation:

https://github.com/xmas92/jdk/blob/757daf9bac98e7cb04121664d2e562f31e817faa/src/hotspot/share/gc/z/zIntrusiveRBTree.hpp
https://github.com/xmas92/jdk/blob/757daf9bac98e7cb04121664d2e562f31e817faa/src/hotspot/share/gc/z/zIntrusiveRBTree.inline.hpp

> It would be good to have RBNode simplified and defined outside of the tree. Possibly even in a different header. I can see RBNode being used in places where I don't know exactly what tree it goes into. It could even go into multiple trees at the same time or at different times (so the data structure would have either multiple RBNode inlined or a single one that gets repurposed for different trees).
I would prefer this as well; in `ZIntrusiveRBTree`, at some point I found it cumbersome having the node be an inner class of the tree, so I moved it out of the tree.

> In that line, it would be good to have the key in RBNode being mutable. Having it const means I am forced to write constructors for containing structures. That is cumbersome.

Our use case would need this, or rather, in our use case the key would probably be a dummy. In `ZIntrusiveRBTree` we materialize the `Key` from the `Node*`. Essentially we are keeping track of memory regions, so the virtual key is a memory range.

https://github.com/xmas92/jdk/blob/757daf9bac98e7cb04121664d2e562f31e817faa/src/hotspot/share/gc/z/zMappedCache.cpp#L42-L49

And our lookups use the actual address (a smaller key) to find where to insert a value. So our compare is always Compare(Key, Node*) (Compare(Node*, Node*) is only used for verification).

https://github.com/xmas92/jdk/blob/757daf9bac98e7cb04121664d2e562f31e817faa/src/hotspot/share/gc/z/zMappedCache.cpp#L95-L107

The STL has a similar concept with `template< class K > iterator find( const K& x );`, which allows a different type to be used for the lookup. But the whole "the key is a function of the `Node*`" idea is a completely intrusive property, so it creates a discrepancy between the two tree types.

We can use this as it is, since we can just use the address of the key, but it relies on the implementation handing out a reference to the actual key in the node and not a copy. It feels flaky (and wastes bytes).

I think that in the intrusive case it is preferable to have the comparator compare the key to a `Node*`, and to have the intrusive node carry only the tree structure data, letting the containing object worry about the key and the value. Or maybe there is some more elegant API I am missing. We could also use metaprogramming to select the (Key, Node*) operator if it exists and use (Key, Key) otherwise.

> I found I had little need for cursors at all.
> Mostly, they just got in the way. Cursors exist to modify tree structure, but why would I ever do that manually? It is different with simple structures like linked lists, but here the tree balances itself, so it has the last say about its structure anyway. I would be perfectly happy with just the simple ability to add/remove nodes manually, use nodes to find nearby nodes (as in, nodes of nearby keys), iterate nodes with a functor, etc.
> The few times I needed a cursor, it was because the API forced me to (e.g. when removing a node from the tree). With insertion, it got very weird. So I have an RBNode*, want to insert it into the tree, and now I need an empty Cursor to do that? So I create an empty cursor with that key, then use that with insert_at_cursor? Why?
> Let's say I already have a nearby node (the result of closest_gt, for instance), but it does not satisfy me, so I add a new one. For that I need to call normal insert, so the search is done all over again (see my remark above). It would be good if we could have an insertion with a node as an insertion hint.

I was in the process of trying to use iterators instead of cursors when I stopped working on `ZIntrusiveRBTree`. It works right now for our use case. However, there are currently many ways to do the same thing: you can use Nodes, Cursors and Iterators to walk the tree. Both nodes and iterators track a location in the tree, while cursors carry more contextual information, as they also track the insert location.

The cost of using insertion hints is that it requires at least one redundant compare, which should be irrelevant. It also requires some extra handling of the Tree::end() / nullptr hint; the easiest way I can think of is to have the tree keep track of the rightmost node so it can find the insert location fast.

Iterator hints are safer to use externally, as they are just hints: the insertion will still respect the comparator function.
With a cursor, on the other hand, you can insert a node at an invalid position. As long as we do not care about whether the insert succeeds or not, I think an API where you consume an iterator hint and produce a new iterator as a result could end up being useful, similar to the `std::set` / `std::map` API. The main difference between using `Node*`s/iterators and using cursors is that one would use `gt` rather than `find`, which I think is more natural in many places.

```c++
// Insert or merge
auto it = tree.gt(key);
if (it != tree.end() && should_merge(*it, node)) {
  tree.replace(it, merge_node(*it, node));
} else {
  tree.insert(it, node);
}

// Remove all nodes >= key (find() returns end() if key itself is absent)
for (auto it = tree.find(key); it != tree.end(); it = tree.remove(it)) {}
```

The nice thing about iterators compared to `Node*`s is that they are more versatile, and they allow us to remove tree traversal from `Node*`'s public API, so that a node is just some space for the tree to store its data. I think we can also get a safer interface and type system: you use `Node*` when referring to something that may not be part of the tree, and iterators when referring to something that is in the tree (sans iterator invalidation, or using the iterator on a different tree). I think I mentioned in #22360 that implementing lightweight iterators (effectively just a fancy `Node*`; in `ZIntrusiveRBTree` I also keep track of the tree to allow for stronger asserts and bidirectionality) allows a lot of the API and its implementation to be written in terms of iterators. Regardless of what is used, I think we should have API overloads that do not require a hint or cursor: `Tree::remove(Node*)` and `Tree::insert(Node*)` (or `Tree::insert(Key, Node*)`). Just my 2 cents.
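The meta-programming idea in the comment above (use a (Key, Node*) comparator when one exists, otherwise fall back to (Key, Key)) can be sketched in C++17 with `if constexpr` and `std::is_invocable_r_v`. This is a minimal, hypothetical illustration, not HotSpot's `RBTree` API: `Node`, `Range`, `Item`, `AddrVsNode`, `IntVsInt` and `compare_to_node` are made-up names, and the node is assumed to be the first member of a standard-layout containing object so that a `Node*` can be cast back to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical intrusive node: it carries only the tree links.
struct Node {
  Node* left = nullptr;
  Node* right = nullptr;
};

// Containing object that owns the key (a memory range). The node is the
// first member, so a Node* can be cast back to the containing object.
struct Range {
  Node node;
  uintptr_t start;
  size_t size;
};

// Containing object with an ordinary int key.
struct Item {
  Node node;
  int key;
};

// Compares a lookup address directly against a node: 0 if the address
// falls inside the node's range, -1 if below it, 1 if above it.
struct AddrVsNode {
  int operator()(uintptr_t addr, const Node* n) const {
    const Range* r = reinterpret_cast<const Range*>(n);
    if (addr < r->start) return -1;
    if (addr >= r->start + r->size) return 1;
    return 0;
  }
};

// Conventional comparator that only knows how to compare two keys.
struct IntVsInt {
  int operator()(int a, int b) const { return a < b ? -1 : (a > b ? 1 : 0); }
};

// Prefers cmp(key, node) when that call is well-formed; otherwise extracts
// the node's key with key_of and calls cmp(key, key). The discarded branch
// is never instantiated, so key_of is unused for (Key, Node*) comparators.
template <typename Cmp, typename Key, typename KeyOf>
int compare_to_node(const Cmp& cmp, const Key& key, const Node* n, KeyOf key_of) {
  if constexpr (std::is_invocable_r_v<int, const Cmp&, const Key&, const Node*>) {
    return cmp(key, n);
  } else {
    return cmp(key, key_of(n));
  }
}
```

With this shape, a range-keyed tree can look up by a bare address without ever materializing a key object, while an ordinary int-keyed tree keeps working through the key extractor.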
------------- PR Comment: https://git.openjdk.org/jdk/pull/23416#issuecomment-2671552936 From stuefe at openjdk.org Thu Feb 20 13:56:00 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Feb 2025 13:56:00 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 09:49:54 GMT, Roberto Castañeda Lozano wrote: >>> > Hi Thomas, this looks very useful, thanks! I will run some Oracle-internal functional and performance testing and come back with the results next week. >>> >>> Functional test results (Oracle internal tier1-tier5) look good. >>> >>> I measured C2 execution time before and after the changeset using DaCapo 23 and did not find any statistically significant difference, except for a 2-3% regression on the jython benchmark (using large input size). This small regression is IMO acceptable, particularly given that these changes can be seen as an investment to improve compiler resource utilization in the long run. >> >> Hi @robcasloz, interesting, I did not expect this. What did you measure? With Compilation statistic vs without, or with old vs new, but both enabled?
(best, give me both sets of command line args) > > I measured and compared C2 speed in bytecodes/s as reported by `-XX:+CITime` (averaged over a number of repetitions). I wanted to test that the feature does not affect C2's execution time when not used, so I simply compared C2 compilation speed for `jdk-25+10` vs. `jdk-25+10` with this changeset applied on top (both release builds) and `-XX:+CITime -Xbatch -XX:-TieredCompilation` on both builds (the last two flags for better stability across benchmark repetitions). I could observe the regression on both linux-x64 and macosx-aarch64 platforms. Let me know if you need more details. @robcasloz I identified and hopefully fixed a small issue that hit the "disabled" path. Turns out we allocate arena chunks a lot more frequently than I thought, and the new unconditional call to Thread::current() in there was hurting a bit. I now avoid this unless I know the statistic is enabled. With this patch, on my machine the difference between unpatched and patched JVM with stats disabled is below one standard deviation for the benchmark in question. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2671563378 From rcastanedalo at openjdk.org Thu Feb 20 14:02:53 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 20 Feb 2025 14:02:53 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: References: Message-ID: <0wHGNSlwe7cWb7Plad2n8Swy8rayYTAf5IETuw9zl4U=.a4d6a129-aebc-4639-aaef-92ee6c4552c7@github.com> On Wed, 19 Feb 2025 09:49:54 GMT, Roberto Castañeda Lozano wrote: >>> > Hi Thomas, this looks very useful, thanks! I will run some Oracle-internal functional and performance testing and come back with the results next week. >>> >>> Functional test results (Oracle internal tier1-tier5) look good.
>>> >>> I measured C2 execution time before and after the changeset using DaCapo 23 and did not find any statistically significant difference, except for a 2-3% regression on the jython benchmark (using large input size). This small regression is IMO acceptable, particularly given that these changes can be seen as an investment to improve compiler resource utilization in the long run. >> >> Hi @robcasloz, interesting, I did not expect this. What did you measure? With Compilation statistic vs without, or with old vs new, but both enabled? (best, give me both sets of command line args) > >> > > Hi Thomas, this looks very useful, thanks! I will run some Oracle-internal functional and performance testing and come back with the results next week. >> > >> > >> > Functional test results (Oracle internal tier1-tier5) look good. >> > I measured C2 execution time before and after the changeset using DaCapo 23 and did not find any statistically significant difference, except for a 2-3% regression on the jython benchmark (using large input size). This small regression is IMO acceptable, particularly given that these changes can be seen as an investment to improve compiler resource utilization in the long run. >> >> Hi @robcasloz, interesting, I did not expect this. What did you measure? With Compilation statistic vs without, or with old vs new, but both enabled? (best, give me both sets of command line args) > > I measured and compared C2 speed in bytecodes/s as reported by `-XX:+CITime` (averaged over a number of repetitions). I wanted to test that the feature does not affect C2's execution time when not used, so I simply compared C2 compilation speed for `jdk-25+10` vs. `jdk-25+10` with this changeset applied on top (both release builds) and `-XX:+CITime -Xbatch -XX:-TieredCompilation` on both builds (the last two flags for better stability across benchmark repetitions). I could observe the regression on both linux-x64 and macosx-aarch64 platforms. 
Let me know if you need more details. > @robcasloz I identified and hopefully fixed a small issue that hit the "disabled" path. Turns out we allocate arena chunks a lot more frequently than I thought, and the new unconditional call to Thread::current() in there was hurting a bit. I now avoid this unless I know the statistic is enabled. > > With this patch, on my machine the difference between unpatched and patched JVM with stats disabled is below one standard deviation for the benchmark in question. Great, thanks! Will re-run benchmarking and report results early next week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2671587462 From aph at openjdk.org Thu Feb 20 14:42:53 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 20 Feb 2025 14:42:53 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v8] In-Reply-To: References: <2bCkUjohdIFIpLdAWHLQQbbfsbyGBJ2xQy78GB5cZ2s=.a1f992a4-f8e8-40b6-bf82-72c733583fba@github.com> Message-ID: On Thu, 20 Feb 2025 11:48:23 GMT, Amit Kumar wrote: >> I have moved the result in volatile-float registers, instead of using stack for the shuffling. With that we can get rid of `Z_R0_scratch` from `assert_different_registers` as per your suggestion. > > For verification I made it to crash manually, and we are seeing correct values only: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/amit/isInstanceIntrinsic/jdk/src/hotspot/share/oops/klass.cpp:1330), pid=3145509, tid=3145513 > # fatal error: mismatch: java.nio.HeapByteBuffer implements sun.nio.ch.DirectBuffer: linear_search: 1; table_lookup: 1 > # > # JRE version: OpenJDK Runtime Environment (25.0) (fastdebug build 25-internal-adhoc.amit.jdk) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-adhoc.amit.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-s390x) > > > All of the registers have retrieved the correct values back. 
> I have moved the result in volatile-float registers, instead of using stack for the shuffling. With that we can get rid of `Z_R0_scratch` from `assert_different_registers` as per your suggestion. Fair enough, but this is a general recommendation about good practice in hand-coded assembler. Using aliases for registers can result in a maintenance programmer not noticing that two different names refer to the same register. This has led to bugs that aren't revealed in tests but crash in production. Sure, aliases (e.g. `obj`, `klass`, `index`, etc.) can help readability, but be careful. This isn't intended to be a total ban, but a note of caution. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1963691768 From cnorrbin at openjdk.org Thu Feb 20 15:02:26 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 20 Feb 2025 15:02:26 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v10] In-Reply-To: References: Message-ID: <5vy3QKCRLlWpob5-Iqe9oR3w_ax4jD5IVirVB_cOMaM=.4c6a1bd5-3c6a-4466-8562-ec085a15b684@github.com> > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values.
> > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. > > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns! Casper Norrbin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - separate rbnode and normal tree subclass - Merge branch 'master' into rb-tree-intrusive-v2 - renamed non-value upsert to insert - johan feedback - empty base optimization reference - Merge branch 'master' into rb-tree-intrusive-v2 - initialize node on insert + more tests - windows build - build fix - reduced diff - ... 
and 2 more: https://git.openjdk.org/jdk/compare/f1258f9e...7d4a9ccc ------------- Changes: https://git.openjdk.org/jdk/pull/23416/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=09 Stats: 936 lines in 3 files changed: 590 ins; 103 del; 243 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From gziemski at openjdk.org Thu Feb 20 15:05:17 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 20 Feb 2025 15:05:17 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v50] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0%
> Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix thread names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/4a295517..60394ecf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=49 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=48-49 Stats: 80 lines in 2 files changed: 37 ins; 7 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From rriggs at openjdk.org Thu Feb 20 15:16:56 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 20 Feb 2025 15:16:56 GMT Subject: RFR: 8350151: Support requires property to filter tests incompatible with --enable-preview [v2] In-Reply-To: References: <1iY92LjhRPbtuENrxBQlsCOKx2EHI6leLAfbkorEGzE=.e964726d-cf2c-4715-91fc-c76fc3e6668d@github.com> Message-ID: On Wed, 19 Feb 2025 20:32:56 GMT, Leonid Mesnik wrote: >> It ran ok for me, once I got the command line flags correct. >> It ran ok if I added `@enablePreview`. >> >> It also ran ok with an explicit @run command: (it does not currently have an @run command). >> >> * @run main/othervm --enable-preview SecurityManagerWarnings >> ``` > > For me it fails with > ----------System.err:(18/917)---------- > stdout: []; > stderr: [Error: Unable to initialize main class SecurityManagerWarnings > Caused by: java.lang.NoClassDefFoundError: jdk/test/lib/process/OutputAnalyzer > ] > exitValue = 1 > that seems pretty strange, might be test library issue?
Even with mis-matched compilation of the test library and test code. I noticed the NoClassDefFoundError message comes from the child process. The child is invoked with test.noclasspath=true and no path to the test library. (intentionally) The SecurityManagerWarning class explicitly refers to OutputAnalyzer. There might be a path in which the new VM tries to load the OutputAnalyzer (and throw an error) Finding a way to reproduce locally might be necessary to track down the cause. Perhaps adding -Xlog (for the child) might provider more information about the sequence of events. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23653#discussion_r1963767231 From cnorrbin at openjdk.org Thu Feb 20 15:17:18 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 20 Feb 2025 15:17:18 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v11] In-Reply-To: References: Message-ID: <93YSV7QUAa8oJw3SJC8aRLdKR9V_8YGK1Liy0JsF288=.811957aa-b110-48f8-af4d-dbd46c88ce92@github.com> > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support of intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. 
More tests have also been added to cover both new and existing functionality. > > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns! 
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: build fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23416/files - new: https://git.openjdk.org/jdk/pull/23416/files/7d4a9ccc..65892c4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=09-10 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From cnorrbin at openjdk.org Thu Feb 20 15:17:20 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 20 Feb 2025 15:17:20 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v10] In-Reply-To: <5vy3QKCRLlWpob5-Iqe9oR3w_ax4jD5IVirVB_cOMaM=.4c6a1bd5-3c6a-4466-8562-ec085a15b684@github.com> References: <5vy3QKCRLlWpob5-Iqe9oR3w_ax4jD5IVirVB_cOMaM=.4c6a1bd5-3c6a-4466-8562-ec085a15b684@github.com> Message-ID: On Thu, 20 Feb 2025 15:02:26 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. >> >> Two key changes enable this feature: >> 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. >> 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values.
>> >> >> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality. >> >> An example of how you could use the intrusive tree is found below: >> >> ```c++ >> struct MyIntrusiveStructure { >> Node node; // The tree node is part of an external structure >> int data; >> >> MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} >> Node* get_node() { return &node; } >> static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } >> }; >> >> Tree my_intrusive_tree; >> >> Cursor insert_cursor = my_intrusive_tree.cursor_find(0); >> Node insert_node = Node(0); >> >> // Custom allocation here is just malloc >> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); >> new (place) MyIntrusiveStructure(0, insert_node); >> >> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); >> >> Cursor find_cursor = my_intrusive_tree.cursor_find(0); >> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; >> >> >> >> Please let me know any feedback or concerns! > > Casper Norrbin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - separate rbnode and normal tree subclass > - Merge branch 'master' into rb-tree-intrusive-v2 > - renamed non-value upsert to insert > - johan feedback > - empty base optimization reference > - Merge branch 'master' into rb-tree-intrusive-v2 > - initialize node on insert + more tests > - windows build > - build fix > - reduced diff > - ... 
and 2 more: https://git.openjdk.org/jdk/compare/f1258f9e...7d4a9ccc I just pushed a change addressing some of the points mentioned: - Separated `RBNode` into its own class - Separated the 'normal' `RBTree` (the parts using an allocator) to its own subclass, with the the intrusive tree essentially becoming the base. - Added `insert_node(Node)` and `remove(Node)` so that you can use the intrusive parts without necessary having to use cursors. - Added an optional `hint_node` parameter to insert and lookup functions that if given, starts searching for the intended node/place starting at that node instead of the tree root Any feedback would be appreciated! Thank you both for the comprehensive comments, I'll continue working on this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23416#issuecomment-2671795419 From rriggs at openjdk.org Thu Feb 20 15:20:52 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 20 Feb 2025 15:20:52 GMT Subject: RFR: 8350151: Support requires property to filter tests incompatible with --enable-preview [v2] In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 17:49:57 GMT, Leonid Mesnik wrote: >> It might be useful to be able to run testing with --enable-preview for feature development. The tests incompatible with this mode must be filtered out. >> >> I chose name 'java.enablePreview' , because it is more java property than vm or jdk. And 'enablePreview' to be similar with jtreg tag. >> >> Tested by running all test suites, and verifying that test is correctly selected. >> There are more tests incompatible with --enable-preview, will mark them in the following bug. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > change other test to exclude Marked as reviewed by rriggs (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23653#pullrequestreview-2630226768 From rriggs at openjdk.org Thu Feb 20 15:20:53 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 20 Feb 2025 15:20:53 GMT Subject: RFR: 8350151: Support requires property to filter tests incompatible with --enable-preview [v2] In-Reply-To: References: <1iY92LjhRPbtuENrxBQlsCOKx2EHI6leLAfbkorEGzE=.e964726d-cf2c-4715-91fc-c76fc3e6668d@github.com> Message-ID: On Thu, 20 Feb 2025 15:13:47 GMT, Roger Riggs wrote: >> For me it fails with >> ----------System.err:(18/917)---------- >> stdout: []; >> stderr: [Error: Unable to initialize main class SecurityManagerWarnings >> Caused by: java.lang.NoClassDefFoundError: jdk/test/lib/process/OutputAnalyzer >> ] >> exitValue = 1 >> that seems pretty strange, might be test library issue? > > I haven't been able to reproduce that locally. Even with mis-matched compilation of the test library and test code. > > I noticed the NoClassDefFoundError message comes from the child process. > The child is invoked with test.noclasspath=true and no path to the test library. (intentionally) > The SecurityManagerWarning class explicitly refers to OutputAnalyzer. > There might be a path in which the new VM tries to load the OutputAnalyzer (and throw an error) > > Finding a way to reproduce locally might be necessary to track down the cause. > Perhaps adding -Xlog (for the child) might provide more information about the sequence of events. Please file a separate bug for the failure, so further investigation can be done.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23653#discussion_r1963775424 From shade at openjdk.org Thu Feb 20 15:35:57 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Feb 2025 15:35:57 GMT Subject: RFR: 8343468: GenShen: Enable relocation of remembered set card tables [v3] In-Reply-To: References: <6_AoWQhldJttOIEOL1T7HSapPzE4Qn2j4WN7E-bI3rM=.2685d3d8-e47c-42a6-845b-b68f50cc568e@github.com> Message-ID: On Thu, 20 Feb 2025 01:32:42 GMT, Cesar Soares Lucas wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - Merge master >> - Addressing PR comments: some refactorings, ppc fix, off-by-one fix. >> - Relocation of Card Tables > > src/hotspot/share/gc/shared/cardTable.hpp line 205: > >> 203: virtual CardValue* byte_map_base() const { return _byte_map_base; } >> 204: >> 205: virtual CardValue* byte_map() const { return _byte_map; } > > @shipilev - can you please confirm that this is the part that you didn't like? Yes, I am not fond of extending `CardTable` with virtual members, especially if they can be used on high-performance paths. Not sure if the following idea is viable. ShenandoahBarrierSet knows where to get the card table base: from Shenandoah thread-local data. Now it looks like we need to deal with two problems: 1. Protect ourselves from accidentally calling `CardTable` methods that may reference an "incorrect" `_byte_map_(base)`. To do that, it looks like it is enough to initialize `CardTable::_byte_map_(base)` to nonsensical values (`nullptr`-s?), and let the testing crash. 2. Allow calls to `CardTable` utility methods with our base. For that, I think we can add a few new (non-virtual) methods to `CardTable`, and enter from Shenandoah through them.
So for example `byte_for_index(const size_t card_index)` becomes:

```
// base is non-const: returning a CardValue* from a const CardValue* would not compile
CardValue* byte_for_index(CardValue* base, const size_t card_index) const {
  return base + card_index;
}

CardValue* byte_for_index(const size_t card_index) const {
  return byte_for_index(_byte_map, card_index);
}
```

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23170#discussion_r1963810697 From sroy at openjdk.org Thu Feb 20 15:41:12 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Thu, 20 Feb 2025 15:41:12 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26] In-Reply-To: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: > JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) > > Currently, acceleration code for GHASH is missing for PPC64. > > The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result.
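The GHASH description above mentions Karatsuba multiplication. As a rough, portable illustration of that step only (not the PPC64 intrinsic), the sketch below computes a 128x128-bit carry-less product from three 64x64-bit carry-less multiplies instead of four. Writing a = a1·x^64 + a0 and b = b1·x^64 + b0 over GF(2), the middle term a1·b0 + a0·b1 equals (a1 + a0)(b1 + b0) + a1·b1 + a0·b0, where + is XOR. The bit-by-bit `clmul64` merely stands in for a hardware carry-less multiply instruction; the bit reflection and the reduction modulo the GHASH polynomial are deliberately omitted, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct U128 { uint64_t hi, lo; };   // 128-bit value as two 64-bit halves
struct U256 { uint64_t w[4]; };     // w[0] is the least significant word

// Carry-less (XOR) multiply of two 64-bit polynomials over GF(2).
// Portable stand-in for a hardware carry-less multiply instruction.
U128 clmul64(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; i++) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i != 0) r.hi ^= a >> (64 - i);  // bits shifted out of the low word
    }
  }
  return r;
}

U128 xor128(U128 x, U128 y) { return {x.hi ^ y.hi, x.lo ^ y.lo}; }

// Places lo at bit 0, mid at bit 64 and hi at bit 128, XOR-ing overlaps.
U256 combine(U128 lo, U128 mid, U128 hi) {
  U256 r;
  r.w[0] = lo.lo;
  r.w[1] = lo.hi ^ mid.lo;
  r.w[2] = hi.lo ^ mid.hi;
  r.w[3] = hi.hi;
  return r;
}

// Karatsuba: three 64x64 products.
U256 clmul128_karatsuba(U128 a, U128 b) {
  U128 lo = clmul64(a.lo, b.lo);
  U128 hi = clmul64(a.hi, b.hi);
  U128 mid = xor128(xor128(clmul64(a.lo ^ a.hi, b.lo ^ b.hi), lo), hi);
  return combine(lo, mid, hi);
}

// Schoolbook reference: four 64x64 products.
U256 clmul128_schoolbook(U128 a, U128 b) {
  U128 lo = clmul64(a.lo, b.lo);
  U128 hi = clmul64(a.hi, b.hi);
  U128 mid = xor128(clmul64(a.lo, b.hi), clmul64(a.hi, b.lo));
  return combine(lo, mid, hi);
}
```

Both functions return the same 256-bit product; the Karatsuba version simply trades one multiply for a couple of XORs, which pays off when the multiply is the expensive operation.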
Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: - change branch and remove not needed variables - change branch and remove not needed variables ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20235/files - new: https://git.openjdk.org/jdk/pull/20235/files/b37b09da..467af71c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=24-25 Stats: 27 lines in 1 file changed: 8 ins; 13 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235 PR: https://git.openjdk.org/jdk/pull/20235 From ihse at openjdk.org Thu Feb 20 15:52:13 2025 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 20 Feb 2025 15:52:13 GMT Subject: RFR: 8349620: Add VMProps for static JDK [v3] In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 19:29:10 GMT, Jiangli Zhou wrote: >> Please review this change that adds the `jdk.static` VMProps. It can be used to skip tests not for running on static JDK. >> >> This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which is used by VMProps to determine if it's static at runtime. >> >> `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on static JDK. This test uses `bin/jlink`, which is not provided on static JDK. There are other tests that require tools in `bin/`. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. > > Jiangli Zhou has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. Hi everyone! Sorry for the late reply, I've been ill for a while and have been working through my backlog. 
I have independently been working on a solution to get the static JDK image to pass all JTReg tests. I have not created a JBS issue for it yet (before I prototyped this I was not sure it was a feasible way), but my current WIP branch is here: https://github.com/openjdk/jdk/compare/master...magicus:jdk:add-static-relauncher. I was just about to finish the last parts on it prior to falling ill. In short, what we do in a normal JDK build when we create e.g. the `jar` tool is that we recompiled the `main.c` file making the launcher, but hard-coding the launcher to run the class `sun.tools.jar.Main`, using the JLI interface. In my branch, I instead create a trivial, stand-alone program (I call it a "relauncher") that will just re-execute the real `java` executable with the proper arguments to get it to run the class `sun.tools.jar.Main`. (There are some more subtleties surrounding doing this, but that is the gist of it.) This way, we can have a single, statically linked `java` binary, but also have these tiny helper tools that just falls back on the static java. This will make a static JDK image behave indistinguishable from a normal JDK image, and thus being able to run all JTreg tests that require a tool to be present. Ideally, I'd like for the static JDK image to be able to pass the JCK, so we can be sure it is fully up to par to a normal JDK image. (But I have not tried doing that yet.) I cannot really say how my work relates to this PR. My initial reaction is that Jiangli's addition to the whitebox API to let tests know if they are run in a static context or not is sound. Which of the existing tests really will need this annotation in the end is perhaps less clear. But it will allow for tests to explicitly check stuff that might go wrong on a static build. Oh, and while I was writing that, the PR was committed. ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2671902318 PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2671903747 From jiangli at openjdk.org Thu Feb 20 15:52:13 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 20 Feb 2025 15:52:13 GMT Subject: Integrated: 8349620: Add VMProps for static JDK In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 23:51:41 GMT, Jiangli Zhou wrote: > Please review this change that adds the `jdk.static` VMProps. It can be used to skip tests not for running on static JDK. > > This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which is used by VMProps to determine if it's static at runtime. > > `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on static JDK. This test uses `bin/jlink`, which is not provided on static JDK. There are other tests that require tools in `bin/`. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. This pull request has now been integrated. 
Changeset: 960ad211 Author: Jiangli Zhou URL: https://git.openjdk.org/jdk/commit/960ad211867d65a993b2fc4e6dafa8cea9827b3f Stats: 19 lines in 6 files changed: 16 ins; 0 del; 3 mod 8349620: Add VMProps for static JDK Reviewed-by: alanb, manc ------------- PR: https://git.openjdk.org/jdk/pull/23528 From coleenp at openjdk.org Thu Feb 20 16:25:04 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Feb 2025 16:25:04 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v7] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 14:53:55 GMT, Amit Kumar wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> space for 3 registers > > New benchmark result:
>
>
> Benchmark                             Mode  Cnt   Score   Error  Units
> SecondarySupersLookup.testNegative00  avgt   15   4.271 ± 0.034  ns/op
> SecondarySupersLookup.testNegative01  avgt   15   4.270 ± 0.048  ns/op
> SecondarySupersLookup.testNegative02  avgt   15   4.263 ± 0.019  ns/op
> SecondarySupersLookup.testNegative03  avgt   15   4.266 ± 0.023  ns/op
> SecondarySupersLookup.testNegative04  avgt   15   4.274 ± 0.030  ns/op
> SecondarySupersLookup.testNegative05  avgt   15   4.268 ± 0.019  ns/op
> SecondarySupersLookup.testNegative06  avgt   15   4.269 ± 0.022  ns/op
> SecondarySupersLookup.testNegative07  avgt   15   4.280 ± 0.027  ns/op
> SecondarySupersLookup.testNegative08  avgt   15   4.274 ± 0.030  ns/op
> SecondarySupersLookup.testNegative09  avgt   15   4.258 ± 0.012  ns/op
> SecondarySupersLookup.testNegative10  avgt   15   4.266 ± 0.023  ns/op
> SecondarySupersLookup.testNegative16  avgt   15   4.257 ± 0.010  ns/op
> SecondarySupersLookup.testNegative20  avgt   15   4.258 ± 0.011  ns/op
> SecondarySupersLookup.testNegative30  avgt   15   4.260 ± 0.019  ns/op
> SecondarySupersLookup.testNegative32  avgt   15   4.263 ± 0.024  ns/op
> SecondarySupersLookup.testNegative40  avgt   15   4.260 ± 0.013  ns/op
> SecondarySupersLookup.testNegative50  avgt   15   4.266 ± 0.024  ns/op
> SecondarySupersLookup.testNegative55  avgt   15  28.628 ± 2.120  ns/op
> SecondarySupersLookup.testNegative56  avgt   15  28.561 ± 0.477  ns/op
> SecondarySupersLookup.testNegative57  avgt   15  30.626 ± 3.137  ns/op
> SecondarySupersLookup.testNegative58  avgt   15  29.328 ± 0.528  ns/op
> SecondarySupersLookup.testNegative59  avgt   15  32.580 ± 4.115  ns/op
> SecondarySupersLookup.testNegative60  avgt   15  32.745 ± 3.782  ns/op
> SecondarySupersLookup.testNegative61  avgt   15  33.227 ± 3.922  ns/op
> SecondarySupersLookup.testNegative62  avgt   15  33.354 ± 3.655  ns/op
> SecondarySupersLookup.testNegative63  avgt   15  35.595 ± 3.865  ns/op
> SecondarySupersLookup.testNegative64  avgt   15  34.268 ± 3.374  ns/op
> SecondarySupersLookup.testPositive01  avgt   15   4.800 ± 0.010  ns/op
> SecondarySupersLookup.testPositive02  avgt   15   4.803 ± 0.017  ns/op
> SecondarySupersLookup.testPositive03  avgt   15   4.799 ± 0.012  ns/op
> SecondarySupersLookup.testPositive04  avgt   15   4.799 ± 0.012  ns/op
> SecondarySupersLookup.testPositive05  avgt   15   4.797 ± 0.007  ns/op
> SecondarySupersLookup.testPositive06  avgt   15   4.798 ± 0.013  ns/op
> SecondarySupersLookup.testPositive07  avgt   15   4.803 ± 0.0...

@offamitkumar See https://github.com/openjdk/jdk/pull/23572 I've deleted the isInstance native call so it's no longer an intrinsic. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23535#issuecomment-2671995877 From coleenp at openjdk.org Thu Feb 20 16:25:07 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Feb 2025 16:25:07 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v9] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 11:43:08 GMT, Amit Kumar wrote: >> s390x implementation for Class.isInstance intrinsic. >> >> Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. >> >> Benchmark results will be updated soon.
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > no need of Z_R0_scratch Can you run your benchmark on that? The JIT compiler should be able to generate inlined code for it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23535#issuecomment-2671998298 From dfenacci at openjdk.org Thu Feb 20 16:32:18 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 20 Feb 2025 16:32:18 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) Message-ID: # Issue The test `src/hotspot/share/opto/c2compiler.cpp` fails intermittently due to a crash that happens when trying to allocate code cache space for C1 and C2 in `RuntimeStub::new_runtime_stub` and `SingletonBlob::operator new`. # Causes There are a few call paths during the initialization of C1 and C2 that can lead to the code cache allocations in `RuntimeStub::new_runtime_stub` (through `RuntimeStub::operator new`) and `SingletonBlob::operator new` triggering a fatal error if there is no more space. The paths in question are: 1. `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub` 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub` 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_uncommon_trap_blob` -> `UncommonTrapBlob::create` -> `new UncommonTrapBlob` 1. 
`C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_exception_blob` -> `ExceptionBlob::create` -> `new ExceptionBlob` # Solution Instead of crashing fatally, we can use the `alloc_fail_is_fatal` flag of `RuntimeStub::new_runtime_stub` to avoid crashing in cases 1 and 2, and add a similar flag to `SingletonBlob::operator new` for cases 3 and 4. In the latter case, all calls need to be adjusted accordingly. Note: In [JDK-8326615](https://bugs.openjdk.org/browse/JDK-8326615) it was argued that increasing the minimum code cache size would solve the issue, but that wasn't entirely accurate: doing so may reduce the chance of a failed allocation in these 4 places but doesn't eliminate it. # Testing The original failing regression test in `test/hotspot/jtreg/compiler/startup/StartupOutput.java` has been modified to run multiple times with randomized values (within the original failing range) to increase the chances of hitting the fatal assertion. Tests: Tier 1-4 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode) ------------- Commit messages: - JDK-8347406: reduce number of tests again - JDK-8347406: update copyright year - Merge branch 'master' into JDK-8347406 - Merge branch 'master' into JDK-8347406 - JDK-8347406: reduce number of test processes - JDK-8347406: set the C2 uncommon and exception trap blobs in OptoRuntime::generate - JDK-8347406: fix c2 runtime init return condition - JDK-8347406: reduce number of processes in test - JDK-8347406: make startup processes run in parallel - JDK-8347406: reduce number of startup test attempts - ...
and 8 more: https://git.openjdk.org/jdk/compare/efbad00c...e930df47 Changes: https://git.openjdk.org/jdk/pull/23630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23630&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347406 Stats: 114 lines in 27 files changed: 38 ins; 3 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/23630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23630/head:pull/23630 PR: https://git.openjdk.org/jdk/pull/23630 From dfenacci at openjdk.org Thu Feb 20 16:46:52 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 20 Feb 2025 16:46:52 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 11:04:20 GMT, Damon Fenacci wrote: > # Issue > The test `src/hotspot/share/opto/c2compiler.cpp` fails intermittently due to a crash that happens when trying to allocate code cache space for C1 and C2 in `RuntimeStub::new_runtime_stub` and `SingletonBlob::operator new`. > > # Causes > There are a few call paths during the initialization of C1 and C2 that can lead to the code cache allocations in `RuntimeStub::new_runtime_stub` (through `RuntimeStub::operator new`) and `SingletonBlob::operator new` triggering a fatal error if there is no more space. The paths in question are: > 1. `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_uncommon_trap_blob` -> `UncommonTrapBlob::create` -> `new UncommonTrapBlob` > 1. 
`C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_exception_blob` -> `ExceptionBlob::create` -> `new ExceptionBlob` > > # Solution > Instead of fatally crashing the we can use the `alloc_fail_is_fatal` flag of `RuntimeStub::new_runtime_stub` to avoid crashing in cases 1 and 2 and add a similar flag to `SingletonBlob::operator new` for cases 3 and 4. In the latter case we need to adjust all calls accordingly. > > Note: In [JDK-8326615](https://bugs.openjdk.org/browse/JDK-8326615) it was argued that increasing the minimum code cache size would solve the issue but that wasn't entirely accurate: doing so possibly decreases the chances of a failed allocation in these 4 places but doesn't totally avoid it. > > # Testing > The original failing regression test in `test/hotspot/jtreg/compiler/startup/StartupOutput.java` has been modified to run multiple times with randomized values (within the original failing range) to increase the chances of hitting the fatal assertion. > > Tests: Tier 1-4 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode) @adinn I noticed I touched quite a few runtime files that you recently refactored. I guess it might make sense if you had a look at them. Thanks! 
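The control flow proposed in the Solution section can be illustrated with a small Java analogue: an allocator that either aborts or returns `null` depending on an `allocFailIsFatal` flag, and an initialization routine that stops at the first `null` and reports failure to its caller instead of taking the process down. It also uses the early-return style that Dean Long suggests later in this thread. Everything here (`CodeCacheModel`, `allocate`, `StubInit.initCompilerRuntime`) is hypothetical and only mirrors the shape of `RuntimeStub::new_runtime_stub(..., alloc_fail_is_fatal)` and the boolean-returning init paths; it is not HotSpot code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a bounded code cache; not HotSpot code.
final class CodeCacheModel {
    private final int capacity;
    private int used;

    CodeCacheModel(int capacity) { this.capacity = capacity; }

    /** Returns a blob handle, or null if there is no room and failure is non-fatal. */
    Integer allocate(int size, boolean allocFailIsFatal) {
        if (used + size > capacity) {
            if (allocFailIsFatal) {
                // Analogue of the fatal "CodeCache is full" crash.
                throw new IllegalStateException("CodeCache is full");
            }
            return null;
        }
        used += size;
        return used; // any non-null handle will do for the sketch
    }
}

public final class StubInit {
    // Generates each stub in turn and returns false on the first failed allocation,
    // so the caller can disable the compiler instead of aborting the VM.
    static boolean initCompilerRuntime(CodeCacheModel cache, int[] stubSizes, List<Integer> out) {
        for (int size : stubSizes) {
            Integer blob = cache.allocate(size, /* allocFailIsFatal */ false);
            if (blob == null) {
                return false; // early return: later stubs are not attempted
            }
            out.add(blob);
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> blobs = new ArrayList<>();
        boolean ok = initCompilerRuntime(new CodeCacheModel(100), new int[] {40, 40, 40}, blobs);
        System.out.println(ok + " " + blobs.size()); // prints "false 2"
    }
}
```

The key design point is that the fatal/non-fatal decision lives at the allocation site, while the initialization path only has to check for `null` and propagate a boolean.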
------------- PR Comment: https://git.openjdk.org/jdk/pull/23630#issuecomment-2672059853 From duke at openjdk.org Thu Feb 20 17:24:57 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 20 Feb 2025 17:24:57 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: <1yB95sOajuS5ptFI0GQWLepii5JsZ9DOsje-TEFyFYs=.a325ad18-17ed-4e77-b1e3-0bad2cf55c67@github.com> References: <1yB95sOajuS5ptFI0GQWLepii5JsZ9DOsje-TEFyFYs=.a325ad18-17ed-4e77-b1e3-0bad2cf55c67@github.com> Message-ID: On Tue, 11 Feb 2025 10:40:31 GMT, Bhavana Kilambi wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding comments + some code reorganization > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2618: > >> 2616: INSN(smaxp, 0, 0b101001, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S >> 2617: INSN(sminp, 0, 0b101011, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S >> 2618: INSN(sqdmulh,0, 0b101101, false); // accepted arrangements: T4H, T8H, T2S, T4S > > Hi, not a comment on the algorithm itself but you might have to add these new instructions in the gtest for aarch64 here - test/hotspot/gtest/aarch64/aarch64-asmtest.py and use this file to generate test/hotspot/gtest/aarch64/asmtest.out.h which would contain these newly added instructions. I have tried that, but the python script (actually the as command that it started) threw error messages: aarch64ops.s:338:24: error: index must be a multiple of 8 in range [0, 32760]. 
        prfm    PLDL1KEEP, [x15, 43]
                                 ^
aarch64ops.s:357:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
        sub     x1, x10, x23, sxth #2
                              ^
aarch64ops.s:359:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
        add     x11, x21, x5, uxtb #3
                              ^
aarch64ops.s:360:22: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
        adds    x11, x17, x17, uxtw #1
                               ^
aarch64ops.s:361:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
        sub     x11, x0, x15, uxtb #1
                              ^
aarch64ops.s:362:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
        subs    x7, x1, x0, sxth #2
                            ^

This happens with the unmodified code from the current master branch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1964049673 From aph at openjdk.org Thu Feb 20 17:26:55 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 20 Feb 2025 17:26:55 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v7] In-Reply-To: References: Message-ID: <8QWL249t2-XogH3PQkrco_mZJcJAJGfKBhEvrdML3yA=.9afa6d58-ea27-44ef-82b1-5dc14f686b97@github.com> On Fri, 14 Feb 2025 14:53:55 GMT, Amit Kumar wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> space for 3 registers > > New benchmark result:
>
>
> Benchmark                             Mode  Cnt   Score   Error  Units
> SecondarySupersLookup.testNegative00  avgt   15   4.271 ± 0.034  ns/op
> SecondarySupersLookup.testNegative01  avgt   15   4.270 ± 0.048  ns/op
> SecondarySupersLookup.testNegative02  avgt   15   4.263 ± 0.019  ns/op
> SecondarySupersLookup.testNegative03  avgt   15   4.266 ± 0.023  ns/op
> SecondarySupersLookup.testNegative04  avgt   15   4.274 ± 0.030  ns/op
> SecondarySupersLookup.testNegative05  avgt   15   4.268 ± 0.019  ns/op
> SecondarySupersLookup.testNegative06  avgt   15   4.269 ± 0.022  ns/op
> SecondarySupersLookup.testNegative07  avgt   15   4.280 ± 0.027  ns/op
> SecondarySupersLookup.testNegative08  avgt   15   4.274 ± 0.030  ns/op
> SecondarySupersLookup.testNegative09  avgt   15   4.258 ± 0.012  ns/op
> SecondarySupersLookup.testNegative10  avgt   15   4.266 ± 0.023  ns/op
> SecondarySupersLookup.testNegative16  avgt   15   4.257 ± 0.010  ns/op
> SecondarySupersLookup.testNegative20  avgt   15   4.258 ± 0.011  ns/op
> SecondarySupersLookup.testNegative30  avgt   15   4.260 ± 0.019  ns/op
> SecondarySupersLookup.testNegative32  avgt   15   4.263 ± 0.024  ns/op
> SecondarySupersLookup.testNegative40  avgt   15   4.260 ± 0.013  ns/op
> SecondarySupersLookup.testNegative50  avgt   15   4.266 ± 0.024  ns/op
> SecondarySupersLookup.testNegative55  avgt   15  28.628 ± 2.120  ns/op
> SecondarySupersLookup.testNegative56  avgt   15  28.561 ± 0.477  ns/op
> SecondarySupersLookup.testNegative57  avgt   15  30.626 ± 3.137  ns/op
> SecondarySupersLookup.testNegative58  avgt   15  29.328 ± 0.528  ns/op
> SecondarySupersLookup.testNegative59  avgt   15  32.580 ± 4.115  ns/op
> SecondarySupersLookup.testNegative60  avgt   15  32.745 ± 3.782  ns/op
> SecondarySupersLookup.testNegative61  avgt   15  33.227 ± 3.922  ns/op
> SecondarySupersLookup.testNegative62  avgt   15  33.354 ± 3.655  ns/op
> SecondarySupersLookup.testNegative63  avgt   15  35.595 ± 3.865  ns/op
> SecondarySupersLookup.testNegative64  avgt   15  34.268 ± 3.374  ns/op
> SecondarySupersLookup.testPositive01  avgt   15   4.800 ± 0.010  ns/op
> SecondarySupersLookup.testPositive02  avgt   15   4.803 ± 0.017  ns/op
> SecondarySupersLookup.testPositive03  avgt   15   4.799 ± 0.012  ns/op
> SecondarySupersLookup.testPositive04  avgt   15   4.799 ± 0.012  ns/op
> SecondarySupersLookup.testPositive05  avgt   15   4.797 ± 0.007  ns/op
> SecondarySupersLookup.testPositive06  avgt   15   4.798 ± 0.013  ns/op
> SecondarySupersLookup.testPositive07  avgt   15   4.803 ± 0.0...

> @offamitkumar See #23572 I've deleted the isInstance native call so it's no longer an intrinsic.

I see only Class.isArray(), Class.isInterface() and Class.isPrimitive(). Class.isInstance() seems to be untouched?
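For context on the three methods aph lists: per the description of PR 23572 quoted elsewhere in this digest, `isInterface()` can check the modifier flags, `isArray()` can check whether the component mirror is non-null, and `isPrimitive()` reads a new final boolean that the JVM initializes. The sketch below is a hypothetical Java model of those three checks. `MirrorModel` and its fields are illustrative stand-ins, not the real `java.lang.Class` fields. `MirrorModel.of` copies the primitive flag from a real `Class` only to stand in for what the JVM would record.

```java
import java.lang.reflect.Modifier;

// Hypothetical model of the three non-native checks described for PR 23572.
// The fields are illustrative stand-ins, not the real java.lang.Class fields.
public final class MirrorModel {
    private final int modifiers;              // access flags, as returned by Class.getModifiers()
    private final MirrorModel componentType;  // non-null only for array mirrors
    private final boolean primitive;          // set once when the mirror is created

    MirrorModel(int modifiers, MirrorModel componentType, boolean primitive) {
        this.modifiers = modifiers;
        this.componentType = componentType;
        this.primitive = primitive;
    }

    // Builds a model from a real Class, standing in for what the JVM would record.
    static MirrorModel of(Class<?> c) {
        Class<?> component = c.getComponentType();
        return new MirrorModel(c.getModifiers(),
                component == null ? null : of(component),
                c.isPrimitive());
    }

    boolean isInterface() { return Modifier.isInterface(modifiers); } // ACC_INTERFACE bit set?
    boolean isArray()     { return componentType != null; }          // only arrays have a component
    boolean isPrimitive() { return primitive; }                       // plain field read

    public static void main(String[] args) {
        Class<?>[] samples = {int.class, int[].class, Runnable.class, String.class, String[][].class};
        for (Class<?> c : samples) {
            MirrorModel m = of(c);
            System.out.printf("%-20s array=%b interface=%b primitive=%b%n",
                    c.getTypeName(), m.isArray(), m.isInterface(), m.isPrimitive());
        }
    }
}
```

Each check reduces to a field read plus at most a bit test, which is why, as Vladimir Ivanov notes later in this thread, the new implementations are small enough that `@ForceInline` is probably unnecessary.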
------------- PR Comment: https://git.openjdk.org/jdk/pull/23535#issuecomment-2672177877 From duke at openjdk.org Thu Feb 20 17:33:18 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 20 Feb 2025 17:33:18 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v6] In-Reply-To: References: Message-ID: > By using the aarch64 vector registers, the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with four additional commits since the last revision: - Accepting suggested change from Andrew Dinn - Added comments suggested by Andrew Dinn - Fixed copyright years - renaming a couple of functions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23300/files - new: https://git.openjdk.org/jdk/pull/23300/files/9a3a9444..54373d5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23300&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23300&range=04-05 Stats: 98 lines in 6 files changed: 2 ins; 0 del; 96 mod Patch: https://git.openjdk.org/jdk/pull/23300.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23300/head:pull/23300 PR: https://git.openjdk.org/jdk/pull/23300 From lmesnik at openjdk.org Thu Feb 20 18:02:59 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 20 Feb 2025 18:02:59 GMT Subject: Integrated: 8350151: Support requires property to filter tests incompatible with --enable-preview In-Reply-To: References: Message-ID: On Sat, 15 Feb 2025 19:43:39 GMT, Leonid Mesnik wrote: > It might be useful to be able to run testing with --enable-preview for feature development. Tests incompatible with this mode must be filtered out. > > I chose the name 'java.enablePreview' because it is more of a java property than a vm or jdk one, and 'enablePreview' to match the jtreg tag. > > Tested by running all test suites and verifying that tests are correctly selected.
> There are more tests incompatible with --enable-preview, will mark them in the following bug. This pull request has now been integrated. Changeset: 1eb0db37 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/1eb0db37608ae1dd05accc1e22c57d76fa2c72ce Stats: 27 lines in 5 files changed: 19 ins; 0 del; 8 mod 8350151: Support requires property to filter tests incompatible with --enable-preview Reviewed-by: alanb, rriggs ------------- PR: https://git.openjdk.org/jdk/pull/23653 From coleenp at openjdk.org Thu Feb 20 19:38:56 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Feb 2025 19:38:56 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v9] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 11:43:08 GMT, Amit Kumar wrote: >> s390x implementation for Class.isInstance intrinsic. >> >> Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. >> >> Benchmark results will be updated soon. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > no need of Z_R0_scratch Oh sorry, I didn't read to the end of the word. Never mind my comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23535#issuecomment-2672495496 From coleenp at openjdk.org Thu Feb 20 20:11:11 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Feb 2025 20:11:11 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v4] In-Reply-To: References: Message-ID: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. 
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Update src/java.base/share/classes/java/lang/Class.java Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23572/files - new: https://git.openjdk.org/jdk/pull/23572/files/d08091ac..7a4c595b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23572.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23572/head:pull/23572 PR: https://git.openjdk.org/jdk/pull/23572 From coleenp at openjdk.org Thu Feb 20 20:19:15 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Feb 2025 20:19:15 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v5] In-Reply-To: References: Message-ID: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. 
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23572/files - new: https://git.openjdk.org/jdk/pull/23572/files/7a4c595b..02347433 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23572.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23572/head:pull/23572 PR: https://git.openjdk.org/jdk/pull/23572 From dlong at openjdk.org Thu Feb 20 20:55:52 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 20 Feb 2025 20:55:52 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 11:04:20 GMT, Damon Fenacci wrote: > # Issue > The test `src/hotspot/share/opto/c2compiler.cpp` fails intermittently due to a crash that happens when trying to allocate code cache space for C1 and C2 in `RuntimeStub::new_runtime_stub` and `SingletonBlob::operator new`. > > # Causes > There are a few call paths during the initialization of C1 and C2 that can lead to the code cache allocations in `RuntimeStub::new_runtime_stub` (through `RuntimeStub::operator new`) and `SingletonBlob::operator new` triggering a fatal error if there is no more space. The paths in question are: > 1. `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub` > 1. 
`C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_uncommon_trap_blob` -> `UncommonTrapBlob::create` -> `new UncommonTrapBlob` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_exception_blob` -> `ExceptionBlob::create` -> `new ExceptionBlob` > > # Solution > Instead of fatally crashing the we can use the `alloc_fail_is_fatal` flag of `RuntimeStub::new_runtime_stub` to avoid crashing in cases 1 and 2 and add a similar flag to `SingletonBlob::operator new` for cases 3 and 4. In the latter case we need to adjust all calls accordingly. > > Note: In [JDK-8326615](https://bugs.openjdk.org/browse/JDK-8326615) it was argued that increasing the minimum code cache size would solve the issue but that wasn't entirely accurate: doing so possibly decreases the chances of a failed allocation in these 4 places but doesn't totally avoid it. > > # Testing > The original failing regression test in `test/hotspot/jtreg/compiler/startup/StartupOutput.java` has been modified to run multiple times with randomized values (within the original failing range) to increase the chances of hitting the fatal assertion. > > Tests: Tier 1-4 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode) I don't understand why JDK-8326615 didn't work. If the minimum codecache size was too small, can't we just increase it? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23630#issuecomment-2672655838 From dlong at openjdk.org Thu Feb 20 21:27:52 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 20 Feb 2025 21:27:52 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) In-Reply-To: References: Message-ID: <97uZIfQ245vs73ViJss3Yg6lsCQBbj-bYtSyG8_GMF8=.c7e59a36-166a-4e1e-95ad-5820cb1042fc@github.com> On Fri, 14 Feb 2025 11:04:20 GMT, Damon Fenacci wrote: > # Issue > The test `src/hotspot/share/opto/c2compiler.cpp` fails intermittently due to a crash that happens when trying to allocate code cache space for C1 and C2 in `RuntimeStub::new_runtime_stub` and `SingletonBlob::operator new`. > > # Causes > There are a few call paths during the initialization of C1 and C2 that can lead to the code cache allocations in `RuntimeStub::new_runtime_stub` (through `RuntimeStub::operator new`) and `SingletonBlob::operator new` triggering a fatal error if there is no more space. The paths in question are: > 1. `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_uncommon_trap_blob` -> `UncommonTrapBlob::create` -> `new UncommonTrapBlob` > 1. 
`C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_exception_blob` -> `ExceptionBlob::create` -> `new ExceptionBlob` > > # Solution > Instead of fatally crashing the we can use the `alloc_fail_is_fatal` flag of `RuntimeStub::new_runtime_stub` to avoid crashing in cases 1 and 2 and add a similar flag to `SingletonBlob::operator new` for cases 3 and 4. In the latter case we need to adjust all calls accordingly. > > Note: In [JDK-8326615](https://bugs.openjdk.org/browse/JDK-8326615) it was argued that increasing the minimum code cache size would solve the issue but that wasn't entirely accurate: doing so possibly decreases the chances of a failed allocation in these 4 places but doesn't totally avoid it. > > # Testing > The original failing regression test in `test/hotspot/jtreg/compiler/startup/StartupOutput.java` has been modified to run multiple times with randomized values (within the original failing range) to increase the chances of hitting the fatal assertion. > > Tests: Tier 1-4 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode) src/hotspot/share/c1/c1_Runtime1.cpp line 233: > 231: oop_maps, > 232: must_gc_arguments, > 233: false); Suggestion: false /* alloc_fail_is_fatal */ ); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23630#discussion_r1964359747 From dlong at openjdk.org Thu Feb 20 21:38:54 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 20 Feb 2025 21:38:54 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 11:04:20 GMT, Damon Fenacci wrote: > # Issue > The test `src/hotspot/share/opto/c2compiler.cpp` fails intermittently due to a crash that happens when trying to allocate code cache space for C1 and C2 in `RuntimeStub::new_runtime_stub` and `SingletonBlob::operator new`. 
> > # Causes > There are a few call paths during the initialization of C1 and C2 that can lead to the code cache allocations in `RuntimeStub::new_runtime_stub` (through `RuntimeStub::operator new`) and `SingletonBlob::operator new` triggering a fatal error if there is no more space. The paths in question are: > 1. `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_uncommon_trap_blob` -> `UncommonTrapBlob::create` -> `new UncommonTrapBlob` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_exception_blob` -> `ExceptionBlob::create` -> `new ExceptionBlob` > > # Solution > Instead of fatally crashing the we can use the `alloc_fail_is_fatal` flag of `RuntimeStub::new_runtime_stub` to avoid crashing in cases 1 and 2 and add a similar flag to `SingletonBlob::operator new` for cases 3 and 4. In the latter case we need to adjust all calls accordingly. > > Note: In [JDK-8326615](https://bugs.openjdk.org/browse/JDK-8326615) it was argued that increasing the minimum code cache size would solve the issue but that wasn't entirely accurate: doing so possibly decreases the chances of a failed allocation in these 4 places but doesn't totally avoid it. > > # Testing > The original failing regression test in `test/hotspot/jtreg/compiler/startup/StartupOutput.java` has been modified to run multiple times with randomized values (within the original failing range) to increase the chances of hitting the fatal assertion. 
> > Tests: Tier 1-4 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode) src/hotspot/share/gc/shenandoah/c1/shenandoahBarrierSetC1.cpp line 306: > 304: _load_reference_barrier_phantom_rt_code_blob != nullptr; > 305: } > 306: return _pre_barrier_c1_runtime_code_blob != nullptr && reference_barrier_success; Wouldn't it be better to return false immediately after each failure, rather than continuing? src/hotspot/share/gc/z/c1/zBarrierSetC1.cpp line 543: > 541: _store_barrier_on_oop_field_without_healing = > 542: generate_c1_store_runtime_stub(blob, false /* self_healing */, "store_barrier_on_oop_field_without_healing"); > 543: return _load_barrier_on_oop_field_preloaded_runtime_stub != nullptr && Again, why not return false immediately on first failure? src/hotspot/share/opto/output.cpp line 3487: > 3485: C->record_failure("CodeCache is full"); > 3486: } else { > 3487: C->set_stub_entry_point(rs->entry_point()); Is the deleted rs->is_runtime_stub() assert still useful here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23630#discussion_r1964370135 PR Review Comment: https://git.openjdk.org/jdk/pull/23630#discussion_r1964371192 PR Review Comment: https://git.openjdk.org/jdk/pull/23630#discussion_r1964372958 From vlivanov at openjdk.org Thu Feb 20 21:56:55 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 20 Feb 2025 21:56:55 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v5] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 20:19:15 GMT, Coleen Phillimore wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace Looks good! Regarding @IntrinsicCandidate and its effects on JIT-compiler inlining decisions, @ForceInline could be added, but IMO it's not necessary since new implementations are small. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23572#pullrequestreview-2631244815 From jiangli at openjdk.org Thu Feb 20 22:37:06 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 20 Feb 2025 22:37:06 GMT Subject: RFR: 8349620: Add VMProps for static JDK [v3] In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 19:29:10 GMT, Jiangli Zhou wrote: >> Please review this change that adds the `jdk.static` VMProps. It can be used to skip tests not for running on static JDK. >> >> This also adds a new WhiteBox native method, `jdk.test.whitebox.WhiteBox.isStatic()`, which is used by VMProps to determine if it's static at runtime. >> >> `@requires !jdk.static` is added in `test/hotspot/jtreg/runtime/modules/ModulesSymLink.java` to skip running the test on static JDK. This test uses `bin/jlink`, which is not provided on static JDK. There are other tests that require tools in `bin/`. Those are not modified by the current PR to skip running on static JDK. Those can be done after the current change is fully discussed and reviewed/approved. > > Jiangli Zhou has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. > Hi everyone! Sorry for the late reply, I've been ill for a while and have been working through my backlog. > > I have independently been working on a solution to get the static JDK image to pass all JTReg tests. 
I have not created a JBS issue for it yet (before I prototyped this I was not sure it was feasible), but my current WIP branch is here: [master...magicus:jdk:add-static-relauncher](https://github.com/openjdk/jdk/compare/master...magicus:jdk:add-static-relauncher). I was just about to finish the last parts of it before falling ill. > > In short, what we do in a normal JDK build when we create e.g. the `jar` tool is that we recompile the `main.c` file making the launcher, but hard-code the launcher to run the class `sun.tools.jar.Main`, using the JLI interface. In my branch, I instead create a trivial, stand-alone program (I call it a "relauncher") that will just re-execute the real `java` executable with the proper arguments to get it to run the class `sun.tools.jar.Main`. (There are some more subtleties surrounding doing this, but that is the gist of it.) > > This way, we can have a single, statically linked `java` binary, but also have these tiny helper tools that just fall back on the static java. This will make a static JDK image behave indistinguishably from a normal JDK image, and thus be able to run all JTreg tests that require a tool to be present. Ideally, I'd like the static JDK image to be able to pass the JCK, so we can be sure it is fully on par with a normal JDK image. (But I have not tried doing that yet.) > > I cannot really say how my work relates to this PR. My initial reaction is that Jiangli's addition to the whitebox API to let tests know whether they are run in a static context is sound. Which of the existing tests will really need this annotation in the end is perhaps less clear. But it will allow tests to explicitly check things that might go wrong on a static build. @magicus Thanks for the thoughts and for looking into the jtreg testing as well.
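The relauncher approach quoted above can be sketched in a few lines. This is a minimal sketch that assumes a POSIX system and assumes the tool can be started with `java -m <module>/<main class>`. `ToolSpec`, `build_relaunch_args` and `relaunch` are hypothetical names, not code from the linked branch, and the sketch ignores the "more subtleties" mentioned there (such as `-J` options).

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <unistd.h>  // execv (POSIX)

// Hypothetical description of the tool a relauncher stands in for.
struct ToolSpec {
  std::string main_target;  // assumed "-m" target, e.g. "<module>/<main class>"
};

// Builds the command line for the real java binary: java -m <target> <user args...>.
std::vector<std::string> build_relaunch_args(const std::string& java_path,
                                             const ToolSpec& tool,
                                             int argc, char** argv) {
  std::vector<std::string> args{java_path, "-m", tool.main_target};
  for (int i = 1; i < argc; ++i) {  // argv[0] is the relauncher itself
    args.emplace_back(argv[i]);
  }
  return args;
}

// Replaces the current process with java; returns only if execv fails.
int relaunch(const std::vector<std::string>& args) {
  std::vector<char*> cargs;
  for (const std::string& a : args) {
    cargs.push_back(const_cast<char*>(a.c_str()));
  }
  cargs.push_back(nullptr);
  execv(cargs[0], cargs.data());
  return 127;  // conventional "could not execute" status
}
```

Keeping argument construction separate from `execv` allows the command line to be tested without replacing the process.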
IIUC, you want to make the required tools (perhaps a selected set of tools) usable for the static JDK at runtime, so any tests using tools at runtime can still be tested on static JDK. That seems to be a good goal and worth investigating. Your approach with re-executing the `java` executable sounds very interesting. Really like your thoughts on making the static JDK image pass the JCK! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23528#issuecomment-2672843741 From kvn at openjdk.org Thu Feb 20 22:43:10 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 20 Feb 2025 22:43:10 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 05:53:36 GMT, Dmitry Chuyko wrote: > > Is it from here?: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvm.cpp#L379 > > Yes, I mean this check. Configure also prevents building the VM with JVMCI but without C1 or C2: [jvm-features.m4#L517](https://github.com/openjdk/jdk/blob/master/make/autoconf/jvm-features.m4#L517) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23682#issuecomment-2672855957 From kvn at openjdk.org Thu Feb 20 22:48:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 20 Feb 2025 22:48:52 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 In-Reply-To: References: Message-ID: <9iSpmytBN3FVIZctZePtb9-Lw0Dr2ZtzY15KlzP-kVo=.7b17abee-0e52-405f-ae71-af3b719d3788@github.com> On Wed, 19 Feb 2025 23:57:34 GMT, Dean Long wrote: >> The location for rfp should be set in the register map. In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. >> >> COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers INCLUDE_JVMCI case. > > I think @vnkozlov is right. I don't see where COMPILER1_OR_COMPILER2 is true for JVMCI.
Should we use COMPILER1 || COMPILER2_OR_JVMCI, or remove the #if and instead guard with !PreserveFramePointer? I was about to suggest adding a comment to avoid confusion, but then I thought what @dean-long suggested is better and doesn't need a comment:

```
#if defined(COMPILER1) || COMPILER2_OR_JVMCI
```

We already use such a condition: [threads.cpp#L727](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/threads.cpp#L727) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23682#issuecomment-2672873577 From coleenp at openjdk.org Thu Feb 20 23:25:57 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Feb 2025 23:25:57 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v5] In-Reply-To: <3qpqR3PC8PFmdgaIoSYA3jDWdl-oon0-AcIzXcI76rY=.38635503-c067-4f6e-a4f1-92c1b6d991d1@github.com> References: <_j9Wkg21aBltyVrbO4wxGFKmmLDy0T-eorRL4epfS4k=.5a453b6b-d673-4cc6-b29f-192fa74e290c@github.com> <3qpqR3PC8PFmdgaIoSYA3jDWdl-oon0-AcIzXcI76rY=.38635503-c067-4f6e-a4f1-92c1b6d991d1@github.com> Message-ID: On Wed, 19 Feb 2025 21:16:51 GMT, Dean Long wrote: >> This is a good question. The heapwalker walks through dead mirrors so I can't assert that a null klass field matches our boolean setting but I don't know why this never asserts (can't find any instances in the bug database) but it seems like it could. I'll use the bool field in the mirror in the assert though but not in the return since the caller likely will fetch the klass pointer next. > >> ... but not in the return since the caller likely will fetch the klass pointer next. > > I notice that too. Callers are using is_primitive() to short-circuit calls to as_Klass(), which means they seem to be aware of this implementation detail when maybe they shouldn't. There are 136 callers so yes, it might be something that shouldn't be known in this many places.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1964492501 From coleenp at openjdk.org Thu Feb 20 23:31:55 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Feb 2025 23:31:55 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v5] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 20:19:15 GMT, Coleen Phillimore wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace Thanks Vladimir for review and for answering my earlier questions on this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23572#issuecomment-2672941007 From liach at openjdk.org Thu Feb 20 23:40:55 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 20 Feb 2025 23:40:55 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v5] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 20:19:15 GMT, Coleen Phillimore wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace You are right, using the field directly is indeed better. 
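The approach in the PR summary (checking modifier flags, a non-null component mirror, and a VM-initialized boolean) can be modeled in a few lines. This is a hypothetical C++ model of the three predicates, not the Java code in `java.lang.Class`. `MirrorModel` is an invented name, and `ACC_INTERFACE` uses the JVM specification's value 0x0200.

```cpp
#include <cassert>
#include <cstdint>

// Access flag value for interfaces from the JVM specification.
constexpr std::uint16_t ACC_INTERFACE = 0x0200;

// Hypothetical, simplified model of a java.lang.Class mirror.
struct MirrorModel {
  const MirrorModel* component_type;  // non-null only for array classes
  std::uint16_t modifiers;            // class modifier flags
  bool primitive;                     // initialized once by the VM
};

// isArray(): only array classes have a component type.
inline bool is_array(const MirrorModel& m) { return m.component_type != nullptr; }

// isInterface(): test the modifier bit; no native call needed.
inline bool is_interface(const MirrorModel& m) { return (m.modifiers & ACC_INTERFACE) != 0; }

// isPrimitive(): read the VM-initialized field directly.
inline bool is_primitive(const MirrorModel& m) { return m.primitive; }
```

Each predicate is a single field read or bit test, which is why the reviewers consider these small enough to inline without `@ForceInline`.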
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1964502825 From dlong at openjdk.org Fri Feb 21 00:17:52 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Feb 2025 00:17:52 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 10:48:26 GMT, Casper Norrbin wrote: > Hi everyone, > > The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. > > The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Additionally, I've created `align_up_or_min`, which behaves similarly to the original align_up but handles overflows predictably across all integer types. This new function is used in the locations where overflow checks already exist, providing a safer alternative. Can you explain what was wrong with the original fix? The BACKOUT only mentions that tests failed, but doesn't say why. Also, I fail to see why align_up_or_min is an improvement. It seems to silently mask errors, and the callers are not checking the result. Having a size that overflows size_t seems like an error to me.
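A minimal sketch of overflow-checked alignment, assuming a power-of-two alignment. It also assumes that `align_up_or_min` returns the type's minimum value on overflow, which the quoted description does not spell out. `checked_align_up` and `align_up_or_min_sketch` are hypothetical names, not the functions in `align.hpp`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

template <typename T>
constexpr bool is_power_of_two(T v) {
  return v > 0 && (v & (v - 1)) == 0;
}

// Aligns `value` up to `alignment` (a power of two) only if the result is
// representable in T; returns false instead of overflowing.
template <typename T>
bool checked_align_up(T value, T alignment, T* result) {
  static_assert(std::is_integral<T>::value, "integral types only");
  assert(is_power_of_two(alignment));
  const T mask = alignment - 1;
  if (value > std::numeric_limits<T>::max() - mask) {
    return false;  // value + mask would overflow
  }
  *result = static_cast<T>((value + mask) & ~mask);
  return true;
}

// Assumed "or_min" semantics: yield the type's minimum value on overflow,
// which is what wrap-around produces for unsigned types.
template <typename T>
T align_up_or_min_sketch(T value, T alignment) {
  T result;
  return checked_align_up(value, alignment, &result) ? result
                                                     : std::numeric_limits<T>::min();
}
```

A sentinel result only helps if callers test for it. Making the check explicit in the return value, as `checked_align_up` does, is one way to address the concern that an overflow could otherwise be silently masked.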
------------- PR Comment: https://git.openjdk.org/jdk/pull/23711#issuecomment-2673005108 From ccheung at openjdk.org Fri Feb 21 06:19:55 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 21 Feb 2025 06:19:55 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v5] In-Reply-To: References: Message-ID: > This changeset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. > > Passed tiers 1 - 5 testing. Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: rename classes and add vm_exit_during_initialization call ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23476/files - new: https://git.openjdk.org/jdk/pull/23476/files/c2039929..9e4e33dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23476&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23476&range=03-04 Stats: 2603 lines in 15 files changed: 1268 ins; 1261 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/23476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23476/head:pull/23476 PR: https://git.openjdk.org/jdk/pull/23476 From ccheung at openjdk.org Fri Feb 21 06:19:55 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 21 Feb 2025 06:19:55 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v4] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 00:35:51 GMT, Vladimir Kozlov wrote: > Passing by comment. We touched on it at the recent Leyden meeting. The name "AOTCodeSource" is very confusing, especially when we start caching AOT compiled code. Can we rename it to avoid confusion?
Per our discussions, I've renamed "AOTCodeSource" to "AOTClassLocation". I also renamed the related classes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23476#issuecomment-2673567780 From ccheung at openjdk.org Fri Feb 21 06:19:56 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 21 Feb 2025 06:19:56 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v5] In-Reply-To: References: <_9GFraN7--YUC8esAB28iHzdRC7eYJok355TpDH7Df8=.30548f08-454f-47ba-83d5-a4feabaee9ff@github.com> Message-ID: On Thu, 20 Feb 2025 07:29:07 GMT, David Holmes wrote: >> How about adding the vm_exit in `ClassLoaderDataShared::ensure_module_entry_table_exist()` instead of assert? >> >> >> void ClassLoaderDataShared::ensure_module_entry_table_exist(oop class_loader) { >> Handle h_loader(JavaThread::current(), class_loader); >> ModuleEntryTable* met = Modules::get_module_entry_table(h_loader); >> if (met == nullptr) { >> vm_exit_during_initialization("ClassLoaderDataShared::ensure_module_entry_table_exist() failed unexpectedly"); >> } >> } > > I can't answer that. As a refactoring I expect to see the current behaviour preserved. After discussion with Ioi, we found a place to call the vm_exit function:

```
void AOTClassLocationConfig::dumptime_init(JavaThread* current) {
  assert(CDSConfig::is_dumping_archive(), "");
  _dumptime_instance = NEW_C_HEAP_OBJ(AOTClassLocationConfig, mtClassShared);
  _dumptime_instance->dumptime_init_helper(current);
  if (current->has_pending_exception()) {
    // we can get an exception only when we run out of metaspace, but that
    // shouldn't happen this early in bootstrap.
    java_lang_Throwable::print(current->pending_exception(), tty);
    vm_exit_during_initialization("AOTClassLocationConfig::dumptime_init_helper() failed unexpectedly");
  }
}
```

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1964907863 From epeter at openjdk.org Fri Feb 21 07:04:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Feb 2025 07:04:57 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 16:14:09 GMT, Vladimir Kozlov wrote: >> @vnkozlov I suggest that I change the probability to something quite low now, just to make sure that the fast-loop is placed nicely. When I do the experiments for aliasing-analysis runtime-checks, then I will be able to benchmark much better for both cases, since it is much easier to create many different cases. At that point, I could still adapt the probabilities to a different constant. Or maybe I can somehow adjust the probabilities in the chain such that they are balanced. Like if there is 1 condition, give it `0.5`, if there are 2 give them each `sqrt(0.5)`, if there are `n` then `pow(0.5, 1/n)`, so that once you multiply them you get `pow(pow(0.5, 1/n),n) = 0.5`. We could also set another "target" probability than `0.5`. The issue is that experimenting now is a little difficult, because I only have the alignment-checks to play with, which are really really rare to fail in the "real world", I think. But aliasing-checks are more likely to fail, so there could be more interesting benchmark results there. >> >> Does that sound ok? >> >>> Can we profile alignment in Interpreter (and C1)? >> >> It would be nice if we could profile alignment or aliasing. Maybe that is possible. But I suppose there are always cases where profiling is not available (`-Xcomp`?), and we should have reasonable defaults there.
We could investigate profiling in a second step, to improve things if we think that is worth it. Profiling these things would also be additional complexity - I'm not convinced yet it is worth it. >> >> What do you think? > >> > Can we profile alignment in Interpreter (and C1)? >> >> It would be nice if we could profile alignment or aliasing. Maybe that is possible. But I suppose there are always cases where profiling is not available (`-Xcomp`?), and we should have reasonable defaults there. We could investigate profiling in a second step, to improve things if we think that is worth it. Profiling these things would also be additional complexity - I'm not convinced yet it is worth it. >> >> What do you think? > > You should not worry about `-Xcomp`; it is a testing flag - we can use some default there. > I am fine if you think profiling will not bring us much benefit. Note, I am not asking to create counters - just a bit to indicate if we had unaligned access to native memory in a method. In such a case we may skip the predicate and generate a multiversioned loop during compilation. On the other hand, we may have unaligned access only during startup and not later when we compile the method. Anyway, it does not affect these changes. > > I will look at the changes more later. @vnkozlov I made the change with the probability `PROB_FAIR` -> `PROB_LIKELY_MAG(3)` and ran testing again. @rwestrel Do you want me to find examples of the pre-loop disappearing? I suppose I can find some easily by adding an assert in SuperWord, where we bail out, as I showed above.
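The balancing idea discussed earlier in this thread (giving each of n checks probability `pow(0.5, 1/n)` so that the product is `0.5`) can be written down directly. This is a minimal sketch that assumes the checks are independent. The function names are hypothetical, and the values are plain doubles rather than C2's probability constants.

```cpp
#include <cassert>
#include <cmath>

// Per-check probability such that n independent checks all pass with
// probability `target`: p = target^(1/n), so p^n = target.
double per_check_probability(double target, int n) {
  assert(n > 0 && target > 0.0 && target <= 1.0);
  return std::pow(target, 1.0 / n);
}

// Probability that all n checks pass when each passes with probability p.
double all_pass_probability(double p, int n) {
  return std::pow(p, n);
}
```

For example, `per_check_probability(0.5, 2)` is `sqrt(0.5)`, about 0.7071. Other targets than `0.5` work the same way.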
------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2673745463 From epeter at openjdk.org Fri Feb 21 08:22:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Feb 2025 08:22:59 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> <3ArmrOQcUoj8DhHTq1a40Oz3GE8bCDDy3FFeVgbladg=.b8e0e13b-39f3-41a6-8a1b-5ca4febb4a41@github.com> Message-ID: On Thu, 20 Feb 2025 11:00:59 GMT, Galder Zamarreño wrote: > So, if those int scalar regressions were not a problem when int min/max intrinsic was added, I would expect the same to apply to long. Do you know when they were added? If that was a long time ago, we might not have noticed back then, but we might notice now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2673875104 From epeter at openjdk.org Fri Feb 21 08:23:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Feb 2025 08:23:00 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Thu, 20 Feb 2025 06:50:07 GMT, Galder Zamarreño wrote: > The interesting thing is intReductionSimpleMin @ 100%. We see a regression there but I didn't observe it with the perfasm run. So, this could be due to variance in the application of cmov or not? I don't see the error / variance in the results you posted.
Often I look at those, and if it is anywhere above 10% of the average, then I'm suspicious ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2673879859 From epeter at openjdk.org Fri Feb 21 08:30:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 21 Feb 2025 08:30:00 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <6-Fgj-Lrd7GSpR0ZAi8YFlOZB12hCBB6p3oGZ1xodvA=.1ce2fa12-daff-4459-8fb8-1052acaf5639@github.com> <5oGMaD5b87inAMkco6l5ODRvWv7FRsHGJiu_UMrGrTc=.0be44429-d322-4a6f-b91d-b64a146fad05@github.com> <3ArmrOQcUoj8DhHTq1a40Oz3GE8bCDDy3FFeVgbladg=.b8e0e13b-39f3-41a6-8a1b-5ca4febb4a41@github.com> Message-ID: On Thu, 20 Feb 2025 11:00:59 GMT, Galder Zamarreño wrote: > Re: https://github.com/openjdk/jdk/pull/20098#issuecomment-2671144644 - I was trying to think what could be causing this. Maybe it is an issue with probabilities? Do you know at what point (if at all) the `MinI` node appears/disappears in that example? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2673892612 From dchuyko at openjdk.org Fri Feb 21 08:43:30 2025 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Fri, 21 Feb 2025 08:43:30 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 [v2] In-Reply-To: References: Message-ID: > The location for rfp should be set in the register map. In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. > > COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers INCLUDE_JVMCI case.
Dmitry Chuyko has updated the pull request incrementally with one additional commit since the last revision: Full #if condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23682/files - new: https://git.openjdk.org/jdk/pull/23682/files/8c273575..d157893c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23682&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23682&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23682/head:pull/23682 PR: https://git.openjdk.org/jdk/pull/23682 From dchuyko at openjdk.org Fri Feb 21 08:50:54 2025 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Fri, 21 Feb 2025 08:50:54 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 [v2] In-Reply-To: References: Message-ID: <-3LUunUZxEf0ybgUGLnprYt4T3QZpC7afFsyrRwSwKQ=.fbfa33a9-e16c-485a-b8d1-5a19d2bde57c@github.com> On Fri, 21 Feb 2025 08:43:30 GMT, Dmitry Chuyko wrote: >> The location for rfp should be set in the register map. In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. >> >> COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers INCLUDE_JVMCI case. > > Dmitry Chuyko has updated the pull request incrementally with one additional commit since the last revision: > > Full #if condition OK, changed to a full condition.
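The effect of the final condition can be checked in isolation. This sketch simulates a client-only (C1) build with stand-in macro definitions. The `COMPILER2_OR_JVMCI` definition is modeled on how HotSpot derives it and is an assumption here. `kSetsRfpLocation` and `kOldConditionSetsRfp` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical stand-ins for a client-only (C1) build configuration.
#define COMPILER1 1
// #define COMPILER2 1   // not defined in a client build
#define INCLUDE_JVMCI 0

// Modeled on HotSpot's COMPILER2_OR_JVMCI helper (definition assumed here).
#if defined(COMPILER2) || INCLUDE_JVMCI
#define COMPILER2_OR_JVMCI 1
#else
#define COMPILER2_OR_JVMCI 0
#endif

// The full condition settled on in the review.
#if defined(COMPILER1) || COMPILER2_OR_JVMCI
constexpr bool kSetsRfpLocation = true;
#else
constexpr bool kSetsRfpLocation = false;
#endif

// The pre-fix guard: false in a C1-only build, so rfp was never recorded.
#if COMPILER2_OR_JVMCI
constexpr bool kOldConditionSetsRfp = true;
#else
constexpr bool kOldConditionSetsRfp = false;
#endif
```

Changing the stand-in definitions at the top shows how each guard behaves in other build configurations.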
------------- PR Comment: https://git.openjdk.org/jdk/pull/23682#issuecomment-2673946926 From duke at openjdk.org Fri Feb 21 10:09:56 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Fri, 21 Feb 2025 10:09:56 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: <3kiI1J7jcczgzTRi9HZztzhGe1blcy8Ga11xoGhzueY=.98543172-5b38-4199-bead-0988de0e0e75@github.com> References: <3kiI1J7jcczgzTRi9HZztzhGe1blcy8Ga11xoGhzueY=.98543172-5b38-4199-bead-0988de0e0e75@github.com> Message-ID: On Tue, 18 Feb 2025 13:33:52 GMT, Andrew Dinn wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding comments + some code reorganization > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2594: > >> 2592: guarantee(T != T1Q && T != T1D, "incorrect arrangement"); \ >> 2593: if (!acceptT2D) guarantee(T != T2D, "incorrect arrangement"); \ >> 2594: if (strcmp(#NAME, "sqdmulh") == 0) guarantee(T != T8B && T != T16B, "incorrect arrangement"); \ > > Suggestion: > > I think it might be better to change this test from a strcmp call to (opc2 == 0b101101). The strcmp test is clearer to a reader of the code but the call may not be guaranteed to be compiled out at build time while the latter will. Changed as suggested. 
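The suggestion to replace the `strcmp` test with an opcode comparison can be illustrated with a small, self-contained model. Everything below is hypothetical (a reduced `Arrangement` enum, an `arrangement_ok` helper and an `INSN`-style macro). It is written only to show that comparing a per-expansion integer constant lets the compiler evaluate the check at compile time, which a `strcmp` call is not guaranteed to allow. The value `0b101101` is taken from the quoted review.

```cpp
#include <cassert>

// Hypothetical subset of SIMD arrangements, named after those in the quoted code.
enum Arrangement { T8B, T16B, T4H, T8H, T2S, T4S, T1D, T2D, T1Q };

// The opc2 value the review suggests testing instead of strcmp(#NAME, "sqdmulh").
constexpr unsigned kSqdmulhOpc2 = 0b101101;

// Mirrors the quoted guarantees. opc2 is an integer constant in each
// expansion, so the sqdmulh-specific test can be folded at compile time.
constexpr bool arrangement_ok(unsigned opc2, bool acceptT2D, Arrangement T) {
  return T != T1Q && T != T1D
      && (acceptT2D || T != T2D)
      && !(opc2 == kSqdmulhOpc2 && (T == T8B || T == T16B));
}

// Hypothetical macro in the style of the assembler's INSN definitions.
#define INSN(NAME, opc2, acceptT2D)              \
  constexpr bool NAME##_accepts(Arrangement T) { \
    return arrangement_ok(opc2, acceptT2D, T);   \
  }

INSN(sqdmulh, 0b101101, false)
INSN(other_insn, 0b100001, true)  // made-up second instruction for contrast

#undef INSN

// Evaluated entirely by the compiler.
static_assert(!sqdmulh_accepts(T16B), "checked at compile time");
```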
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1965215153 From duke at openjdk.org Fri Feb 21 10:14:00 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Fri, 21 Feb 2025 10:14:00 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 13:43:18 GMT, Andrew Dinn wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding comments + some code reorganization > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4066: > >> 4064: } >> 4065: >> 4066: // Execute on round of keccak of two computations in parallel. > > Suggestion: > > It would be helpful to add comments that relate the register and instruction selection to the original Java source code. e.g. change the header as follows > > // Performs 2 keccak round transformations using vector parallelism > // > // Two sets of 25 * 64-bit input states a0[lo:hi]...a24[lo:hi] are passed in > // the lower/upper halves of registers v0...v24 and the transformed states > // are returned in the same registers. Intermediate 64-bit pairs > // c0...c5 and d0...d5 are computed in registers v25...v30. v31 is > // loaded with the required pair of 64 bit rounding constants. > // During computation of the output states some intermediate results are > // shuffled around registers v0...v30. Comments on each line indicate > // how the values in registers correspond to variables ai, ci, di in > // the Java source code, likewise how the generated machine instructions > // correspond to Java source operations (n.b. rol means rotate left). 
> > Then annotate the generation steps as follows: > > __ eor3(v29, __ T16B, v4, v9, v14); // c4 = a4 ^ a9 ^ a14 > __ eor3(v26, __ T16B, v1, v6, v11); // c1 = a1 ^ a6 ^ a11 > __ eor3(v28, __ T16B, v3, v8, v13); // c3 = a3 ^ a8 ^ a13 > __ eor3(v25, __ T16B, v0, v5, v10); // c0 = a0 ^ a5 ^ a10 > __ eor3(v27, __ T16B, v2, v7, v12); // c2 = a2 ^ a7 ^ a12 > __ eor3(v29, __ T16B, v29, v19, v24); // c4 ^= a19 ^ a24 > __ eor3(v26, __ T16B, v26, v16, v21); // c1 ^= a16 ^ a21 > __ eor3(v28, __ T16B, v28, v18, v23); // c3 ^= a18 ^ a23 > __ eor3(v25, __ T16B, v25, v15, v20); // c0 ^= a15 ^ a20 > __ eor3(v27, __ T16B, v27, v17, v22); // c2 ^= a17 ^ a22 > > __ rax1(v30, __ T2D, v29, v26); // d0 = c4 ^ rol(c1, 1) > __ rax1(v26, __ T2D, v26, v28); // d2 = c1 ^ rol(c3, 1) > __ rax1(v28, __ T2D, v28, v25); // d4 = c3 ^ rol(c0, 1) > __ rax1(v25, __ T2D, v25, v27); // d1 = c0 ^ rol(c2, 1) > __ rax1(v27, __ T2D, v27, v29); // d3 = c2 ^ rol(c4, 1) > > __ eor(v0, __ T16B, v0, v30); // a0 = a0 ^ d0 > __ xar(v29, __ T2D, v1, v25, (64 - 1)); // a10' = rol((a1^d1), 1) > __ xar(v1, __ T2D, v6, v25, (64 - 44)); // a1 = rol((a6^d1), 44) > __ xar(v6, __ T2D, v9, v28, (64 - 20)); // a6 = rol((a9^d4), 20) > __ xar(v... Although this piece of code is not new, and I don't really think that this level of commenting is necessary, especially in code that is very unlikely to change, I added the comments.
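For readers cross-checking the comments above against the algorithm, a scalar sketch of the Keccak-f[1600] theta step may help. It uses the same `a[x + 5*y]`, `c[x]` and `d[x]` naming as the suggested comments. It is a plain reference computation under that naming assumption, not the intrinsic itself. `KeccakState`, `rol64` and `keccak_theta` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using KeccakState = std::array<std::uint64_t, 25>;  // lane a[x + 5*y]

inline std::uint64_t rol64(std::uint64_t v, unsigned n) {  // n in [1, 63]
  return (v << n) | (v >> (64 - n));
}

void keccak_theta(KeccakState& a) {
  std::uint64_t c[5], d[5];
  for (int x = 0; x < 5; ++x) {
    // Column parity; the AArch64 code computes this with two eor3 per column.
    c[x] = a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20];
  }
  for (int x = 0; x < 5; ++x) {
    // d[x] = c[x-1] ^ rol(c[x+1], 1), e.g. d0 = c4 ^ rol(c1, 1): one rax1 per lane pair.
    d[x] = c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1);
  }
  for (int i = 0; i < 25; ++i) {
    a[i] ^= d[i % 5];  // every lane in column x is XORed with d[x]
  }
}
```

The vector code runs this on two independent states at once, one per 64-bit half of each register.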
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1965220606 From azafari at openjdk.org Fri Feb 21 10:21:09 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 21 Feb 2025 10:21:09 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v21] In-Reply-To: <6uq32Tm6oCiyWlXYvmquDd3wcCdruX1TGH6XWMrvgVM=.5add9458-0746-42c8-8b2b-4a0aaf8f5ee6@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <6uq32Tm6oCiyWlXYvmquDd3wcCdruX1TGH6XWMrvgVM=.5add9458-0746-42c8-8b2b-4a0aaf8f5ee6@github.com> Message-ID: On Mon, 10 Feb 2025 14:05:37 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed merge problems > > src/hotspot/share/nmt/mallocTracker.hpp line 163: > >> 161: } >> 162: >> 163: inline const MallocMemory* by_tag(MemTag mem_tag) const { > > Move out into mainline PR Done > src/hotspot/share/nmt/mallocTracker.hpp line 241: > >> 239: >> 240: static inline void record_arena_size_change(ssize_t size, MemTag mem_tag) { >> 241: as_snapshot()->by_tag(mem_tag)->record_arena_size_change(size); > > Move out into mainline PR Done. > src/hotspot/share/nmt/mallocTracker.inline.hpp line 55: > >> 53: l = MallocLimitHandler::category_limit(mem_tag); >> 54: if (l->sz > 0) { >> 55: const MallocMemory* mm = as_snapshot()->by_tag(mem_tag); > > Move out into mainline PR Done. > src/hotspot/share/nmt/memBaseline.cpp line 64: > >> 62: >> 63: // Sort into allocation site addresses and memory tag order for baseline comparison >> 64: int compare_malloc_site_and_tag(const MallocSite& s1, const MallocSite& s2) { > > Move out into mainline PR Done > src/hotspot/share/nmt/memBaseline.cpp line 235: > >> 233: break; >> 234: case by_site_and_tag: >> 235: malloc_sites_to_allocation_site_and_tag_order(); > > Move out into mainline PR Done.
> src/hotspot/share/nmt/memBaseline.cpp line 275: > >> 273: >> 274: void MemBaseline::malloc_sites_to_allocation_site_order() { >> 275: if (_malloc_sites_order != by_site && _malloc_sites_order != by_site_and_tag) { > > Move out into mainline PR Done. > src/hotspot/share/nmt/memBaseline.cpp line 292: > >> 290: _malloc_sites.set_head(tmp.head()); >> 291: tmp.set_head(nullptr); >> 292: _malloc_sites_order = by_site_and_tag; > > Move out into mainline PR Done > src/hotspot/share/nmt/memBaseline.hpp line 56: > >> 54: by_size, // by memory size >> 55: by_site, // by call site where the memory is allocated from >> 56: by_site_and_tag // by call site and memory tag > > Move out into mainline PR (and indentation of comment seems wrong) Done. > src/hotspot/share/nmt/memBaseline.hpp line 154: > >> 152: VirtualMemory* virtual_memory(MemTag mem_tag) { >> 153: assert(baseline_type() != Not_baselined, "Not yet baselined"); >> 154: return _virtual_memory_snapshot.by_tag(mem_tag); > > Move out into mainline PR Done. 
> src/hotspot/share/nmt/memBaseline.hpp line 207: > >> 205: void malloc_sites_to_allocation_site_order(); >> 206: // Sort allocation sites in call site address and memory tag order >> 207: void malloc_sites_to_allocation_site_and_tag_order(); > > Move out into mainline PR Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965223803 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965222796 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965224620 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965225380 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965226658 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965227704 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965228209 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965229789 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965229277 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965228847 From cnorrbin at openjdk.org Fri Feb 21 10:23:55 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Fri, 21 Feb 2025 10:23:55 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow In-Reply-To: References: Message-ID: <8n5a0bIh4JD2bNMDJyyY8R8OSf0QshswTuB-LBuHQdM=.bfd5cd13-99df-4bc4-a889-886040470f27@github.com> On Fri, 21 Feb 2025 00:15:09 GMT, Dean Long wrote: > Can you explain what was wrong with the original fix? The BACKOUT only mentions that tests failed, but doesn't say why. Also, I fail to see why align_up_or_min is an improvement. It seems to silently mask errors, and the callers are not checking the result. Having a size that overflows size_t seems like an error to me. The original fix failed because of tests where overflow was the expected result.
In the files changed here, it was either possible to recover from the overflow, or the caller does its own error checking. In both cases, the callers relied on the previous behaviour of `align_up`, and they do check the result of `align_up_or_min`/`align_up_or_null`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23711#issuecomment-2674152486 From duke at openjdk.org Fri Feb 21 10:25:59 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Fri, 21 Feb 2025 10:25:59 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 02:55:18 GMT, Hao Sun wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding comments + some code reorganization > > Hi. Here is the test result of our CI. > > ### copyright year > > the following files should update the copyright year to 2025. > > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp > src/hotspot/cpu/aarch64/stubRoutines_aarch64.hpp > src/hotspot/share/runtime/globals.hpp > src/java.base/share/classes/sun/security/provider/ML_DSA.java > src/java.base/share/classes/sun/security/provider/SHA3Parallel.java > test/micro/org/openjdk/bench/java/security/MLDSA.java > > > ### cross-build failure > > Cross build for riscv64/s390/ppc64 failed.
> > Here is the error message for ppc64: > > > === Output from failing command(s) repeated here === > * For target support_interim-jmods_support__create_java.base.jmod_exec: > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/tmp/jdk-src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=72752, tid=72769 > # assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000e85cb03dc620 <= 0x0000e85cb03e8ab4 <= 0x0000e85cb03e8ab0 > # > # JRE version: OpenJDK Runtime Environment (25.0) (fastdebug build 25-internal-git-1e01c6deec3) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-git-1e01c6deec3, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # V [libjvm.so+0x3b391c] Instruction_aarch64::~Instruction_aarch64()+0xbc > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /tmp/ci-scripts/jdk-src/make/ > # > # An error report file with more information is saved as: > # /tmp/jdk-src/make/hs_err_pid72752.log > ... (rest of output omitted) > > * All command lines available in /sysroot/ppc64el/tmp/build-ppc64el/make-support/failure-logs. > === End of repeated output === > > > I suppose we should make an update similar to the one in `src/hotspot/cpu/aarch64/stubDeclarations_aarch64.hpp` for the other platforms. @shqking, I changed the copyright years, but I don't really understand how the aarch64-specific code can overflow buffers on other architectures. As far as I understand, Instruction_aarch64 should not have been there in a ppc build. Was this build attempted on an aarch64 host for the other architectures?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2674156680 From azafari at openjdk.org Fri Feb 21 10:30:18 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 21 Feb 2025 10:30:18 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v21] In-Reply-To: <6uq32Tm6oCiyWlXYvmquDd3wcCdruX1TGH6XWMrvgVM=.5add9458-0746-42c8-8b2b-4a0aaf8f5ee6@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <6uq32Tm6oCiyWlXYvmquDd3wcCdruX1TGH6XWMrvgVM=.5add9458-0746-42c8-8b2b-4a0aaf8f5ee6@github.com> Message-ID: On Mon, 10 Feb 2025 14:03:21 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed merge problems > > src/hotspot/share/nmt/memReporter.cpp line 201: > >> 199: if (mem_tag == mtThread) { >> 200: const VirtualMemory* thread_stack_usage = >> 201: (const VirtualMemory*)_vm_snapshot->by_tag(mtThreadStack); > > Move out into mainline PR Done. > src/hotspot/share/nmt/memReporter.cpp line 243: > >> 241: } else if (mem_tag == mtThread) { >> 242: const VirtualMemory* thread_stack_usage = >> 243: _vm_snapshot->by_tag(mtThreadStack); > > Move out into mainline PR Done. > src/hotspot/share/nmt/memReporter.cpp line 810: > >> 808: void MemDetailDiffReporter::diff_malloc_sites() const { >> 809: MallocSiteIterator early_itr = _early_baseline.malloc_sites(MemBaseline::by_site_and_tag); >> 810: MallocSiteIterator current_itr = _current_baseline.malloc_sites(MemBaseline::by_site_and_tag); > > Move out into mainline PR Done. > src/hotspot/share/nmt/memoryFileTracker.cpp line 47: > >> 45: VMATree::SummaryDiff diff = file->_tree.commit_mapping(offset, size, regiondata); >> 46: for (int i = 0; i < mt_number_of_tags; i++) { >> 47: VirtualMemory* summary = file->_summary.by_tag(NMTUtil::index_to_tag(i)); > > Move out into mainline PR Done.
> src/hotspot/share/nmt/memoryFileTracker.cpp line 56: > >> 54: VMATree::SummaryDiff diff = file->_tree.release_mapping(offset, size); >> 55: for (int i = 0; i < mt_number_of_tags; i++) { >> 56: VirtualMemory* summary = file->_summary.by_tag(NMTUtil::index_to_tag(i)); > > Move out into mainline PR Done. > src/hotspot/share/nmt/memoryFileTracker.cpp line 187: > >> 185: snap->commit_memory(current->committed()); >> 186: } >> 187: } > > Revert this change Done. > src/hotspot/share/nmt/memoryFileTracker.hpp line 81: > >> 79: const MemoryFile* file = _files.at(d); >> 80: for (int i = 0; i < mt_number_of_tags; i++) { >> 81: f(NMTUtil::index_to_tag(i), file->_summary.by_tag(NMTUtil::index_to_tag(i))); > > Move out into mainline PR Done. > src/hotspot/share/nmt/nmtCommon.cpp line 33: > >> 31: >> 32: #define MEMORY_TAG_DECLARE_NAME(tag, human_readable) \ >> 33: { #tag, human_readable }, > > Move out into mainline PR Done. > src/hotspot/share/nmt/nmtCommon.hpp line 91: > >> 89: // Map memory tag to index >> 90: static inline int tag_to_index(MemTag mem_tag) { >> 91: assert(tag_is_valid(mem_tag), "Invalid tag (%u)", (unsigned)mem_tag); > > Move out into mainline PR Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965234821 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965236095 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965237891 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965238649 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965239439 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965240691 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965241348 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965242071 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965242582 From rrich at openjdk.org Fri Feb 21 10:31:58 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 21 Feb 2025 10:31:58 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames In-Reply-To: References: Message-ID: On Mon, 17 Feb 2025 09:53:37 GMT, Amit Kumar wrote: > Port for [JDK-8299795](https://bugs.openjdk.org/browse/JDK-8299795) Relativize Z_locals in interpreter frame for s390x. > > Tier1 test with fastdebug vm are clean. Looks good to me. Cheers, Richard. ------------- Marked as reviewed by rrich (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23660#pullrequestreview-2632639194 From jsjolen at openjdk.org Fri Feb 21 10:39:04 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 21 Feb 2025 10:39:04 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v21] In-Reply-To: <1AtAN_70cbiU2-KRyPK90QnwMedZxIsZ22KgBwioyOc=.e415931f-34cd-4157-9cc6-08b95d89efa2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <1AtAN_70cbiU2-KRyPK90QnwMedZxIsZ22KgBwioyOc=.e415931f-34cd-4157-9cc6-08b95d89efa2@github.com> Message-ID: On Fri, 7 Feb 2025 10:31:36 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed merge problems > > src/hotspot/share/opto/stringopts.cpp line 173: > >> 171: } >> 172: void add_control(Node* ctrl) { >> 173: assert(!_control.contains(ctrl), "only push once"); > > Remove the changes in this file. @afshin-zafari, here's one of the unhandled comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965257792 From azafari at openjdk.org Fri Feb 21 11:09:48 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 21 Feb 2025 11:09:48 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v24] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. > - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs.
> - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. > - Adding a runtime flag for selecting the old or new version can be added later. > - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: missed flag/tag -> type ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/611a2d4f..7640108d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=22-23 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From azafari at openjdk.org Fri Feb 21 11:09:50 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 21 Feb 2025 11:09:50 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v21] In-Reply-To: <6uq32Tm6oCiyWlXYvmquDd3wcCdruX1TGH6XWMrvgVM=.5add9458-0746-42c8-8b2b-4a0aaf8f5ee6@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <6uq32Tm6oCiyWlXYvmquDd3wcCdruX1TGH6XWMrvgVM=.5add9458-0746-42c8-8b2b-4a0aaf8f5ee6@github.com> Message-ID: On Mon, 10 Feb 2025 14:03:30 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed merge problems > > src/hotspot/share/nmt/memReporter.cpp line 192: > >> 190: } >> 191: >> 192: void MemSummaryReporter::report_summary_of_tag(MemTag mem_tag, > > Move out into mainline PR Fixed
in commit "missed flag/tag -> type" (7640108). > src/hotspot/share/nmt/memReporter.cpp line 538: > >> 536: // thread stack is reported as part of thread category >> 537: if (mem_tag == mtThreadStack) continue; >> 538: diff_summary_of_tag(mem_tag, > > Move out into mainline PR Fixed in commit "missed flag/tag -> type" (7640108). > src/hotspot/share/nmt/memReporter.cpp line 608: > >> 606: >> 607: >> 608: void MemSummaryDiffReporter::diff_summary_of_tag(MemTag mem_tag, > > Move out into mainline PR Fixed in commit "missed flag/tag -> type" (7640108). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965299486 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965299728 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965299979 From yzheng at openjdk.org Fri Feb 21 12:14:57 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 21 Feb 2025 12:14:57 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v5] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 20:19:15 GMT, Coleen Phillimore wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace LGTM! As @iwanowww said, not inlining such trivial methods seems more like an inliner bug/enhancement opportunity. ------------- Marked as reviewed by yzheng (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/23572#pullrequestreview-2632877796 From coleenp at openjdk.org Fri Feb 21 12:31:46 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 21 Feb 2025 12:31:46 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v6] In-Reply-To: References: Message-ID: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Remove JVM_GetClassModifiers from jvm.h too. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23572/files - new: https://git.openjdk.org/jdk/pull/23572/files/02347433..c23718b3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=04-05 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23572.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23572/head:pull/23572 PR: https://git.openjdk.org/jdk/pull/23572 From coleenp at openjdk.org Fri Feb 21 12:31:48 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 21 Feb 2025 12:31:48 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v6] In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 14:21:47 GMT, Coleen Phillimore wrote: >> src/hotspot/share/prims/jvm.cpp line 1262: >> >>> 1260: JVM_END >>> 1261: >>> 1262: JVM_ENTRY(jboolean, JVM_IsArrayClass(JNIEnv *env, jclass cls)) >> >> Where are the changes to jvm.h? > > Good catch, I also removed JVM_GetProtectionDomain and JVM_GetClassModifiers.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1965401052 From coleenp at openjdk.org Fri Feb 21 12:31:49 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 21 Feb 2025 12:31:49 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v5] In-Reply-To: References: Message-ID: <3jNPEzaXa0Ncf8eu3vct6a_jyH7k4tH_mbRBaKmbMc0=.d3a86a0f-1bed-4084-af92-959f4dbd52f4@github.com> On Thu, 20 Feb 2025 23:38:34 GMT, Chen Liang wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix whitespace > > You are right, using the field directly is indeed better. I don't use the field directly because the field is a short and getModifiers makes it into Modifier. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1965399996 From liach at openjdk.org Fri Feb 21 14:04:02 2025 From: liach at openjdk.org (Chen Liang) Date: Fri, 21 Feb 2025 14:04:02 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v5] In-Reply-To: <3jNPEzaXa0Ncf8eu3vct6a_jyH7k4tH_mbRBaKmbMc0=.d3a86a0f-1bed-4084-af92-959f4dbd52f4@github.com> References: <3jNPEzaXa0Ncf8eu3vct6a_jyH7k4tH_mbRBaKmbMc0=.d3a86a0f-1bed-4084-af92-959f4dbd52f4@github.com> Message-ID: On Fri, 21 Feb 2025 12:27:56 GMT, Coleen Phillimore wrote: >> You are right, using the field directly is indeed better. > > I don't use the field directly because the field is a short and getModifiers makes it into Modifier. Indeed, even though this checks for the specific bit so widening has no effect, it is better to be cautious here. 
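The three checks discussed in this PR have public-API counterparts that can be tried from ordinary Java code: the `INTERFACE` bit of `Class.getModifiers()`, a non-null `getComponentType()` for array classes, and `isPrimitive()`. The sketch below only illustrates these observable equivalences with standard reflection; it is not the JDK-internal implementation, and `ClassKindSketch` and its methods are hypothetical names. It also shows why widening a `short` modifier value to `int` cannot change the `INTERFACE` bit (0x0200): sign extension only touches bits 16 to 31.

```java
import java.lang.reflect.Modifier;

// Hypothetical illustration using only public reflection APIs.
final class ClassKindSketch {
    // Interface check via the modifier bit, which should agree with Class.isInterface().
    static boolean isInterfaceViaModifiers(Class<?> c) {
        return (c.getModifiers() & Modifier.INTERFACE) != 0;
    }

    // Array check via the component type, which is non-null only for array classes.
    static boolean isArrayViaComponentType(Class<?> c) {
        return c.getComponentType() != null;
    }

    // Widening a short to int sign-extends into bits 16..31 only,
    // so testing a low bit such as 0x0200 gives the same answer either way.
    static boolean hasBitAfterWidening(short flags, int bit) {
        int widened = flags; // implicit widening conversion
        return (widened & bit) != 0;
    }

    public static void main(String[] args) {
        Class<?>[] samples = { Runnable.class, String.class, int[].class, int.class };
        for (Class<?> c : samples) {
            System.out.printf("%-20s interface=%b/%b array=%b/%b primitive=%b%n",
                    c.getName(),
                    c.isInterface(), isInterfaceViaModifiers(c),
                    c.isArray(), isArrayViaComponentType(c),
                    c.isPrimitive());
        }
        short negative = (short) 0x8200; // sign bit and INTERFACE bit both set
        System.out.println(hasBitAfterWidening(negative, Modifier.INTERFACE)); // true
    }
}
```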
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1965522767 From aph at openjdk.org Fri Feb 21 14:24:57 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 21 Feb 2025 14:24:57 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v9] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 11:43:08 GMT, Amit Kumar wrote: >> s390x implementation for Class.isInstance intrinsic. >> >> Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. >> >> Benchmark results will be updated soon. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > no need of Z_R0_scratch src/hotspot/cpu/s390/c1_Runtime1_s390.cpp line 635: > 633: __ z_bcr(Assembler::bcondEqual, Z_R14); > 634: > 635: // Z_R10 and Z_R11 are caller saved, so we must need to preserve them before any use Suggestion: // Z_R10 and Z_R11 are callee saved, so we must need to preserve them before any use We are the callee! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23535#discussion_r1965554704 From azafari at openjdk.org Fri Feb 21 15:08:48 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 21 Feb 2025 15:08:48 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v25] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. 
> - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. > - Adding a runtime flag for selecting the old or new version can be added later. > - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. Afshin Zafari has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 70 commits: - Merge remote-tracking branch 'origin/master' into _8337217_nmt_VMT_with_tree - missed flag/tag -> type - flag/type -> tag chages are removed. - removed vmtCommon.hpp - fixed merge problems - fix in shendoahCardTable - Merge remote-tracking branch 'origin/master' into _8337217_nmt_VMT_with_tree - merge with the new lock mechanism for NMT - merge with master - one small fix for SSIZE_FORMAT - ... 
and 60 more: https://git.openjdk.org/jdk/compare/dfcd0df6...92f2bfdd ------------- Changes: https://git.openjdk.org/jdk/pull/20425/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=24 Stats: 2172 lines in 40 files changed: 762 ins; 1152 del; 258 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From mdoerr at openjdk.org Fri Feb 21 15:25:57 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 21 Feb 2025 15:25:57 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Thu, 20 Feb 2025 15:41:12 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently acceleration code for GHASH is missing for PPC64. >> >> The current implementation utlilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - change branch and remove not needed variables > - change branch and remove not needed variables src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 686: > 684: __ bind(L_aligned_loop); > 685: __ lvx(vH, temp1, data); > 686: __ vec_perm(vH, vH, vH, loadOrder); I think this instruction is only needed on Big Endian and can be optimized out on Little Endian (like in https://github.com/openjdk/jdk/blob/dfcd0df60c60cf89dc01682264a573ad39e61a17/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp#L4155). 
src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 703: > 701: __ vec_perm(vTmp4, vHigh, vHigh, loadOrder); > 702: __ vec_perm(vTmp5, vLow, vLow, loadOrder); > 703: __ vec_perm(vH, vTmp5, vTmp4, vPerm); Can we compute a different vPerm such that we only need one `vec_perm` instruction in the loop? src/hotspot/cpu/ppc/vm_version_ppc.cpp line 314: > 312: } else if (UseGHASHIntrinsics) { > 313: if (!FLAG_IS_DEFAULT(UseGHASHIntrinsics)) > 314: warning("GHASH intrinsics are not available on this CPU"); Coding style: hotspot uses curly braces (like above). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1965667282 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1965675196 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1965672939 From azafari at openjdk.org Fri Feb 21 15:29:45 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 21 Feb 2025 15:29:45 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v26] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: <1c6Byr-6ht6vBjJnMPon588Sq5QBH2dbdsYDDVXbwEA=.489a7e1b-185b-4e78-8ac3-03af005f800b@github.com> > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. > - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. > - Adding a runtime flag for selecting the old or new version can be added later. 
> - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: undo stringopts.cpp changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/92f2bfdd..18ec1db5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=24-25 Stats: 143 lines in 1 file changed: 44 ins; 78 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From azafari at openjdk.org Fri Feb 21 15:29:47 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 21 Feb 2025 15:29:47 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v21] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <1AtAN_70cbiU2-KRyPK90QnwMedZxIsZ22KgBwioyOc=.e415931f-34cd-4157-9cc6-08b95d89efa2@github.com> Message-ID: On Fri, 21 Feb 2025 10:36:23 GMT, Johan Sjölen wrote: >> src/hotspot/share/opto/stringopts.cpp line 173: >> >>> 171: } >>> 172: void add_control(Node* ctrl) { >>> 173: assert(!_control.contains(ctrl), "only push once"); >> >> Remove the changes in this file. > > @afshin-zafari, here's one of the unhandled comments. Done.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1965680516 From mdoerr at openjdk.org Fri Feb 21 16:06:57 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 21 Feb 2025 16:06:57 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Thu, 20 Feb 2025 15:41:12 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently acceleration code for GHASH is missing for PPC64. >> >> The current implementation utlilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - change branch and remove not needed variables > - change branch and remove not needed variables I'm getting test error on AIX: TestAESMain algorithm=AES, mode=GCM, paddingStr=nopadding, msgSize=646, keySize=128, noReinit=false, checkOutput=false, encInputOffset=0, encOutputOffset=0, decOutputOffset=0, lastChunkSize=32 Algorithm: AES(128bit) Decryption cipher provider: SunJCE version 25 Decryption cipher algorithm: AES/GCM/nopadding javax.crypto.AEADBadTagException: Tag mismatch at java.base/com.sun.crypto.provider.GaloisCounterMode$GCMDecrypt.doFinal(GaloisCounterMode.java:1504) at java.base/com.sun.crypto.provider.GaloisCounterMode.engineDoFinal(GaloisCounterMode.java:427) at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2478) at TestAESBase.prepare(TestAESBase.java:158) at TestAESMain.main(TestAESMain.java:154) The same test works on linuxppc64le. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20235#issuecomment-2674939633 From sroy at openjdk.org Fri Feb 21 16:11:56 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 21 Feb 2025 16:11:56 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Fri, 21 Feb 2025 15:23:11 GMT, Martin Doerr wrote: >> Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: >> >> - change branch and remove not needed variables >> - change branch and remove not needed variables > > src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 703: > >> 701: __ vec_perm(vTmp4, vHigh, vHigh, loadOrder); >> 702: __ vec_perm(vTmp5, vLow, vLow, loadOrder); >> 703: __ vec_perm(vH, vTmp5, vTmp4, vPerm); > > Can we compute a different vPerm such that we only need one `vec_perm` instruction in the loop? As per the algorithm mentioned in section 6.4 of the Power ISA, we need 2 loads to access the nearest aligned address (to the unaligned address) and the next aligned address. Without the vec_perm, I faced the issue of wrong control vectors being generated due to endianness. Hence I had to include them.
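Martin's question can be explored with a small model of the Power `vperm` instruction: result byte `i` is byte `c[i] & 0x1F` of the 32-byte concatenation of the two source registers. Under that model, the three permutes quoted above can be folded algebraically into a single permute of `(vLow, vHigh)` with a precomputed control vector. The sketch below is a hypothetical Java model in register byte numbering only. It does not simulate `lvx` or little-endian memory order. It assumes that `vec_perm(d, a, b, c)` emits `vperm d, a, b, c` and that `loadOrder` and `vPerm` are loop-invariant, as the question implies. `VpermSketch`, `vperm` and `compose` are illustrative names.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical model of the Power vperm instruction, in register byte numbering.
final class VpermSketch {
    // result[i] = (a || b)[c[i] & 0x1F]; only the low 5 bits of each control byte matter.
    static byte[] vperm(byte[] a, byte[] b, byte[] c) {
        byte[] r = new byte[16];
        for (int i = 0; i < 16; i++) {
            int idx = c[i] & 0x1F;
            r[i] = idx < 16 ? a[idx] : b[idx - 16];
        }
        return r;
    }

    // Folds   tmpLow  = vperm(low,  low,  loadOrder)
    //         tmpHigh = vperm(high, high, loadOrder)
    //         out     = vperm(tmpLow, tmpHigh, outer)
    // into one control vector for vperm(low, high, combined).
    static byte[] compose(byte[] loadOrder, byte[] outer) {
        byte[] combined = new byte[16];
        for (int i = 0; i < 16; i++) {
            int idx = outer[i] & 0x1F;                // 0..15 -> tmpLow, 16..31 -> tmpHigh
            int inner = loadOrder[idx & 0x0F] & 0x0F; // both halves of (x || x) are x
            combined[i] = (byte) (idx < 16 ? inner : 16 + inner);
        }
        return combined;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        byte[] low = new byte[16], high = new byte[16];
        byte[] loadOrder = new byte[16], outer = new byte[16];
        rnd.nextBytes(low);
        rnd.nextBytes(high);
        rnd.nextBytes(loadOrder); // arbitrary control bytes; only the low 5 bits are used
        rnd.nextBytes(outer);

        byte[] threeSteps = vperm(vperm(low, low, loadOrder), vperm(high, high, loadOrder), outer);
        byte[] oneStep = vperm(low, high, compose(loadOrder, outer));
        System.out.println(Arrays.equals(threeSteps, oneStep)); // true
    }
}
```

Because the folding is pure register algebra, the identity holds for any control bytes. Whether a combined vector built this way fits the patch depends on how `loadOrder` and `vPerm` are actually constructed for the little-endian register image, and this model does not reproduce that.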
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1965777969 From mdoerr at openjdk.org Fri Feb 21 16:24:57 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 21 Feb 2025 16:24:57 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Fri, 21 Feb 2025 16:08:09 GMT, Suchismith Roy wrote: >> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 703: >> >>> 701: __ vec_perm(vTmp4, vHigh, vHigh, loadOrder); >>> 702: __ vec_perm(vTmp5, vLow, vLow, loadOrder); >>> 703: __ vec_perm(vH, vTmp5, vTmp4, vPerm); >> >> Can we compute a different vPerm such that we only need one `vec_perm` instruction in the loop? > > As per the algorithm mentioned in section 6.4 of the Power ISA, we need 2 loads to access the nearest aligned address (to the unaligned address) and the next aligned address. > > Without the vec_perm, I faced the issue of wrong control vectors being generated due to endianness. Hence I had to include them. Please understand my question correctly. I didn't propose to remove all `vec_perm`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1965808162 From sroy at openjdk.org Fri Feb 21 17:17:58 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 21 Feb 2025 17:17:58 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: <7qzgn1LeDY8CaNJZVRPb0FORbKbkfBP85qrU3MSH_Io=.62e89bae-a5e2-47e3-8217-d83ab7bef00f@github.com> On Fri, 21 Feb 2025 16:21:49 GMT, Martin Doerr wrote: >> As per the algorithm mentioned in section 6.4 of the Power ISA, we need 2 loads to access the nearest aligned address (to the unaligned address) and the next aligned address.
>> >> Without the vec_perm, I faced the issue of wrong control vectors being generated due to endianness. Hence I had to include them. > > Please understand my question correctly. I didn't propose to remove all `vec_perm`. The idea is to do the same job in the loop with one `vec_perm`. Hi @TheRealMDoerr, maybe my answer was not clear. I am not proposing to remove them. I am unable to see how to reduce the 3 instructions to one, as I feel the 2 lines below are required, as per the algorithm. __ vec_perm(vTmp4, vHigh, vHigh, loadOrder); __ vec_perm(vTmp5, vLow, vLow, loadOrder); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1965901600 From kvn at openjdk.org Fri Feb 21 18:31:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Feb 2025 18:31:59 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v5] In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 06:19:55 GMT, Calvin Cheung wrote: >> This changeset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. >> >> Passed tiers 1 - 5 testing. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > rename classes and add vm_exit_during_initialization call Thank you for renaming classes. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23476#pullrequestreview-2633908080 From stuefe at openjdk.org Fri Feb 21 18:38:20 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 21 Feb 2025 18:38:20 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v6] In-Reply-To: References: Message-ID: <9r3wnpF1xUgBKfShzU5BnExbehtuqbSwuvV5bgBXsuo=.1a58a462-3da7-45b9-95ef-7f6a7fce928a@github.com> > If we wrongly decode an nKlass of `0`, and the nKlass encoding base is not NULL (typical for most cases that run with CDS enabled), the resulting pointer points to the start of the Klass encoding range. That area is readable. If CDS is enabled, it will be at the start of the CDS metadata archive. If CDS is off, it is at the start of the class space. > > Now, both CDS and class space allocate a safety buffer at the start to prevent Klass structures from being located there. However, that memory is still readable, so we can read garbage data from that area. In the case of CDS, that area is just 16 bytes, after that come real data. Since Klass is large, most accesses will read beyond the 16-byte zone. > > We should protect the first page in the narrow Klass encoding range to make analysis of errors like this easier. Especially in release builds where decode_not_null does not assert. We already use a similar technique in the heap, and most OSes protect the zero page for the same reason. > > This patch does that. Now, decoding an `0` nKlass and then using the result `Klass` - calling virtual functions or accessing members - crashes right away. > > Additionally, the patch provides a helpful output in the register/stack section, e.g: > > > RDI=0x0000000800000000 points into nKlass protection zone > > > > Testing: > - GHAs. > - I tested the patch manually on x64 Linux for both CDS on, CDS off and zero-based encoding, CDS off and non-zero-based encoding. 
> - I tested manually on Windows x64. > - I also prepared an automatic gtest, but it first needs some preparatory work in the gtest suite (see https://bugs.openjdk.org/browse/JDK-8348029). > > -- Update 2024-01-22 -- > I added a jtreg test that is more thorough than a gtest (it also scans the produced hs-err file). Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - copyrights - Redo everything, including Iois changes - reverse everything - Merge branch 'master' into JDK-8330174-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use - fix whitespace error - fix test - test-fixes - fix bug found with jtreg test where metaspace buddy allocator would accidentally replace the protection mapping - add jtreg test; replaces the gtest - Merge branch 'master' into JDK-8330174-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use - ...
and 4 more: https://git.openjdk.org/jdk/compare/d34fbeb9...6d85e1fe ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23190/files - new: https://git.openjdk.org/jdk/pull/23190/files/0840cf07..6d85e1fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23190&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23190&range=04-05 Stats: 58871 lines in 2369 files changed: 31687 ins; 14894 del; 12290 mod Patch: https://git.openjdk.org/jdk/pull/23190.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23190/head:pull/23190 PR: https://git.openjdk.org/jdk/pull/23190 From kvn at openjdk.org Fri Feb 21 19:08:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Feb 2025 19:08:01 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v3] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 07:21:45 GMT, Emanuel Peter wrote: >> Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below. >> >> **Background** >> >> With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer. >> >> **Problem** >> >> So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code. 
>> >> >> MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1); >> MemorySegment nativeUnaligned = nativeAligned.asSlice(1); >> test3(nativeUnaligned); >> >> >> When compiling the test method, we assume that `nativeUnaligned.address()` is aligned, but it is not! >> >> static void test3(MemorySegment ms) { >> for (int i = 0; i < RANGE; i++) { >> long adr = i * 4L; >> int v = ms.get(ELEMENT_LAYOUT, adr); >> ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1)); >> } >> } >> >> >> **Solution: Runtime Checks - Predicate and Multiversioning** >> >> Of course, we could simply forbid vectorization when the `base` is `native`. But that would currently lead to regressions: in most cases we do get aligned `base`s, and we vectorize those today. We cannot statically determine whether the `base` is aligned, so we need a runtime check. >> >> I came up with 2 options for where to place the runtime checks: >> - A new "auto vectorization" Parse Predicate: >> - This only works when predicates are available. >> - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop. >> - Multiversion the loop: >> - Create 2 copies of the loop (fast and slow loops). >> - The `fast_loop` can make speculative alignment assumptions and add the corresponding check to the `multiversion_if`, which decides which loop we take. >> - In the `slow_loop`, we make no assumption, which means we cannot vectorize, but we still compile - so even ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > adjust selector if probability How profitable (performance-wise) is it to optimize the slow-path loop? Can we skip optimizations for it, e.g. treat it as not counted?
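The multiversioning option in the quoted description can be modeled at source level: one runtime check on the speculative assumption selects between two copies of the loop. The C++ below is a minimal sketch under stated assumptions, not C2's IR transformation. `kAlignment`, `increment_body`, `fast_loop`, `slow_loop` and `increment_all` are hypothetical names, 8 is an illustrative alignment, and `fast_loop` only stands in for the copy that C2 would vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical alignment the fast loop speculates on (illustrative value).
constexpr std::uintptr_t kAlignment = 8;

// Loop body shared by both copies: increment n 4-byte ints stored at
// base + 0, base + 4, ...; memcpy keeps unaligned accesses well-defined in C++.
static void increment_body(unsigned char* base, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) {
    std::int32_t v;
    std::memcpy(&v, base + i * 4, sizeof v);
    v += 1;
    std::memcpy(base + i * 4, &v, sizeof v);
  }
}

// "fast_loop": in C2 this copy would be vectorized under the speculative
// assumption that base is aligned. Returns 1 so callers can see which copy ran.
static int fast_loop(unsigned char* base, std::size_t n) {
  increment_body(base, n);
  return 1;
}

// "slow_loop": makes no alignment assumption, so it is correct for any base.
static int slow_loop(unsigned char* base, std::size_t n) {
  increment_body(base, n);
  return 0;
}

// Model of the multiversion_if: check the assumption at runtime and pick a copy.
int increment_all(unsigned char* base, std::size_t n) {
  const bool base_aligned =
      reinterpret_cast<std::uintptr_t>(base) % kAlignment == 0;
  return base_aligned ? fast_loop(base, n) : slow_loop(base, n);
}
```

With an 8-byte-aligned buffer `buf`, `increment_all(buf, 16)` takes the fast copy and `increment_all(buf + 1, 16)` takes the slow one, mirroring `nativeAligned.asSlice(1)` in the quoted Java test.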
src/hotspot/share/opto/loopTransform.cpp line 3363: > 3361: if (cl->is_pre_loop() || cl->is_post_loop()) return true; > 3362: > 3363: // If we are stalled, check if we can get unstalled. Can you expand the comment to explain when we "stall" and what that means? src/hotspot/share/opto/loopopts.cpp line 4514: > 4512: // and then rejecting the slow_loop by constant folding the multiversion_if. > 4513: // > 4514: // Therefore, we "stall" the optimization of the slow_loop until we add We don't use the term "stall". We use "delay"; that is what happens here, if I understand it correctly. src/hotspot/share/opto/loopopts.cpp line 4520: > 4518: // multiversion_if folds away the "stalled" slow_loop. If we add any > 4519: // speculative assumption, then we mark the OpaqueMultiversioningNode > 4520: // with "unstall_slow_loop", so that the slow_loop can be optimized. "unstall_slow_loop" -> "optimize_slow_loop" ------------- PR Review: https://git.openjdk.org/jdk/pull/22016#pullrequestreview-2633960596 PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1966019182 PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1966028103 PR Review Comment: https://git.openjdk.org/jdk/pull/22016#discussion_r1966032230 From mdoerr at openjdk.org Fri Feb 21 19:57:02 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 21 Feb 2025 19:57:02 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26] In-Reply-To: <7qzgn1LeDY8CaNJZVRPb0FORbKbkfBP85qrU3MSH_Io=.62e89bae-a5e2-47e3-8217-d83ab7bef00f@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> <7qzgn1LeDY8CaNJZVRPb0FORbKbkfBP85qrU3MSH_Io=.62e89bae-a5e2-47e3-8217-d83ab7bef00f@github.com> Message-ID: On Fri, 21 Feb 2025 17:15:34 GMT, Suchismith Roy wrote: >> Please understand my question correctly. I didn't propose to remove all `vec_perm` instructions. The idea is to do the same job in the loop with one `vec_perm`.
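Doing the job with a single permute relies on AltiVec `vperm` semantics: each result byte is selected from the 32-byte concatenation of the two source vectors. One permute with a suitable control vector can therefore extract any 16-byte window that straddles two aligned 16-byte loads. The C++ below is a scalar model of that idea, not PPC code and not HotSpot's `vec_perm` assembler API. `vperm_model`, `lvsl_model` and `unaligned_load` are hypothetical helpers, bytes are numbered big-endian, `offset` is relative to a buffer treated as 16-byte aligned, and the little-endian adjustment (the `vxor` mentioned in the thread) is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec16 = std::array<std::uint8_t, 16>;

// Scalar model of vperm (big-endian byte numbering): the low 5 bits of each
// control byte pick one byte out of the 32-byte concatenation a || b.
Vec16 vperm_model(const Vec16& a, const Vec16& b, const Vec16& ctrl) {
  Vec16 r{};
  for (std::size_t i = 0; i < 16; i++) {
    const unsigned idx = ctrl[i] & 0x1Fu;
    r[i] = idx < 16 ? a[idx] : b[idx - 16];
  }
  return r;
}

// Model of lvsl: control vector {sh, sh+1, ..., sh+15}, with sh = offset mod 16.
Vec16 lvsl_model(std::size_t offset) {
  Vec16 c{};
  const std::size_t sh = offset & 0xF;
  for (std::size_t i = 0; i < 16; i++) {
    c[i] = static_cast<std::uint8_t>(sh + i);
  }
  return c;
}

// 16 bytes starting at an arbitrary offset, built from two aligned 16-byte
// loads plus ONE permute. The buffer needs 16 bytes of slack past the window,
// because the second aligned load is always performed.
Vec16 unaligned_load(const std::uint8_t* mem, std::size_t offset) {
  const std::size_t aligned = offset & ~static_cast<std::size_t>(0xF);
  Vec16 lo{}, hi{};
  for (std::size_t i = 0; i < 16; i++) {
    lo[i] = mem[aligned + i];
    hi[i] = mem[aligned + 16 + i];
  }
  return vperm_model(lo, hi, lvsl_model(offset));
}
```

With `mem[i] == i`, `unaligned_load(mem, 5)` yields bytes 5 through 20. The three separate permutes discussed in the review would collapse into the single `vperm_model` call here once the control vector is computed before the loop.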
> > Hi @TheRealMDoerr, maybe my answer was not clear. I am not proposing to remove them. I cannot see how to reduce the 3 instructions to one, as I believe the two lines below are required by the algorithm: > __ vec_perm(vTmp4, vHigh, vHigh, loadOrder); > __ vec_perm(vTmp5, vLow, vLow, loadOrder); The purpose of the 3 `vec_perm` instructions is to extract 16 bytes from two 16-byte values loaded into vector registers. This can be done with a single `vec_perm` instruction. But I think AIX should get fixed first, before we figure out how to determine the vPerm value for that (probably lvsl + vxor before the loop). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1966095694 From gziemski at openjdk.org Fri Feb 21 20:50:41 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 21 Feb 2025 20:50:41 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v51] In-Reply-To: References: Message-ID: > Here is another iteration of the NMT benchmarking mechanism, hopefully close to the final one. > > Please see the design document attached to the issue for details: `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ??????????
> Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ?????????? > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix a bug in recording free ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/60394ecf..da6d4997 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=50 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=49-50 Stats: 120 lines in 4 files changed: 46 ins; 27 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From dlong at openjdk.org Fri Feb 21 20:53:53 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Feb 2025 20:53:53 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 [v2] In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 08:43:30 GMT, Dmitry Chuyko wrote: >> The location for rfp should be set in the register map. In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. >> >> The COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers the INCLUDE_JVMCI case. > > Dmitry Chuyko has updated the pull request incrementally with one additional commit since the last revision: > > Full #if condition Marked as reviewed by dlong (Reviewer).
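The client-build failure in 8350258 comes down to which preprocessor guard surrounds the code that records the rfp location. The following is a minimal sketch assuming a client build with C1 only. The `SKETCH_*` macros are hypothetical stand-ins for HotSpot's `COMPILER2_OR_JVMCI` and `COMPILER1_OR_COMPILER2`, and OR-ing both in the second guard is an assumption about the shape of the "full #if condition", not a copy of the committed change.

```cpp
#include <cassert>

// Simulated build configuration: a "client" VM with C1 only (illustrative).
#define SKETCH_HAS_C1    1
#define SKETCH_HAS_C2    0
#define SKETCH_HAS_JVMCI 0

// Simplified stand-ins for the convenience macros named in the PR.
#define SKETCH_C2_OR_JVMCI (SKETCH_HAS_C2 || SKETCH_HAS_JVMCI)
#define SKETCH_C1_OR_C2    (SKETCH_HAS_C1 || SKETCH_HAS_C2)

// Is the code that records the rfp location compiled in under the old guard?
bool old_guard_records_rfp() {
#if SKETCH_C2_OR_JVMCI
  return true;
#else
  return false;  // client build: the rfp location is never set
#endif
}

// The same question under a guard that also admits C1-only builds.
bool new_guard_records_rfp() {
#if SKETCH_C1_OR_C2 || SKETCH_C2_OR_JVMCI
  return true;
#else
  return false;
#endif
}
```

Setting `SKETCH_HAS_C1` to 0 and `SKETCH_HAS_JVMCI` to 1 shows that the combined guard still covers a JVMCI-only configuration.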
------------- PR Review: https://git.openjdk.org/jdk/pull/23682#pullrequestreview-2634192503 From vpaprotski at openjdk.org Fri Feb 21 20:58:27 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Fri, 21 Feb 2025 20:58:27 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 Message-ID: Add an AVX2 Montgomery multiplication intrinsic (about 60-80% gain). Also add reduction to the existing AVX512 multiplication (this was left over from https://github.com/openjdk/jdk/pull/19893, where a quick fix was required). This is mostly cleanup, but there is also about a 1-2% gain.

Before (no AVX512):

Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error     Units
SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  40   3720.589  ± 17.879  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  40   3605.940  ± 15.807  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  40   1076.502  ± 4.190   ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  40   1069.624  ± 2.484   ops/s

Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error    Units
KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  40   830.448   ± 2.285  ops/s

After (with AVX2):

Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error     Units
SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  40   6000.496  ± 39.923  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  40   5739.878  ± 34.838  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  40   1942.437  ± 12.179  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  40   1921.770  ± 8.992   ops/s

Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error    Units
KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  40   1399.761  ± 6.238  ops/s

Before (with AVX512):

Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error     Units
SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  40   9621.950  ± 27.260  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  40   8975.654  ± 26.707  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  40   3112.945  ± 12.930  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  40   3039.183  ± 12.362  ops/s

Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error    Units
KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  40   2248.987  ± 7.427  ops/s

After (with AVX512):

Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error     Units
SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  40   9815.713  ± 23.455  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  40   9136.786  ± 27.747  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  40   3167.702  ± 13.331  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  40   3090.053  ± 12.925  ops/s

Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error    Units
KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  40   2278.031  ± 6.971  ops/s

------------- Commit messages: - whitespace - split up ASM and Math changes Changes: https://git.openjdk.org/jdk/pull/23719/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23719&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350459 Stats: 625 lines in 9 files changed: 525 ins; 15 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/23719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23719/head:pull/23719 PR: https://git.openjdk.org/jdk/pull/23719 From kvn at openjdk.org Fri Feb 21 20:58:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Feb 2025 20:58:54 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 [v2] In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 08:43:30 GMT, Dmitry Chuyko wrote: >> The location for rfp should be set in the register map. In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. >> >> The COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers the INCLUDE_JVMCI case. > > Dmitry Chuyko has updated the pull request incrementally with one additional commit since the last revision: > > Full #if condition Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23682#pullrequestreview-2634201847 From dlong at openjdk.org Fri Feb 21 21:02:03 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Feb 2025 21:02:03 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 10:48:26 GMT, Casper Norrbin wrote: > Hi everyone, > > The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition.
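The overflow described here is easy to reproduce with power-of-two alignment on unsigned values. The C++ below is a minimal sketch under that assumption. `align_up_unchecked` and `align_up_checked` are hypothetical names for illustration, not the `align_up`, `align_up_or_null` or `align_up_or_min` functions in HotSpot's `align.hpp`, whose exact signatures are not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Round value up to a multiple of alignment (a power of two). If the true
// result does not fit in T, the returned value wraps around modulo 2^bits(T)
// and ends up SMALLER than value, breaking "aligned_result >= original".
template <typename T>
T align_up_unchecked(T value, T alignment) {
  static_assert(std::is_unsigned<T>::value, "sketch assumes unsigned T");
  const T mask = static_cast<T>(alignment - 1);
  return static_cast<T>((value + mask) & ~mask);
}

// Variant that reports overflow instead of wrapping: returns false when the
// aligned result would exceed the maximum value of T.
template <typename T>
bool align_up_checked(T value, T alignment, T* result) {
  static_assert(std::is_unsigned<T>::value, "sketch assumes unsigned T");
  const T mask = static_cast<T>(alignment - 1);
  // max() - mask is the largest multiple of alignment representable in T,
  // because max() + 1 is itself a multiple of any power-of-two alignment.
  if (value > static_cast<T>(std::numeric_limits<T>::max() - mask)) {
    return false;
  }
  *result = static_cast<T>((value + mask) & ~mask);
  return true;
}
```

For example, `align_up_unchecked<std::uint32_t>(0xFFFFFFF9u, 8u)` returns 0, which is exactly the case the new assertion is meant to catch. The checked form shows one way a caller that already guards against overflow can get a well-defined answer instead.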
> > The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through the usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Additionally, I've created `align_up_or_min`, which behaves similarly to the original `align_up` but handles overflows predictably across all integer types. This new function is used in the locations where overflow checks already exist, providing a safer alternative. I don't see where we check the return value of align_up_or_min for the changes in src/hotspot/share/gc/shared/gcArguments.cpp. If tests fail because of align_up, maybe the test should be fixed? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23711#issuecomment-2675530338 From dlong at openjdk.org Fri Feb 21 21:10:58 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Feb 2025 21:10:58 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v5] In-Reply-To: References: <3jNPEzaXa0Ncf8eu3vct6a_jyH7k4tH_mbRBaKmbMc0=.d3a86a0f-1bed-4084-af92-959f4dbd52f4@github.com> Message-ID: On Fri, 21 Feb 2025 14:01:20 GMT, Chen Liang wrote: >> I don't use the field directly because the field is a short and getModifiers makes it into Modifier. > > Indeed, even though this checks for the specific bit so widening has no effect, it is better to be cautious here. > I don't use the field directly because the field is a short and getModifiers makes it into Modifier. But getModifiers() returns `int`, not `Modifier` (which is all static).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1966170358 From dchuyko at openjdk.org Fri Feb 21 21:46:56 2025 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Fri, 21 Feb 2025 21:46:56 GMT Subject: RFR: 8350258: AArch64: Client build fails after JDK-8347917 [v2] In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 08:43:30 GMT, Dmitry Chuyko wrote: >> The location for rfp should be set in the register map. In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. >> >> The COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers the INCLUDE_JVMCI case. > > Dmitry Chuyko has updated the pull request incrementally with one additional commit since the last revision: > > Full #if condition Dean, Vladimir, thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23682#issuecomment-2675608188 From dchuyko at openjdk.org Fri Feb 21 21:46:57 2025 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Fri, 21 Feb 2025 21:46:57 GMT Subject: Integrated: 8350258: AArch64: Client build fails after JDK-8347917 In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 22:42:18 GMT, Dmitry Chuyko wrote: > The location for rfp should be set in the register map. In particular, it wasn't set in frame::sender_for_interpreter_frame() if neither C2 nor JVMCI were included. > > The COMPILER1_OR_COMPILER2 condition is used instead of COMPILER2_OR_JVMCI, which also covers the INCLUDE_JVMCI case. This pull request has now been integrated.
Changeset: 25322aae Author: Dmitry Chuyko URL: https://git.openjdk.org/jdk/commit/25322aae8e224680db376098d2e45f26cf3334a0 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8350258: AArch64: Client build fails after JDK-8347917 Reviewed-by: dlong, kvn ------------- PR: https://git.openjdk.org/jdk/pull/23682 From vpaprotski at openjdk.org Fri Feb 21 21:52:30 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Fri, 21 Feb 2025 21:52:30 GMT Subject: RFR: 8350516: Update model numbers for ECore-based cpus Message-ID: Add two more models to the list ------------- Commit messages: - update ECore model numbers Changes: https://git.openjdk.org/jdk/pull/23731/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23731&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350516 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23731.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23731/head:pull/23731 PR: https://git.openjdk.org/jdk/pull/23731 From ccheung at openjdk.org Fri Feb 21 22:44:57 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 21 Feb 2025 22:44:57 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v7] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 05:10:44 GMT, Ioi Lam wrote: >> Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. >> >> With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). 
Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. >> >> To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): >> >>> the format of the configuration and cache files is not specified and is subject to change without notice. >> >> **Notes for reviewers:** >> >> - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. >> - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. >> - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks for the presence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. >> - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. >> - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc.) when JEP 483 options are specified. >> >> **Misc Note** >> - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccb...
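The note about giving the AOT configuration file its own magic number, so it cannot be mistaken for a CDS archive even though the body format is shared, can be sketched as a header check. Everything below is hypothetical: the magic values, `FileKind`, `classify_header` and `can_map_as_archive` only illustrate the idea. The real CDS header layout and constants are not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical magic numbers: one for a mappable AOT cache / CDS archive and
// a distinct one for an AOT configuration ("preimage") file.
constexpr std::uint32_t kArchiveMagic = 0x5A5A0001u;
constexpr std::uint32_t kConfigMagic  = 0x5A5A0002u;

enum class FileKind { Archive, Config, Unknown };

// Classify a file by the first 4 bytes of its header (host byte order).
FileKind classify_header(const unsigned char* header, std::size_t size) {
  if (size < sizeof(std::uint32_t)) return FileKind::Unknown;
  std::uint32_t magic;
  std::memcpy(&magic, header, sizeof magic);
  if (magic == kArchiveMagic) return FileKind::Archive;
  if (magic == kConfigMagic) return FileKind::Config;
  return FileKind::Unknown;
}

// Only a real archive may be mapped as a cache. A configuration file is
// rejected even though its body uses essentially the same format.
bool can_map_as_archive(const unsigned char* header, std::size_t size) {
  return classify_header(header, size) == FileKind::Archive;
}
```

Because the check keys only on the magic number, a config file passed as the cache would be rejected up front instead of being mapped.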
> > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge branch 'master' into 8348426-binary-aot-config-file > - @ashu-mehra comments > - @calvinccheung comments > - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself > - Improve error message when AOTMode=create has an incompatible classpath > - Fixed test cases @vnkozlov > - Update "make test JTREG_AOT_JDK=true ..." to work with binary AOT configuration > - Fixed test failures > - Added comments; fixed FIXMEs > - Added more test cases > - ... and 2 more: https://git.openjdk.org/jdk/compare/00d4e4a9...21f140e7 Updates look good. Thanks! ------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23484#pullrequestreview-2634420680 From sviswanathan at openjdk.org Sat Feb 22 01:00:59 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 22 Feb 2025 01:00:59 GMT Subject: RFR: 8350516: Update model numbers for ECore-based cpus In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 21:47:45 GMT, Volodymyr Paprotski wrote: > Add two more models to the list Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23731#pullrequestreview-2634546637 From stuefe at openjdk.org Sat Feb 22 06:38:53 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 22 Feb 2025 06:38:53 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v5] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 20:58:29 GMT, Ioi Lam wrote: >> It might be easier if we introduce a new "core" region called "protection" that's 16MB in size, and allocate that before the rw region in the output buffer. We never map this region so it doesn't need to be stored in the archive file. >> >> Let me try this out and see if it works. 
> >> It might be easier if we introduce a new "core" region called "protection" that's 16MB in size, and allocate that before the rw region in the output buffer. We never map this region so it doesn't need to be stored in the archive file. >> >> Let me try this out and see if it works. > > Hi Thomas, please try this out: > > https://github.com/openjdk/jdk/compare/master...iklam:jdk:8330174-protection-zone-ioi-contributions > > It passes all CDS tests. You can see the gap: > > > $ java -Xlog:cds -XX:ArchiveRelocationMode=0 --version | egrep '(Mapped)|(_rs)' > [0.017s][info][cds] Reserved archive_space_rs [0x0000000800000000 - 0x0000000801000000] (16777216) bytes > [0.017s][info][cds] Reserved class_space_rs [0x0000000801000000 - 0x0000000841000000] (1073741824) bytes > [0.017s][info][cds] Mapped static region #0 at base 0x0000000800001000 top 0x0000000800557000 (ReadWrite) > [0.017s][info][cds] Mapped static region #1 at base 0x0000000800557000 top 0x0000000800dee000 (ReadOnly) > [0.017s][info][cds] Mapped static region #2 at base 0x000079ff9c021000 top 0x000079ff9c056000 (Bitmap) > > > You'd need to add code to disable all RWX access in 0x800000000 ~ 0x800001000. @iklam I incorporated your proposal; could you take another look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23190#issuecomment-2676051797 From coleenp at openjdk.org Sat Feb 22 14:49:38 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 22 Feb 2025 14:49:38 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v7] In-Reply-To: References: Message-ID: > Class.isInterface() can check modifier flags, Class.isArray() can check whether the component mirror is non-null, and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests.
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Use modifiers field directly in isInterface. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23572/files - new: https://git.openjdk.org/jdk/pull/23572/files/c23718b3..db7c9782 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23572.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23572/head:pull/23572 PR: https://git.openjdk.org/jdk/pull/23572 From coleenp at openjdk.org Sat Feb 22 14:49:38 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 22 Feb 2025 14:49:38 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v5] In-Reply-To: References: <3jNPEzaXa0Ncf8eu3vct6a_jyH7k4tH_mbRBaKmbMc0=.d3a86a0f-1bed-4084-af92-959f4dbd52f4@github.com> Message-ID: On Fri, 21 Feb 2025 21:08:33 GMT, Dean Long wrote: >> Indeed, even though this checks for the specific bit so widening has no effect, it is better to be cautious here. > >> I don't use the field directly because the field is a short and getModifiers makes it into Modifier. > > But getModifiers() returns `int`, not `Modifier` (which is all static). I misremembered why I called getModifiers(); maybe it was because all the other calls to getModifiers() in Class.java used to be needed, but I did want to call Modifier.isInterface(). If using the 'modifiers' field directly is better, I'll change it to that.
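Chen Liang's point that widening has no effect on this check can be verified mechanically: sign-extending a 16-bit value to 32 bits only copies bit 15 into bits 16 to 31, so a test of bit 9 gives the same answer either way. The C++ below models only the bit arithmetic, not `Class.java`. It assumes the flag value `0x0200` of `java.lang.reflect.Modifier.INTERFACE`, and `is_interface_zero_extended` and `is_interface_sign_extended` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Value of java.lang.reflect.Modifier.INTERFACE.
constexpr std::uint32_t kInterfaceFlag = 0x0200u;

// Test the flag on the raw 16-bit pattern, zero-extended to 32 bits.
bool is_interface_zero_extended(std::uint16_t bits) {
  return (static_cast<std::uint32_t>(bits) & kInterfaceFlag) != 0;
}

// Test the flag after treating the field as a signed short and widening it,
// as Java does for short -> int: bit 15 is copied into bits 16..31.
bool is_interface_sign_extended(std::uint16_t bits) {
  const std::int32_t widened = static_cast<std::int16_t>(bits);
  return (static_cast<std::uint32_t>(widened) & kInterfaceFlag) != 0;
}
```

Comparing both functions over all 65,536 bit patterns shows they always agree, including patterns with bit 15 set such as `0x8200`.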
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1966527692 From qpzhang at openjdk.org Sat Feb 22 15:32:04 2025 From: qpzhang at openjdk.org (Patrick Zhang) Date: Sat, 22 Feb 2025 15:32:04 GMT Subject: RFR: 8350483: AArch64: turn on signum intrinsics by default on Ampere CPUs Message-ID: Set -XX:+UseSignumIntrinsic by default for Ampere CPUs. This fixes a performance problem found in the JMH cases `vm.compiler.Signum|java.lang.*MathBench.sig[nN]um*`, where fmov is used to move data between GPRs and FPRs at significant cost. Verified on Ampere-1A: the scores (thrpt, ops/s) of `java.lang.*MathBench.sig[nN]um*` improved by 40-50%, while the `vm.compiler.Signum._1_signumFloatTest` and `vm.compiler.Signum._3_signumDoubleTest` results improved by a much larger margin. The change also passed GHA sanity checks and jtreg tier1 on Ampere-1A as functional smoke tests. ------------- Commit messages: - 8350483: AArch64: turn on signum intrinsics by default on Ampere CPUs Changes: https://git.openjdk.org/jdk/pull/23735/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23735&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350483 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23735.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23735/head:pull/23735 PR: https://git.openjdk.org/jdk/pull/23735 From iklam at openjdk.org Sat Feb 22 21:46:57 2025 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 22 Feb 2025 21:46:57 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v6] In-Reply-To: <9r3wnpF1xUgBKfShzU5BnExbehtuqbSwuvV5bgBXsuo=.1a58a462-3da7-45b9-95ef-7f6a7fce928a@github.com> References: <9r3wnpF1xUgBKfShzU5BnExbehtuqbSwuvV5bgBXsuo=.1a58a462-3da7-45b9-95ef-7f6a7fce928a@github.com> Message-ID: On Fri, 21 Feb 2025 18:38:20 GMT, Thomas Stuefe wrote: >> If we wrongly decode an nKlass of `0`, and the nKlass encoding base is not NULL (typical for
most cases that run with CDS enabled), the resulting pointer points to the start of the Klass encoding range. That area is readable. If CDS is enabled, it will be at the start of the CDS metadata archive. If CDS is off, it is at the start of the class space. >> >> Now, both CDS and class space allocate a safety buffer at the start to prevent Klass structures from being located there. However, that memory is still readable, so we can read garbage data from that area. In the case of CDS, that area is just 16 bytes, followed by real data. Since Klass is large, most accesses will read beyond the 16-byte zone. >> >> We should protect the first page in the narrow Klass encoding range to make analysis of errors like this easier, especially in release builds, where decode_not_null does not assert. We already use a similar technique in the heap, and most OSes protect the zero page for the same reason. >> >> This patch does that. Now, decoding a `0` nKlass and then using the resulting `Klass` (calling virtual functions or accessing members) crashes right away. >> >> Additionally, the patch prints helpful output in the register/stack section, e.g.: >> >> >> RDI=0x0000000800000000 points into nKlass protection zone >> >> >> >> Testing: >> - GHAs. >> - I tested the patch manually on x64 Linux with CDS on, CDS off with zero-based encoding, and CDS off with non-zero-based encoding. >> - I tested manually on Windows x64. >> - I also prepared an automatic gtest, but it first needs some preparatory work in the gtest suite (see https://bugs.openjdk.org/browse/JDK-8348029). >> >> -- Update 2024-01-22 -- >> I added a jtreg test that is more thorough than a gtest (it also scans the produced hs-err file). > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision:
The pull request contains 14 additional commits since the last revision: > > - copyrights > - Redo everything, including Iois changes > - reverse everything > - Merge branch 'master' into JDK-8330174-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use > - fix whitespace error > - fix test > - test-fixes > - fix bug found with jtreg test where metaspace buddy allocator would accidentally replace the protection mapping > - add jtreg test; replaces the gtest > - Merge branch 'master' into JDK-8330174-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use > - ... and 4 more: https://git.openjdk.org/jdk/compare/c9d0b971...6d85e1fe Thomas, thanks for making the changes. I think we should amend this comment in filemap.hpp: - char* _mapped_base_address; // Actual base address where archive is mapped. + char* _mapped_base_address; // Actual base address used for mapping the core // regions. Note that the lowest core region // (rw for the static archive) is mapped at offset // MetaspaceShared::protection_zone_size() from this address src/hotspot/share/cds/archiveBuilder.cpp line 163: > 161: _mapped_static_archive_top(nullptr), > 162: _buffer_to_requested_delta(0), > 163: _pz_region("pz", MAX_SHARED_DELTA), // protection zone To avoid confusion: // protection zone -- used only during dumping; does NOT exist in cds archive. src/hotspot/share/cds/archiveBuilder.hpp line 214: > 212: // The "pz" region is used only during static dumps to reserve an unused space between SharedBaseAddress and > 213: // the bottom of the rw region. During runtime, this space will be filled with a reserved area that disallows > 214: // read/write/exec, so we can track for bad CompressedKlassPointers encoding. Add: // Note: this region does NOT exist in the cds archive. 
src/hotspot/share/cds/metaspaceShared.cpp line 1259: > 1257: *(mapped_base_address) = 'P'; > 1258: *(mapped_base_address + prot_zone_size - 1) = 'P'; > 1259: #endif This block needs a comment explaining its purpose. src/hotspot/share/cds/metaspaceShared.cpp line 1378: > 1376: // > 1377: // In order for those IDs to still be valid, we need to dictate base and shift: base should be the > 1378: // mapping start (including protection zone), shift the shift used at archive generation time. - shift the shift used + shift should be the shift used src/hotspot/share/memory/metaspace.cpp line 811: > 809: > 810: // After narrowKlass encoding scheme is decided: if the encoding base points to class space start, > 811: // establish a protection zone. Maybe add a comment like: This way we can ensure that no valid compressed pointer will have the value zero. src/hotspot/share/memory/metaspace.cpp line 812: > 810: // After narrowKlass encoding scheme is decided: if the encoding base points to class space start, > 811: // establish a protection zone. > 812: if (CompressedKlassPointers::base() == (address)rs.base()) { What happens if this condition is not true? Do we need to check for encoded values of zero? Or, is the following true? } else { assert((uintx)(CompressedKlassPointers::base()) == 0x0, "must be zero based"); } Is the `else` part covered by your new test case?
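A minimal sketch of the arithmetic behind the suggested comment, assuming a POSIX system. If the encoding base is the start of the mapping and every Klass lives at or above base + protection-zone size, then nKlass 0 decodes into the PROT_NONE page and no valid Klass can encode to 0. `NarrowKlassEncoding`, `reserve_with_protection_zone` and `lowest_valid_nklass` are hypothetical names for illustration only, not HotSpot's `CompressedKlassPointers` API.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a narrow-Klass encoding (not HotSpot code):
// only the base/shift arithmetic is modeled.
struct NarrowKlassEncoding {
  uintptr_t base;  // encoding base == start of the mapping, incl. protection zone
  int shift;       // shift used when the Klass range was laid out

  uint32_t encode(uintptr_t klass) const {
    return static_cast<uint32_t>((klass - base) >> shift);
  }
  uintptr_t decode(uint32_t nk) const {
    return base + (static_cast<uintptr_t>(nk) << shift);
  }
};

// Reserves `range` bytes and makes the first `pz_size` bytes inaccessible
// (PROT_NONE), so dereferencing a pointer decoded from nKlass 0 faults at once.
// Returns 0 on failure. Assumes MAP_ANONYMOUS is available.
uintptr_t reserve_with_protection_zone(size_t range, size_t pz_size) {
  assert(pz_size <= range && "protection zone must fit in the reservation");
  void* mem = mmap(nullptr, range, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return 0;
  if (mprotect(mem, pz_size, PROT_NONE) != 0) {
    munmap(mem, range);
    return 0;
  }
  return reinterpret_cast<uintptr_t>(mem);
}

// Smallest nKlass a valid Klass can have: Klasses start at base + pz_size,
// so this is non-zero whenever pz_size >= (1 << shift).
uint32_t lowest_valid_nklass(const NarrowKlassEncoding& enc, size_t pz_size) {
  return enc.encode(enc.base + pz_size);
}
```

With a 4 KiB page as the protection zone and a shift of 3, the lowest valid nKlass in this model is 4096 >> 3 = 512, while `decode(0)` returns the base itself, which lies inside the protected page.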
------------- PR Review: https://git.openjdk.org/jdk/pull/23190#pullrequestreview-2635295087 PR Review Comment: https://git.openjdk.org/jdk/pull/23190#discussion_r1966603821 PR Review Comment: https://git.openjdk.org/jdk/pull/23190#discussion_r1966603266 PR Review Comment: https://git.openjdk.org/jdk/pull/23190#discussion_r1966604504 PR Review Comment: https://git.openjdk.org/jdk/pull/23190#discussion_r1966604847 PR Review Comment: https://git.openjdk.org/jdk/pull/23190#discussion_r1966606537 PR Review Comment: https://git.openjdk.org/jdk/pull/23190#discussion_r1966606813 From kbarrett at openjdk.org Sun Feb 23 10:02:58 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 23 Feb 2025 10:02:58 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 20:59:34 GMT, Dean Long wrote: > I don't see where we check the return value of align_up_or_min for the changes in src/hotspot/share/gc/shared/gcArguments.cpp. If tests fail because of align_up, maybe the test should be fixed? I share @dean-long's concerns. This PR doesn't seem like the right direction. I think the earlier PR was correct, and just uncovered problems elsewhere that should be fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23711#issuecomment-2676749537 From azafari at openjdk.org Sun Feb 23 16:41:49 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Sun, 23 Feb 2025 16:41:49 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v27] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier.
> - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. > - Adding a runtime flag for selecting the old or new version can be added later. > - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: removed the size par from set_..._tag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/18ec1db5..a89a013e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=25-26 Stats: 26 lines in 15 files changed: 0 ins; 1 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From azafari at openjdk.org Sun Feb 23 16:48:02 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Sun, 23 Feb 2025 16:48:02 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v21] In-Reply-To: <1AtAN_70cbiU2-KRyPK90QnwMedZxIsZ22KgBwioyOc=.e415931f-34cd-4157-9cc6-08b95d89efa2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <1AtAN_70cbiU2-KRyPK90QnwMedZxIsZ22KgBwioyOc=.e415931f-34cd-4157-9cc6-08b95d89efa2@github.com> Message-ID: <-NDyafiT1K5D4E2f2TD6vNiw7qhB27wb_LHmJB3PFHA=.cee7eb67-907c-42b8-a15b-f03b55d80be6@github.com> On Fri, 7 Feb 2025 10:32:20 GMT, Johan Sjölen wrote: >> Afshin Zafari has
updated the pull request incrementally with one additional commit since the last revision: >> >> fixed merge problems > > src/hotspot/share/nmt/vmatree.hpp line 215: > >> 213: tty->print_cr("Flag %s R: " INT64_FORMAT " C: " INT64_FORMAT, NMTUtil::tag_to_enum_name((MemTag)i), tag[i].reserve, tag[i].commit); >> 214: } >> 215: } > > This can be removed Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1966832096 From azafari at openjdk.org Sun Feb 23 16:59:06 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Sun, 23 Feb 2025 16:59:06 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v21] In-Reply-To: <1AtAN_70cbiU2-KRyPK90QnwMedZxIsZ22KgBwioyOc=.e415931f-34cd-4157-9cc6-08b95d89efa2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <1AtAN_70cbiU2-KRyPK90QnwMedZxIsZ22KgBwioyOc=.e415931f-34cd-4157-9cc6-08b95d89efa2@github.com> Message-ID: On Fri, 7 Feb 2025 10:32:06 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed merge problems > > src/hotspot/share/nmt/vmatree.hpp line 267: > >> 265: }); >> 266: tty->cr(); >> 267: } > > This can be removed, I'm rather sure(?) Only print_self is removed. The other two methods are used in the code.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1966834727 From azafari at openjdk.org Sun Feb 23 17:05:01 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Sun, 23 Feb 2025 17:05:01 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v21] In-Reply-To: <1AtAN_70cbiU2-KRyPK90QnwMedZxIsZ22KgBwioyOc=.e415931f-34cd-4157-9cc6-08b95d89efa2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <1AtAN_70cbiU2-KRyPK90QnwMedZxIsZ22KgBwioyOc=.e415931f-34cd-4157-9cc6-08b95d89efa2@github.com> Message-ID: On Fri, 7 Feb 2025 10:29:42 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed merge problems > > test/hotspot/gtest/nmt/test_nmt_treap.cpp line 238: > >> 236: EXPECT_LE(unexpected_count, REPEATS / 2) << "SSL Avg: " << sll_sum / REPEATS << " Treap Avg: " << treap_sum / REPEATS; >> 237: } >> 238: > > These can be removed. We shouldn't have performance benchmarks running on tier1, as they'll use unnecessary CPU and time. We're also removing the treap in favour of the RB-tree soon :-). Removed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1966835927 From azafari at openjdk.org Sun Feb 23 17:08:10 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Sun, 23 Feb 2025 17:08:10 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v20] In-Reply-To: <0I2mUN4F0QBOYVFxIX7UE1aYwJIEUNAOucOjCuxmS58=.091c86ae-399c-4786-b2e9-20593a4e4425@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <0I2mUN4F0QBOYVFxIX7UE1aYwJIEUNAOucOjCuxmS58=.091c86ae-399c-4786-b2e9-20593a4e4425@github.com> Message-ID: On Tue, 4 Feb 2025 13:44:15 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fix in shendoahCardTable > > test/hotspot/jtreg/runtime/NMT/VirtualAllocCommitMerge.java line 325: > >> 323: output.shouldMatch("\\[0x[0]*" + Long.toHexString(addr) + " - 0x[0]*" >> 324: + Long.toHexString(addr + size) >> 325: + "\\] committed " + sizeString); > > Not sure why this is changed. Removed the changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1966836547 From azafari at openjdk.org Sun Feb 23 17:17:44 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Sun, 23 Feb 2025 17:17:44 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v28] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: <5y2z_GCBI58DmRzIKuiKl7UGvXJP23dAPwNN-I6VEtA=.73aafeee-00ce-4f9d-803a-8259d49dda9d@github.com> > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier.
> - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. > - Adding a runtime flag for selecting the old or new version can be added later. > - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: removed remaining of the unrelated changes. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/a89a013e..39f7482a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=26-27 Stats: 160 lines in 5 files changed: 4 ins; 147 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From azafari at openjdk.org Sun Feb 23 17:17:45 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Sun, 23 Feb 2025 17:17:45 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v21] In-Reply-To: <1AtAN_70cbiU2-KRyPK90QnwMedZxIsZ22KgBwioyOc=.e415931f-34cd-4157-9cc6-08b95d89efa2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <1AtAN_70cbiU2-KRyPK90QnwMedZxIsZ22KgBwioyOc=.e415931f-34cd-4157-9cc6-08b95d89efa2@github.com> Message-ID: <0kzl8sjS_8Q8liswSu12f6cBG5p_gp8DrnYxwe3HXb4=.2c75546a-df64-43e6-bcfb-b8d5f6fcf3dc@github.com> On Fri, 7 Feb 2025 10:31:18 GMT, Johan Sjölen wrote: >> Afshin
Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed merge problems > > src/hotspot/share/opto/stringopts.hpp line 1: > >> 1: /* > > Remove the changes in this file. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1966838079 From asmehra at openjdk.org Sun Feb 23 17:26:56 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Sun, 23 Feb 2025 17:26:56 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v7] In-Reply-To: References: Message-ID: <2VgYzYhshPAAdh1bdBsJLvcN0kQ3X3NNeizoahDzsR0=.f778568d-bbfd-47ec-9d48-3969cb861cb5@github.com> On Thu, 20 Feb 2025 05:10:44 GMT, Ioi Lam wrote: >> Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pool entries, etc., that are easily represented in text. >> >> With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. >> >> To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects).
Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): >> >>> the format of the configuration and cache files is not specified and is subject to change without notice. >> >> **Notes for reviewers:** >> >> - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. >> - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. >> - The main starting point of this PR is `CDSConfig::check_aot_flags()`: it checks for the presence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. >> - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. >> - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc.) when JEP 483 options are specified. >> >> **Misc Note** >> - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccb... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge branch 'master' into 8348426-binary-aot-config-file > - @ashu-mehra comments > - @calvinccheung comments > - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself > - Improve error message when AOTMode=create has an incompatible classpath > - Fixed test cases @vnkozlov > - Update "make test JTREG_AOT_JDK=true ..." to work with binary AOT configuration > - Fixed test failures > - Added comments; fixed FIXMEs > - Added more test cases > - ...
and 2 more: https://git.openjdk.org/jdk/compare/00d4e4a9...21f140e7 src/hotspot/share/cds/cppVtables.cpp line 200: > 198: // _orig_cpp_vtptrs[ConstantPool_Kind] == ((intptr_t**)cp)[0] > 199: // > 200: // _archived_cpp_vtptrs is a map of all the vptprs used by classes in a preimage. E.g., for Thanks for adding these comments. I think I now understand why we need `_archived_cpp_vtptrs`. I am wondering if we really need to store this table in the preimage. When the control enters `CppVtables::dumptime_init`, if we are dumping the final archive, then the `_index[kind].cloned_vtable()` would be pointing to the vtable in the preimage. So we can initialize the `_archived_cpp_vtptrs` at that time before `_index[kind]` is overwritten by the runtime vtable data. Wouldn't that work? Something like this: ```@@ -231,13 +231,15 @@ char* CppVtables::_vtables_serialized_base = nullptr; void CppVtables::dumptime_init(ArchiveBuilder* builder) { assert(CDSConfig::is_dumping_static_archive(), "cpp tables are only dumped into static archive"); - CPP_VTABLE_TYPES_DO(ALLOCATE_AND_INITIALIZE_VTABLE); - - if (!CDSConfig::is_dumping_final_static_archive()) { + // When dumping final archive, _index[kind] at this point is in the preimage. + // Store the vtable pointers present in the preimage as _index[kind] will now be rewritten + // to point to the runtime vtable data. 
+ if (CDSConfig::is_dumping_final_static_archive()) { for (int kind = 0; kind < _num_cloned_vtable_kinds; kind++) { _archived_cpp_vtptrs[kind] = _index[kind]->cloned_vtable(); } } + CPP_VTABLE_TYPES_DO(ALLOCATE_AND_INITIALIZE_VTABLE); size_t cpp_tables_size = builder->rw_region()->top() - builder->rw_region()->base(); builder->alloc_stats()->record_cpp_vtables((int)cpp_tables_size); @@ -253,16 +255,6 @@ void CppVtables::serialize(SerializeClosure* soc) { if (soc->reading()) { CPP_VTABLE_TYPES_DO(INITIALIZE_VTABLE); } - - if (soc->writing() && !CDSConfig::is_dumping_preimage_static_archive()) { - // This table is written only when creating the preimage. It will be used - // only when writing the final static archive. - memset(_archived_cpp_vtptrs, 0, sizeof(_archived_cpp_vtptrs)); - } - - for (int kind = 0; kind < _num_cloned_vtable_kinds; kind++) { - soc->do_ptr(&_archived_cpp_vtptrs[kind]); - } } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1966840116 From azafari at openjdk.org Sun Feb 23 21:50:42 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Sun, 23 Feb 2025 21:50:42 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v29] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. > - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. 
> - Adding a runtime flag for selecting the old or new version can be added later. > - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: once more. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/39f7482a..0a61fec3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=27-28 Stats: 6 lines in 3 files changed: 4 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From dholmes at openjdk.org Mon Feb 24 00:07:02 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 24 Feb 2025 00:07:02 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v5] In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 06:19:55 GMT, Calvin Cheung wrote: >> This changeset refactors the CDS class path and module path validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. >> >> Passed tiers 1 - 5 testing. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > rename classes and add vm_exit_during_initialization call A couple of minor suggestions, but otherwise nothing further from me.
Thanks src/hotspot/share/cds/aotClassLocation.cpp line 53: > 51: const AOTClassLocationConfig* AOTClassLocationConfig::_runtime_instance = nullptr; > 52: > 53: // A ClassLocationStream represents a list of code sources, which can be iterated using Suggestion: // A ClassLocationStream represents a list of code locations, which can be iterated using src/hotspot/share/cds/aotClassLocation.cpp line 133: > 131: }; > 132: > 133: // AllClassLocationStreams is used to iterate over all the code sources that Suggestion: // AllClassLocationStreams is used to iterate over all the code locations that src/hotspot/share/cds/aotClassLocation.hpp line 122: > 120: // AOTClassLocations (subjected to AOTClassLocationConfig::validate()). > 121: // > 122: // In general, validation is performed on the AOTClassLocations to ensure the code sources used Suggestion: // In general, validation is performed on the AOTClassLocations to ensure the code locations used src/hotspot/share/classfile/classLoaderDataShared.cpp line 157: > 155: } > 156: > 157: void ClassLoaderDataShared::ensure_module_entry_table_exist(oop class_loader) { Suggestion: void ClassLoaderDataShared::ensure_module_entry_table_exists(oop class_loader) { Tables exist, but a single table exists. src/hotspot/share/classfile/classLoaderDataShared.hpp line 37: > 35: class ClassLoaderDataShared : AllStatic { > 36: static bool _full_module_graph_loaded; > 37: static void ensure_module_entry_table_exist(oop class_loader); Suggestion: static void ensure_module_entry_table_exists(oop class_loader); ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23476#pullrequestreview-2635829484 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1966936547 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1966936586 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1966936262 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1966939280 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1966939500 From stuefe at openjdk.org Mon Feb 24 07:05:31 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Feb 2025 07:05:31 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v7] In-Reply-To: References: Message-ID: > If we wrongly decode an nKlass of `0`, and the nKlass encoding base is not NULL (typical for most cases that run with CDS enabled), the resulting pointer points to the start of the Klass encoding range. That area is readable. If CDS is enabled, it will be at the start of the CDS metadata archive. If CDS is off, it is at the start of the class space. > > Now, both CDS and class space allocate a safety buffer at the start to prevent Klass structures from being located there. However, that memory is still readable, so we can read garbage data from that area. In the case of CDS, that area is just 16 bytes; real data follows right after it. Since Klass is large, most accesses will read beyond the 16-byte zone. > > We should protect the first page in the narrow Klass encoding range to make analysis of errors like this easier, especially in release builds, where decode_not_null does not assert. We already use a similar technique in the heap, and most OSes protect the zero page for the same reason. > > This patch does that. Now, decoding a `0` nKlass and then using the resulting `Klass` (calling virtual functions or accessing members) crashes right away.
> > Additionally, the patch provides a helpful output in the register/stack section, e.g: > > > RDI=0x0000000800000000 points into nKlass protection zone > > > > Testing: > - GHAs. > - I tested the patch manually on x64 Linux for both CDS on, CDS off and zero-based encoding, CDS off and non-zero-based encoding. > - I tested manually on Windows x64 > - I also prepared an automatic gtest, but that needs some preparatory work on the gtest suite first to work (see https://bugs.openjdk.org/browse/JDK-8348029) > > -- Update 2024-01-22 -- > I added a jtreg test that is more thorough than a gtest (also scans the produced hs-err file) Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: feedback ioi ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23190/files - new: https://git.openjdk.org/jdk/pull/23190/files/6d85e1fe..58fa6a1e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23190&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23190&range=05-06 Stats: 15 lines in 5 files changed: 11 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23190.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23190/head:pull/23190 PR: https://git.openjdk.org/jdk/pull/23190 From stuefe at openjdk.org Mon Feb 24 07:05:33 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Feb 2025 07:05:33 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v6] In-Reply-To: References: <9r3wnpF1xUgBKfShzU5BnExbehtuqbSwuvV5bgBXsuo=.1a58a462-3da7-45b9-95ef-7f6a7fce928a@github.com> Message-ID: On Sat, 22 Feb 2025 21:33:29 GMT, Ioi Lam wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 14 additional commits since the last revision: >> >> - copyrights >> - Redo everything, including Iois changes >> - reverse everything >> - Merge branch 'master' into JDK-8330174-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use >> - fix whitespace error >> - fix test >> - test-fixes >> - fix bug found with jtreg test where metaspace buddy allocator would accidentally replace the protection mapping >> - add jtreg test; replaces the gtest >> - Merge branch 'master' into JDK-8330174-Protection-zone-for-easier-detection-of-accidental-zero-nKlass-use >> - ... and 4 more: https://git.openjdk.org/jdk/compare/7ada9801...6d85e1fe > > src/hotspot/share/memory/metaspace.cpp line 812: > >> 810: // After narrowKlass encoding scheme is decided: if the encoding base points to class space start, >> 811: // establish a protection zone. >> 812: if (CompressedKlassPointers::base() == (address)rs.base()) { > > What happens if this condition is not true? Do we need to check for encoded values of zero? > > Or, is the following true? > > > } else { > assert((uintx)(CompressedKlassPointers::base()) == 0x0, "must be zero based"); > } > > > Is the `else` part covered by your new test case? It's the latter (the base could in principle be anywhere below the Klass range, as long as the encoding covers the class space, but I never used that, since I had been planning this zero-access page for a while). I added the else assert.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23190#discussion_r1967112894 From stuefe at openjdk.org Mon Feb 24 07:08:35 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Feb 2025 07:08:35 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v8] In-Reply-To: References: Message-ID: > If we wrongly decode an nKlass of `0`, and the nKlass encoding base is not NULL (typical for most cases that run with CDS enabled), the resulting pointer points to the start of the Klass encoding range. That area is readable. If CDS is enabled, it will be at the start of the CDS metadata archive. If CDS is off, it is at the start of the class space. > > Now, both CDS and class space allocate a safety buffer at the start to prevent Klass structures from being located there. However, that memory is still readable, so we can read garbage data from that area. In the case of CDS, that area is just 16 bytes, after that come real data. Since Klass is large, most accesses will read beyond the 16-byte zone. > > We should protect the first page in the narrow Klass encoding range to make analysis of errors like this easier. Especially in release builds where decode_not_null does not assert. We already use a similar technique in the heap, and most OSes protect the zero page for the same reason. > > This patch does that. Now, decoding an `0` nKlass and then using the result `Klass` - calling virtual functions or accessing members - crashes right away. > > Additionally, the patch provides a helpful output in the register/stack section, e.g: > > > RDI=0x0000000800000000 points into nKlass protection zone > > > > Testing: > - GHAs. > - I tested the patch manually on x64 Linux for both CDS on, CDS off and zero-based encoding, CDS off and non-zero-based encoding. 
> - I tested manually on Windows x64 > - I also prepared an automatic gtest, but that needs some preparatory work on the gtest suite first to work (see https://bugs.openjdk.org/browse/JDK-8348029) > > -- Update 2024-01-22 -- > I added a jtreg test that is more thorough than a gtest (also scans the produced hs-err file) Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: remove test coding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23190/files - new: https://git.openjdk.org/jdk/pull/23190/files/58fa6a1e..14641c77 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23190&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23190&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23190.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23190/head:pull/23190 PR: https://git.openjdk.org/jdk/pull/23190 From epeter at openjdk.org Mon Feb 24 07:25:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Feb 2025 07:25:59 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 16:14:09 GMT, Vladimir Kozlov wrote: >> @vnkozlov I suggest that I change the probability to something quite low now, just to make sure that the fast-loop is placed nicely. When I do the experiments for aliasing-analysis runtime-checks, then I will be able to benchmark much better for both cases, since it is much easier to create many different cases. At that point, I could still adapt the probabilities to a different constant. Or maybe I can somehow adjust the probabilities in the chain such that they are balanced. Like if there is 1 condition, give it `0.5`, if there are 2 give them each `sqrt(0.5)`, if there are `n` then `pow(0.5, 1/n)`, so that once you multiply them you get `pow(pow(0.5, 1/n),n) = 0.5`. 
We could also set a "target" probability other than `0.5`. The issue is that experimenting now is a little difficult, because I only have the alignment-checks to play with, which are really really rare to fail in the "real world", I think. But aliasing-checks are more likely to fail, so there could be more interesting benchmark results there. >> >> Does that sound ok? >> >>> Can we profile alignment in Interpreter (and C1)? >> >> It would be nice if we could profile alignment or aliasing. Maybe that is possible. But I suppose there are always cases where profiling is not available (Xcomp ?), and we should have reasonable defaults there. We could investigate profiling in a second step, to improve things if we think that is worth it. Profiling these things would also be additional complexity - I'm not convinced yet it is worth it. >> >> What do you think? > >> > Can we profile alignment in Interpreter (and C1)? >> >> It would be nice if we could profile alignment or aliasing. Maybe that is possible. But I suppose there are always cases where profiling is not available (Xcomp ?), and we should have reasonable defaults there. We could investigate profiling in a second step, to improve things if we think that is worth it. Profiling these things would also be additional complexity - I'm not convinced yet it is worth it. >> >> What do you think? > > You should not worry about `-Xcomp`; it is a testing flag - we can use some default there. > I am fine if you think profiling will not bring us much benefit. Note, I am not asking to create counters - just a bit to indicate if we had unaligned access to native memory in a method. In such a case we may skip the predicate and generate a multi-version loop during compilation. On the other hand, we may have unaligned access only during startup and not later when we compile the method. Anyway, it does not affect these changes. > > I will look at the changes more later. @vnkozlov I'll think about the "stall" vs "delay" suggestion.
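The balancing rule proposed above (give each of `n` chained checks `pow(0.5, 1/n)` so their product is `0.5`) can be checked numerically. This is a minimal C++ sketch that assumes the checks are independent. The helper names are hypothetical and do not come from the C2 sources.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: probability to assign to each of n chained runtime
// checks so that the probability of passing all of them equals `target`.
// Each check gets target^(1/n): 0.5, sqrt(0.5), 0.5^(1/3), ...
double per_check_probability(double target, int n) {
  assert(n > 0 && target > 0.0 && target <= 1.0);
  return std::pow(target, 1.0 / n);
}

// Probability that a chain of n independent checks all pass when each
// passes with probability p, i.e. p^n.
double chain_probability(double p, int n) {
  double result = 1.0;
  for (int i = 0; i < n; i++) {
    result *= p;
  }
  return result;
}
```

With a target of 0.5 and three checks, each check would be assigned roughly 0.794. Choosing a different target only changes the first argument.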
> How profitable (performance wise) to optimize slow path loop? Can we skip any optimizations for it - treat it as not-Counted? I suppose that depends on if the slow path loop will be taken. Imagine we are working on some unaligned MemorySegment (or with aliasing runtime-checks failing). In these cases without optimizing we would for example not unroll. But unrolling can give quite the speedup, of course at the cost of more compile time and code size. Also some RangeCheck eliminations only happen if you have a pre-main-post loop structure. There are probably other optimizations as well. So yes, if the slow path loop is taken often, then optimizing is probably worth it. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2677607527 From haosun at openjdk.org Mon Feb 24 07:44:55 2025 From: haosun at openjdk.org (Hao Sun) Date: Mon, 24 Feb 2025 07:44:55 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 10:23:37 GMT, Ferenc Rakoczi wrote: > Was this a build attempted on an aarch64 for the other architectures? Yes. It's a cross-build on AArch64 for other architectures. > Instruction_aarch64 should not have been there in a ppc build Oops. I didn't check the error message carefully. It might be some issue in our CI. I will check that. Sorry for the noise. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2677637524 From epeter at openjdk.org Mon Feb 24 08:03:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Feb 2025 08:03:59 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 16:14:09 GMT, Vladimir Kozlov wrote: >> @vnkozlov I suggest that I change the probability to something quite low now, just to make sure that the fast-loop is placed nicely. 
When I do the experiments for aliasing-analysis runtime-checks, then I will be able to benchmark much better for both cases, since it is much easier to create many different cases. At that point, I could still adapt the probabilities to a different constant. Or maybe I can somehow adjust the probabilities in the chain such that they are balanced. Like if there is 1 condition, give it `0.5`, if there are 2 give them each `sqrt(0.5)`, if there are `n` then `pow(0.5, 1/n)`, so that once you multiply them you get `pow(pow(0.5, 1/n),n) = 0.5`. We could also set a "target" probability other than `0.5`. The issue is that experimenting now is a little difficult, because I only have the alignment-checks to play with, which are really really rare to fail in the "real world", I think. But aliasing-checks are more likely to fail, so there could be more interesting benchmark results there. >> >> Does that sound ok? >> >>> Can we profile alignment in Interpreter (and C1)? >> >> It would be nice if we could profile alignment or aliasing. Maybe that is possible. But I suppose there are always cases where profiling is not available (Xcomp ?), and we should have reasonable defaults there. We could investigate profiling in a second step, to improve things if we think that is worth it. Profiling these things would also be additional complexity - I'm not convinced yet it is worth it. >> >> What do you think? > >> > Can we profile alignment in Interpreter (and C1)? >> >> It would be nice if we could profile alignment or aliasing. Maybe that is possible. But I suppose there are always cases where profiling is not available (Xcomp ?), and we should have reasonable defaults there. We could investigate profiling in a second step, to improve things if we think that is worth it. Profiling these things would also be additional complexity - I'm not convinced yet it is worth it. >> >> What do you think? > > You should not worry about `-Xcomp`; it is a testing flag - we can use some default there.
> I am fine if you think profiling will not bring us much benefit. Note, I am not asking to create counters - just a bit to indicate if we had unaligned access to native memory in a method. In such a case we may skip the predicate and generate a multi-version loop during compilation. On the other hand, we may have unaligned access only during startup and not later when we compile the method. Anyway, it does not affect these changes. > > I will look at the changes more later. @vnkozlov I mean the issue is this: once I implement aliasing-analysis runtime-checks with this multiversion approach, then we'd get regressions if we do not optimize the slow path loop. Currently, we would not vectorize (because we have to be ready for aliasing cases), but we at least unroll, and whatever else we can except vectorization. But if we do not optimize the slow path loop, then we would get performance regressions in aliasing cases because we have no unrolling for them any more. I think we need to avoid that - would you agree? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2677667789 From jsjolen at openjdk.org Mon Feb 24 08:30:11 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 24 Feb 2025 08:30:11 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v29] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Sun, 23 Feb 2025 21:50:42 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. >> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. >> - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` >> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs.
>> - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. >> - Adding a runtime flag for selecting the old or new version can be added later. >> - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > once more. Hi, you should take out these bug fixes and add them as separate PRs instead. They need separate reviewing so that we can discuss them outside of the context of this port. src/hotspot/share/nmt/memoryFileTracker.cpp line 183: > 181: // Only account the committed memory. > 182: snap->commit_memory(current->committed()); > 183: });} Style: Restore to what it was before. src/hotspot/share/nmt/virtualMemoryTracker.hpp line 404: > 402: friend class VirtualMemoryTrackerTest; > 403: friend class CommittedVirtualMemoryTest; > 404: These two classes don't exist anymore. src/hotspot/share/nmt/vmatree.cpp line 80: > 78: MemTag tag = leqA_n->val().out.mem_tag(); > 79: stA.out.set_tag(tag); > 80: LEQ_A.state.out.set_tag(tag); This also seems like a bug fix that must be separated out into a separate PR along with test cases. src/hotspot/share/nmt/vmatree.cpp line 211: > 209: // Finally, we can register the new region [A, B)'s summary data. > 210: MemTag tag_to_change = use_tag_inplace ? stA.out.mem_tag() : metadata.mem_tag; > 211: SingleDiff& rescom = diff.tag[NMTUtil::tag_to_index(tag_to_change)]; This seems to be a bug fix to 8335091. You should open a separate mainline PR with this fix and add a testcase for it. Your fix should be integrated before this PR is. test/hotspot/gtest/runtime/test_virtualMemoryTracker.cpp line 1: > 1: /* Why are these tests removed?
Can they be adapted to the new implementation? ------------- PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2636234435 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1967189701 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1967186209 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1967184884 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1967183983 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1967173959 From adinn at openjdk.org Mon Feb 24 08:39:54 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 24 Feb 2025 08:39:54 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 07:41:58 GMT, Hao Sun wrote: >> @shqking, I changed the copyright years, but I don't really understand how the aarch64-specific code can overflow buffers on other architectures. As far as I understand, Instruction_aarch64 should not have been there in a ppc build. >> Was this a build attempted on an aarch64 for the other architectures? > >> Was this a build attempted on an aarch64 for the other architectures? > > Yes. It's a cross-build on AArch64 for other architectures. > >> Instruction_aarch64 should not have been there in a ppc build > > Oops. I didn't check the error message carefully. It might be some issue in our CI. I will check that. > > Sorry for the noise. @shqking There is a [known issue](https://bugs.openjdk.org/browse/JDK-8349921) with cross-builds that is still being investigated. I think that may explain the problem you are seeing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2677735964 From rcastanedalo at openjdk.org Mon Feb 24 08:59:54 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 24 Feb 2025 08:59:54 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: <0wHGNSlwe7cWb7Plad2n8Swy8rayYTAf5IETuw9zl4U=.a4d6a129-aebc-4639-aaef-92ee6c4552c7@github.com> References: <0wHGNSlwe7cWb7Plad2n8Swy8rayYTAf5IETuw9zl4U=.a4d6a129-aebc-4639-aaef-92ee6c4552c7@github.com> Message-ID: On Thu, 20 Feb 2025 13:59:57 GMT, Roberto Castañeda Lozano wrote: > > @robcasloz I identified and hopefully fixed a small issue that hit the "disabled" path. Turns out we allocate arena chunks a lot more frequently than I thought, and the new unconditional call to Thread::current() in there was hurting a bit. I now avoid this unless I know the statistic is enabled. > > With this patch, on my machine the difference between unpatched and patched JVM with stats disabled is below one standard deviation for the benchmark in question. > > Great, thanks! Will re-run benchmarking and report results early next week. Functional test results (Oracle tier1-5) still look good for the latest commit (dd7a06ad). I can confirm that the C2 speed regression on our linux-x64 machines is almost fully mitigated. The 2-3% regression on our macosx-aarch64 machines does not seem to be addressed by the latest changes though, but as I mentioned before I think it is in the acceptable range (and only affects one benchmark).
------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2677775924 From bkilambi at openjdk.org Mon Feb 24 09:37:54 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 24 Feb 2025 09:37:54 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: <1yB95sOajuS5ptFI0GQWLepii5JsZ9DOsje-TEFyFYs=.a325ad18-17ed-4e77-b1e3-0bad2cf55c67@github.com> Message-ID: On Thu, 20 Feb 2025 17:22:25 GMT, Ferenc Rakoczi wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2618: >> >>> 2616: INSN(smaxp, 0, 0b101001, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S >>> 2617: INSN(sminp, 0, 0b101011, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S >>> 2618: INSN(sqdmulh,0, 0b101101, false); // accepted arrangements: T4H, T8H, T2S, T4S >> >> Hi, not a comment on the algorithm itself, but you might have to add these new instructions to the gtest for aarch64 here - test/hotspot/gtest/aarch64/aarch64-asmtest.py - and use this file to generate test/hotspot/gtest/aarch64/asmtest.out.h, which would contain these newly added instructions. > > I have tried that, but the python script (actually the `as` command that it started) threw error messages:
>
> aarch64ops.s:338:24: error: index must be a multiple of 8 in range [0, 32760].
> prfm PLDL1KEEP, [x15, 43]
> ^
> aarch64ops.s:357:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
> sub x1, x10, x23, sxth #2
> ^
> aarch64ops.s:359:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
> add x11, x21, x5, uxtb #3
> ^
> aarch64ops.s:360:22: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
> adds x11, x17, x17, uxtw #1
> ^
> aarch64ops.s:361:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
> sub x11, x0, x15, uxtb #1
> ^
> aarch64ops.s:362:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
> subs x7, x1, x0, sxth #2
> ^
>
> This is without any modifications from what is in the master branch currently.

You might have to use an assembler from the latest binutils build (if the system default isn't the latest) and add the path to that assembler in the "AS" variable. You can also run it with something like `python aarch64-asmtest.py | expand > asmtest.out.h`. Please let me know if you still face problems. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967284270 From iwalulya at openjdk.org Mon Feb 24 09:53:55 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 24 Feb 2025 09:53:55 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v4] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 18 Feb 2025 09:20:57 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request.
However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker Marked as reviewed by iwalulya (Reviewer). 
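The GCLocker entry protocol described above uses a per-thread flag published behind a store-load barrier rather than a shared counter. That is essentially a Dekker-style handshake. Below is a minimal C++ sketch of the idea under several assumptions: `ThreadState`, `CriticalGate`, `enter`, `exit` and `run_gc` are hypothetical names, a mutex stands in for the safepoint that serializes GC requests, and this is not the GCLocker code from the PR.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical per-thread state; in the JVM this would live in the thread object.
struct ThreadState {
  std::atomic<bool> in_critical{false};
};

// Hypothetical gate: threads enter/exit critical regions via a thread-local
// flag, and a GC request waits until no thread is inside a critical region.
class CriticalGate {
 public:
  explicit CriticalGate(std::vector<ThreadState*> threads)
      : threads_(std::move(threads)) {}

  void enter(ThreadState* t) {
    while (true) {
      t->in_critical.store(true, std::memory_order_relaxed);
      // Store-load barrier: order the flag store before the gc_pending_ load.
      // Pairs with the fence in run_gc().
      std::atomic_thread_fence(std::memory_order_seq_cst);
      if (!gc_pending_.load(std::memory_order_acquire)) {
        return;  // fast path: no GC requested, no writes to shared state
      }
      // A GC is pending: back out, wait for it to finish, then retry.
      t->in_critical.store(false, std::memory_order_release);
      while (gc_pending_.load(std::memory_order_acquire)) {
        std::this_thread::yield();
      }
    }
  }

  void exit(ThreadState* t) {
    t->in_critical.store(false, std::memory_order_release);
  }

  // Runs `gc` only once every registered thread is outside its critical region.
  template <typename F>
  void run_gc(F gc) {
    std::lock_guard<std::mutex> guard(gc_mutex_);  // one GC request at a time
    gc_pending_.store(true, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // pairs with enter()
    for (ThreadState* t : threads_) {
      while (t->in_critical.load(std::memory_order_acquire)) {
        std::this_thread::yield();
      }
    }
    gc();
    gc_pending_.store(false, std::memory_order_release);
  }

 private:
  std::atomic<bool> gc_pending_{false};
  std::mutex gc_mutex_;
  std::vector<ThreadState*> threads_;
};
```

The two seq_cst fences are totally ordered, so at least one side sees the other's store. Either the GC sees the flag and waits, or the entering thread sees the pending request and backs out. The common path costs one local store plus one fence and performs no atomic read-modify-write on a shared counter.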
------------- PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2636497189 From rcastanedalo at openjdk.org Mon Feb 24 10:13:54 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 24 Feb 2025 10:13:54 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v4] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 13:14:34 GMT, Thomas Stuefe wrote: >> Greetings, >> >> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. >> >> Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than it allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. >> >> I wanted to track that information correctly and display it clearly in a way that is easy to understand. >> >> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). >> >> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (e.g., the inside of a loop): drop a TracePhase in there with a descriptive name, and you can see the allocations inside that phase.
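The idea of attributing allocations to nested phases through a stack object can be illustrated with a small RAII sketch. `MemoryTracker`, `PhaseScope`, and `PhaseRecord` are hypothetical stand-ins, not the `TracePhase` or compilation memory statistic classes from the patch, and a single byte counter replaces real arenas. Each scope records usage at entry, usage at exit, and the peak reached while it was active. Nested scopes therefore also capture the peaks of their children.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one phase: usage at entry, at exit, and peak.
struct PhaseRecord {
  std::string name;
  size_t start;
  size_t end;
  size_t peak;
};

// Hypothetical stand-in for compiler arena usage (one byte counter).
class MemoryTracker {
 public:
  void allocate(size_t bytes) {
    current_ += bytes;
    // Every active (possibly nested) phase sees the new high-water mark.
    for (size_t* peak : active_peaks_) {
      *peak = std::max(*peak, current_);
    }
  }
  void release(size_t bytes) { current_ -= bytes; }
  size_t current() const { return current_; }
  const std::vector<PhaseRecord>& records() const { return records_; }

 private:
  friend class PhaseScope;
  size_t current_ = 0;
  std::vector<size_t*> active_peaks_;
  std::vector<PhaseRecord> records_;
};

// RAII phase marker, in the spirit of a stack-allocated phase timer.
class PhaseScope {
 public:
  PhaseScope(MemoryTracker& tracker, std::string name)
      : tracker_(tracker), name_(std::move(name)),
        start_(tracker.current()), peak_(tracker.current()) {
    tracker_.active_peaks_.push_back(&peak_);
  }
  ~PhaseScope() {
    tracker_.active_peaks_.pop_back();
    tracker_.records_.push_back({name_, start_, tracker_.current(), peak_});
  }
  PhaseScope(const PhaseScope&) = delete;
  PhaseScope& operator=(const PhaseScope&) = delete;

 private:
  MemoryTracker& tracker_;
  std::string name_;
  size_t start_;
  size_t peak_;
};
```

Read a record as follows: `end - start` is the phase's net contribution, and a `peak` well above both `start` and `end` marks a temporary peak that vanished before the phase ended.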
>> >> The statistic gives us two new forms of output: >> >> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase:
>>
>> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
>> Phase                 Total       ra     node     comp     type    index  reglive regsplit    cienv    other
>> none                1205512   155104   982984    33712        0        0        0        0        0    33712
>> parse              11685376   720016  6578728  1899064        0        0        0        0  1832888   654680
>> optimizer            916584        0   556416        0        0        0        0        0        0   360168
>> escapeAnalysis      1983400        0  1276392   707008        0        0        0        0        0        0
>> connectionGraph      720016        0        0   621832        0        0        0        0    98184        0
>> macroEliminate       196448        0   196448        0        0        0        0        0        0        0
>> iterGVN              327440        0   196368   131072        0        0        0        0        0        0
>> incrementalInline   3992816        0  3043704    62...
>
> Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > avoid Thread::current in high traffic chunk alloc path src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 255: > 253: char tmp[1024]; > 254: _k->as_C_string(tmp, sizeof(tmp)); > 255: if (UseNewCode){ printf("%s\n",tmp); fflush(stdout);} I guess this use of `UseNewCode` is not meant to be integrated? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1967340962 From rcastanedalo at openjdk.org Mon Feb 24 10:33:02 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 24 Feb 2025 10:33:02 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v4] In-Reply-To: References: Message-ID: <_oNuzx4YepRchoguAnBbXw-31T14WgK8oQpC47FAJOc=.6edd8fcc-6757-448b-992d-b13b94af7c59@github.com> On Thu, 20 Feb 2025 13:14:34 GMT, Thomas Stuefe wrote: >> Greetings, >> >> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases.
This will allow for a much faster, more thorough analysis of footprint issues. >> >> Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than it allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. >> >> I wanted to track that information correctly and display it clearly in a way that is easy to understand. >> >> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). >> >> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (e.g., the inside of a loop): drop a TracePhase in there with a descriptive name, and you can see the allocations inside that phase. >> >> The statistic gives us two new forms of output: >> >> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase:
>>
>> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
>> Phase                 Total       ra     node     comp     type    index  reglive regsplit    cienv    other
>> none                1205512   155104   982984    33712        0        0        0        0        0    33712
>> parse              11685376   720016  6578728  1899064        0        0        0        0  1832888   654680
>> optimizer            916584        0   556416        0        0        0        0        0        0   360168
>> escapeAnalysis      1983400        0  1276392   707008        0        0        0        0        0        0
>> connectionGraph      720016        0        0   621832        0        0        0        0    98184        0
>> macroEliminate       196448        0   196448        0        0        0        0        0        0        0
>> iterGVN              327440        0   196368   131072        0        0        0        0        0        0
>> incrementalInline   3992816        0  3043704    62...
>
> Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: > > avoid Thread::current in high traffic chunk alloc path A few naming/documentation comments so far. src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 1093: > 1091: Compile::TracePhase tp(Phase::_t_testTimer1); > 1092: Arena ar(MemTag::mtCompiler, Arena::Tag::tag_reglive); > 1093: ar.Amalloc(2 * M); // phase-local peak, should show up at MY-TESTPHASE-2 The reference to `MY-TESTPHASE-2` seems obsolete. test/hotspot/jtreg/compiler/print/CompileCommandMemLimit.java line 105: > 103: // by phase end. So, in the phase timeline these 2MB must show up as "significant temporary peak". > 104: // In testPhase2, we allocate 32MB from resource area, which is leaked until the end of the compilation. This > 105: // means that these 32MB will show up as permanent memory increase in the phasetimeline. The references to `testPhase` seem obsolete, do you mean `Phase::_t_testTimer1` and `Phase::_t_testTimer2`? test/hotspot/jtreg/compiler/print/CompileCommandMemLimit.java line 105: > 103: // by phase end. So, in the phase timeline these 2MB must show up as "significant temporary peak". > 104: // In testPhase2, we allocate 32MB from resource area, which is leaked until the end of the compilation. This > 105: // means that these 32MB will show up as permanent memory increase in the phasetimeline. Suggestion: // means that these 32MB will show up as permanent memory increase in the phase timeline. test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java line 27: > 25: /* > 26: * @test id=c2 > 27: * @summary Checks that -XX:CompileCommand=PrintMemStat,... works Suggestion: * @summary Checks that -XX:CompileCommand=MemStat,... works test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java line 35: > 33: /* > 34: * @test id=c1 > 35: * @summary Checks that -XX:CompileCommand=PrintMemStat,... works Suggestion: * @summary Checks that -XX:CompileCommand=MemStat,... 
works test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java line 86: > 84: oa.reportDiagnosticSummary(); > 85: > 86: // We expect two printouts for "PrintMemStat". A line at compilation time, and a line in a summary report Suggestion: // We expect two printouts for "MemStat". A line at compilation time, and a line in a summary report ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23530#pullrequestreview-2636581034 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1967354177 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1967350305 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1967355138 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1967358896 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1967357706 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1967359828 From dholmes at openjdk.org Mon Feb 24 11:28:01 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 24 Feb 2025 11:28:01 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v4] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 18 Feb 2025 09:20:57 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. 
The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker Nothing further from me. 
Thanks ------------- PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2636773637 From adinn at openjdk.org Mon Feb 24 11:50:55 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 24 Feb 2025 11:50:55 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v6] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 17:33:18 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with four additional commits since the last revision: > > - Accepting suggested change from Andrew Dinn > - Added comments suggested by Andrew Dinn > - Fixed copyright years > - renaming a couple of functions src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4593: > 4591: // chunks of) vector registers v30 and v31, resp. > 4592: // The inputs are in v0-v7 and v16-v23 and the results go to v16-v23, > 4593: // four 32-bit values in each register Suggestion: Once again it would be good to annotate the lines in this code with comments that relate the generated code back to the original Java code. In the header comment you should refer to the relevant Java class and the var names there: // computes (in parallel across 8 x 4S vectors) // a = b * c * 2^-32 mod MONT_Q // where // inputs b and c are in v0, ..., v7 and v16, ... v23, // scratch registers v24, ... v27 are clobbered // output a is written back into v16, ... 
v23 // constants q and q_inv are in v30, v31 // // See the equivalent Java code in method ML_DSA.montMul Then comment the generation lines as shown below ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967490923 From adinn at openjdk.org Mon Feb 24 11:53:55 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 24 Feb 2025 11:53:55 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v6] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 17:33:18 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with four additional commits since the last revision: > > - Accepting suggested change from Andrew Dinn > - Added comments suggested by Andrew Dinn > - Fixed copyright years > - renaming a couple of functions src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4604: > 4602: FloatRegister vr7 = by_constant ? 
v29 : v7; > 4603: > 4604: __ sqdmulh(v24, __ T4S, vr0, v16); + __ sqdmulh(v24, __ T4S, v0, v16); // aHigh = hi32(2 * b * c) + __ mulv(v16, __ T4S, v0, v16); // aLow = lo32(b * c) src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4613: > 4611: __ mulv(v19, __ T4S, vr3, v19); > 4612: > 4613: __ mulv(v16, __ T4S, v16, v30); __ mulv(v16, __ T4S, v16, v30); // m = aLow * qinv src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4618: > 4616: __ mulv(v19, __ T4S, v19, v30); > 4617: > 4618: __ sqdmulh(v16, __ T4S, v16, v31); __ sqdmulh(v16, __ T4S, v16, v31); // n = hi32(2 * m * q) src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4623: > 4621: __ sqdmulh(v19, __ T4S, v19, v31); > 4622: > 4623: __ shsubv(v16, __ T4S, v24, v16); __ shsubv(v16, __ T4S, v24, v16); // a = (aHigh - n) / 2 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967491928 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967492635 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967493031 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967493643 From eosterlund at openjdk.org Mon Feb 24 12:07:06 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 24 Feb 2025 12:07:06 GMT Subject: Integrated: 8347335: ZGC: Use limitless mark stack memory In-Reply-To: <9h8RYyi02b9Hz6EGoef3tCHmAHYpB8bdgyUiXkZeC0s=.25738b1a-f580-4c1a-a9cc-f76dbc03bc1e@github.com> References: <9h8RYyi02b9Hz6EGoef3tCHmAHYpB8bdgyUiXkZeC0s=.25738b1a-f580-4c1a-a9cc-f76dbc03bc1e@github.com> Message-ID: On Tue, 11 Feb 2025 20:34:31 GMT, Erik Österlund wrote: > When ZGC performs marking, a lock-free data structure is used to keep track of objects that still need to be traced in the object traversal. This lock-free data structure uses versioned pointers as a technique to avoid ABA problems, prevalent when writing lock-free data structures.
This required partitioning pointers in the structure to embed both a version and a location. > > Due to the reduced addressability of locations with only a portion of the pointer bits, a special memory space was created to manage the data structure such that offsets could be encoded, instead of addresses. > > Since the memory area needs to be contiguous, the JVM needs to know what the expected maximum size of this space will ever be, within some limiting bounds. That is what `-XX:ZMarkStackSpaceLimit` controls. > > While this strategy has worked well in practice, the design does limit the scalability of ZGC, due to limits in how much contiguous memory can be encoded with a subset of the pointer bits. Not to mention that users have no idea what number to put into this JVM option. > > The `-XX:ZMarkStackSpaceLimit` JVM option is needed due to using a contiguous allocator to solve an ABA problem in a lock-free data structure. By selecting another solution for the ABA problem, the need for the special contiguous memory allocator and hence the JVM option can be removed. > > This PR proposes a new solution for that original ABA problem in the lock-free data structure, which renders the entire machinery behind the `-XX:ZMarkStackSpaceLimit` JVM option redundant. The proposed technique is to use hazard pointers instead. > > The use of hazard pointers is a well-established safe memory reclamation (SMR) technique for writing lock-free data structures, which we also use in the Threads list. The main idea is to publish what pointer has been read with a hazard pointer, so that concurrent threads know not to free memory that is being concurrently used. Freeing of such racingly accessed memory is deferred until it is safe, hence solving the ABA problem. This also allows using plain malloc/free instead of a custom contiguous memory allocator for these structures.
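To make the hazard-pointer idea concrete, here is a minimal C++ sketch of a Treiber-style lock-free stack whose pop path is protected by one hazard pointer per worker. It is not ZGC's mark stack code: `HazardStack`, `Node`, `kMaxWorkers` and the per-worker retire lists are hypothetical, and all atomics use the default sequentially consistent ordering for simplicity. A popped node is only freed once no worker's hazard slot refers to it, so its address cannot be recycled while another worker may still compare against it; that is what removes the ABA hazard without version bits.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type; a fresh node is allocated for every push.
struct Node {
  int value;
  Node* next;
};

constexpr std::size_t kMaxWorkers = 4;  // assumed fixed number of popping workers

// Treiber stack; all atomics use the default seq_cst ordering for simplicity.
class HazardStack {
 public:
  HazardStack() {
    for (auto& h : hazards_) h.store(nullptr);
  }

  ~HazardStack() {  // single-threaded teardown
    for (Node* n = head_.load(); n != nullptr;) {
      Node* next = n->next;
      delete n;
      n = next;
    }
    for (auto& list : retired_) {
      for (Node* n : list) delete n;
    }
  }

  void push(int value) {
    Node* n = new Node{value, head_.load()};
    // On failure, compare_exchange_weak reloads the current head into n->next.
    while (!head_.compare_exchange_weak(n->next, n)) {
    }
  }

  // Only workers pop, so one hazard slot per worker is enough.
  bool pop(std::size_t worker, int* out) {
    std::atomic<Node*>& hazard = hazards_[worker];
    for (;;) {
      Node* top = head_.load();
      if (top == nullptr) {
        hazard.store(nullptr);
        return false;
      }
      hazard.store(top);                  // publish: "I may dereference top"
      if (head_.load() != top) continue;  // top may already be retired; retry
      Node* next = top->next;             // safe: top is not freed while hazarded
      if (head_.compare_exchange_strong(top, next)) {
        hazard.store(nullptr);
        *out = top->value;
        retire(worker, top);
        return true;
      }
    }
  }

 private:
  // Defer freeing until no worker's hazard slot refers to the node.
  void retire(std::size_t worker, Node* n) {
    std::vector<Node*>& list = retired_[worker];
    list.push_back(n);
    std::vector<Node*> keep;
    for (Node* r : list) {
      bool hazarded = false;
      for (auto& h : hazards_) {
        if (h.load() == r) {
          hazarded = true;
          break;
        }
      }
      if (hazarded) {
        keep.push_back(r);
      } else {
        delete r;
      }
    }
    list.swap(keep);
  }

  std::atomic<Node*> head_{nullptr};
  std::atomic<Node*> hazards_[kMaxWorkers];
  std::vector<Node*> retired_[kMaxWorkers];
};
```

Keeping retire lists per worker avoids any shared bookkeeping on the free path; only the hazard slots are read across threads.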
> > Only popping nodes from the mark stacks requires hazard pointers, and only GC workers pop entries from the mark stacks. Therefore, hazard pointers may be stored in a per-worker variable. > > I have measured throughput, latency, marking times and memory usage across a number of programs and platforms, and not seen any interesting changes in the behavior, ot... This pull request has now been integrated. Changeset: 65f79c14 Author: Erik Österlund URL: https://git.openjdk.org/jdk/commit/65f79c145b7b1b32ed064a37ad4d2b6aca935a4c Stats: 1192 lines in 26 files changed: 422 ins; 658 del; 112 mod 8347335: ZGC: Use limitless mark stack memory Reviewed-by: aboldtch, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/23571 From bkilambi at openjdk.org Mon Feb 24 12:14:03 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 24 Feb 2025 12:14:03 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 operations Message-ID: This patch adds an aarch64 backend for scalar FP16 operations, namely add, subtract, multiply, divide, fma, sqrt, min and max.
------------- Commit messages: - 8345125: Aarch64: Add aarch64 backend for Float16 operations Changes: https://git.openjdk.org/jdk/pull/23748/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23748&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345125 Stats: 1007 lines in 13 files changed: 326 ins; 1 del; 680 mod Patch: https://git.openjdk.org/jdk/pull/23748.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23748/head:pull/23748 PR: https://git.openjdk.org/jdk/pull/23748 From azafari at openjdk.org Mon Feb 24 12:45:51 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Mon, 24 Feb 2025 12:45:51 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v30] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. > - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` > - `find_reserved_region()` is used in 4 cases; it will be removed in further PRs. > - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. > - A runtime flag for selecting the old or new version can be added later. > - Some performance tests are added for the new version, VMATree and Treap, to show the idea; they should be improved later. Based on the results of comparing the speed of VMATree and VMT, VMATree shows ~40x faster response time.
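As a rough illustration of why an address-keyed tree helps compared with a sorted linked list of regions, here is a minimal C++ sketch of an interval map in which each key marks where a run of one state begins. It assumes `std::map` (typically a balanced tree) as a stand-in for the treap behind `VMATree`. `RegionMap` and `State` are hypothetical names, and the real tree tracks more per boundary than this. Lookups and updates are logarithmic rather than a linear list walk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical states; not necessarily VMATree's exact types.
enum class State { Released, Reserved, Committed };

// Each key is the start of a run; the run's state lasts until the next key.
class RegionMap {
 public:
  RegionMap() { runs_[0] = State::Released; }  // whole address space starts released

  // O(log n): the last boundary at or below addr decides the state.
  State state_at(std::uintptr_t addr) const {
    return std::prev(runs_.upper_bound(addr))->second;
  }

  // Set [from, to) to s, resuming the previous state at `to`.
  void set(std::uintptr_t from, std::uintptr_t to, State s) {
    if (from >= to) return;
    const State resume = state_at(to);
    // Boundaries inside [from, to] are overwritten by this update.
    runs_.erase(runs_.lower_bound(from), runs_.upper_bound(to));
    runs_[from] = s;
    runs_[to] = resume;
  }

 private:
  std::map<std::uintptr_t, State> runs_;
};
```

For brevity, the sketch does not merge adjacent runs that end up with the same state.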
Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: test file got back, fixed coding style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/0a61fec3..5f4bc6dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=28-29 Stats: 594 lines in 2 files changed: 593 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From aph at openjdk.org Mon Feb 24 12:54:53 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 24 Feb 2025 12:54:53 GMT Subject: RFR: 8350483: AArch64: turn on signum intrinsics by default on Ampere CPUs In-Reply-To: References: Message-ID: On Sat, 22 Feb 2025 15:27:41 GMT, Patrick Zhang wrote: > Set -XX:+UseSignumIntrinsic by default for Ampere CPUs. This fixes a performance problem found in the JMH cases `vm.compiler.Signum|java.lang.*MathBench.sig[nN]um*`, where fmov is used to transfer data between GPRs and FPRs at significant time cost. > > Verified on Ampere-1A: the scores (thrpt, ops/s) of `java.lang.*MathBench.sig[nN]um*` improved by 40~50%, while `vm.compiler.Signum._1_signumFloatTest` and `vm.compiler.Signum._3_signumDoubleTest` results gained exponential increases. The change also passed GHA sanity checks and jtreg tier1 on Ampere-1A as functional smoke tests. Looks good. I'm looking at Apple M1 which shows no regression with signum intrinsics, although the reliability of JMH benchmarks once we are in the 1 ns range is not great. So, this is a benefit on Arm and Ampere, and no worse on Apple. Maybe we should think about removing the `UseSignumIntrinsic` flag altogether. ------------- Marked as reviewed by aph (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23735#pullrequestreview-2636987299 From roland at openjdk.org Mon Feb 24 12:54:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 24 Feb 2025 12:54:56 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v3] In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> Message-ID: On Thu, 20 Feb 2025 09:44:16 GMT, Roland Westrelin wrote: >>> Do you see any better way than having the 2x code size if we need both a slow and fast loop? >> >> No but I was confused by your comment about 3x and 4x which is why I asked for clarification. >> Compiled code size affects inlining decisions: if a callee has compiled code and it's larger than some threshold, then the callee is considered too expensive to inline. With your change, some method that was considered ok to inline could now be considered too big. I think that's what Vladimir is concerned by. I don't see what you can do about it, this said. > >> @rwestrel I think I had tried some verifications above, but I could not even get it to work in all cases in `SuperWord`. >> >> In `VLoop::check_preconditions_helper`, I try to find either the predicate or the multiversioning if. But I cannot always find it, and I think that one reason was that the pre-loop can be lost. At least that is what I remember from 4+ weeks ago. > > Do you understand when that happens? It doesn't feel right that the pre loop can be lost. > @rwestrel Do you want me to find examples for the pre-loop disappearing? I suppose I can find some easily by adding an assert in SuperWord, where we bail out, as I showed above. Yes, if not too much work. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2678332801 From aph at openjdk.org Mon Feb 24 12:54:54 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 24 Feb 2025 12:54:54 GMT Subject: Re: RFR: 8350483: AArch64: turn on signum intrinsics by default on Ampere CPUs In-Reply-To: References: Message-ID: <7XQsAZxrIwrsL3gPazBVzWnfQmfH3R6Xwnadg-9Jd34=.34b8e435-1d9f-4486-948e-70079238e3fd@github.com> On Mon, 24 Feb 2025 12:50:21 GMT, Andrew Haley wrote: > Maybe we should think about removing the `UseSignumIntrinsic` flag altogether. Ah, the flag is also used by other ports. But it doesn't make much sense for us not to use the intrinsic. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23735#issuecomment-2678331572 From coleenp at openjdk.org Mon Feb 24 14:31:20 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 24 Feb 2025 14:31:20 GMT Subject: RFR: 8328473: StringTable and SymbolTable statistics delay time to safepoint Message-ID: This change adds a safepoint poll to gathering statistics for the Symbol and String tables, using the ConcurrentHashTableTasks to chunk up the walk. The stringTable and symbolTable code is similar, like the GrowTask and DeleteTask code. Maybe this can be cleaned up, but I don't have a good idea about that yet that doesn't involve yet another level of templated functions and code. This is already pretty highly templatized. Tested with tier1-4 and the internal runThese test with JFR and failure injection to verify that we do try to safepoint while gathering statistics.
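The chunked walk described above can be illustrated with a minimal, hypothetical C++ sketch. Statistics are accumulated a fixed number of buckets at a time, and a callback stands in for the safepoint poll between chunks. `gather_stats_chunked`, `TableStats` and the callback are illustrative names, not the `ConcurrentHashTable` task API. The sketch also assumes the bucket array stays stable while the walker yields; a real concurrent table must guarantee that or tolerate approximate numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical summary of a hash table walk.
struct TableStats {
  std::size_t buckets = 0;
  std::size_t entries = 0;
  std::size_t max_chain = 0;
};

// chain_lengths[i] stands in for the length of bucket i's chain. Between
// chunks, `poll` stands in for a safepoint check, so a pending safepoint
// waits for at most one chunk instead of the whole table.
TableStats gather_stats_chunked(const std::vector<std::size_t>& chain_lengths,
                                std::size_t chunk_size,
                                const std::function<void()>& poll) {
  if (chunk_size == 0) chunk_size = 1;
  TableStats stats;
  std::size_t next = 0;
  while (next < chain_lengths.size()) {
    const std::size_t end = std::min(next + chunk_size, chain_lengths.size());
    for (; next < end; ++next) {
      ++stats.buckets;
      stats.entries += chain_lengths[next];
      stats.max_chain = std::max(stats.max_chain, chain_lengths[next]);
    }
    if (next < chain_lengths.size()) {
      poll();  // yield point between chunks
    }
  }
  return stats;
}
```

The chunk size trades time-to-safepoint against per-chunk overhead: smaller chunks let a safepoint start sooner but poll more often.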
------------- Commit messages: - 8328473: StringTable and SymbolTable statistics delay time to safepoint Changes: https://git.openjdk.org/jdk/pull/23750/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23750&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8328473 Stats: 163 lines in 5 files changed: 94 ins; 41 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/23750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23750/head:pull/23750 PR: https://git.openjdk.org/jdk/pull/23750 From epeter at openjdk.org Mon Feb 24 14:32:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Feb 2025 14:32:57 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v3] In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> Message-ID: On Mon, 24 Feb 2025 12:52:42 GMT, Roland Westrelin wrote: > > @rwestrel Do you want me to find examples for the pre-loop disappearing? I suppose I can find some easily by adding an assert in SuperWord, where we bail out, as I showed above. > > Yes, if not too much work. Ok, let's add this: diff --git a/src/hotspot/share/opto/vectorization.cpp b/src/hotspot/share/opto/vectorization.cpp index e607a1065dd..290ee249a42 100644 --- a/src/hotspot/share/opto/vectorization.cpp +++ b/src/hotspot/share/opto/vectorization.cpp @@ -98,6 +98,7 @@ VStatus VLoop::check_preconditions_helper() { // the pre-loop limit. 
CountedLoopEndNode* pre_end = _cl->find_pre_loop_end(); if (pre_end == nullptr) { + assert(false, "found no pre-loop"); return VStatus::make_failure(VLoop::FAILURE_PRE_LOOP_LIMIT); } Node* pre_opaq1 = pre_end->limit(); And run that: rr /oracle-work/jdk-fork7/build/linux-x64-slowdebug/jdk/bin/java -Xcomp -XX:+TraceLoopOpts -XX:CompileCommand=compileonly,jdk.internal.classfile.impl.StackMapGenerator::processBlock --version .... PreMainPost Loop: N7127/N4014 limit_check profile_predicated predicated counted [0,int),+1 (2147483648 iters) rc has_sfpt strip_mined Unroll 2 Loop: N7127/N4014 counted [int,int),+1 (2147483648 iters) main rc has_sfpt strip_mined Loop: N0/N0 has_call has_sfpt Loop: N7453/N7460 limit_check profile_predicated predicated counted [0,int),+1 (4 iters) pre rc has_sfpt Loop: N7126/N7125 sfpts={ 7128 } Loop: N7508/N4014 counted [int,int),+2 (2147483648 iters) main rc has_sfpt strip_mined Loop: N7409/N7416 counted [int,int),+1 (4 iters) post rc has_sfpt Parallel IV: 7728 Loop: N7453/N7460 limit_check profile_predicated predicated counted [0,int),+1 (4 iters) pre has_sfpt Parallel IV: 7725 Loop: N7508/N4014 counted [int,int),+2 (2147483648 iters) main has_sfpt strip_mined Parallel IV: 7718 Loop: N7409/N7416 counted [int,int),+1 (4 iters) post has_sfpt Loop: N0/N0 has_call has_sfpt Loop: N7453/N7460 limit_check profile_predicated predicated counted [0,int),+1 (4 iters) pre has_sfpt Loop: N7126/N7125 sfpts={ 7128 } Loop: N7508/N4014 counted [int,int),+2 (2147483648 iters) main has_sfpt strip_mined Loop: N7409/N7416 counted [int,int),+1 (4 iters) post has_sfpt RangeCheck Loop: N7508/N4014 counted [int,int),+2 (2147483648 iters) main has_sfpt rce strip_mined Unroll 4 Loop: N7508/N4014 limit_check counted [int,int),+2 (2147483648 iters) main has_sfpt rce strip_mined Loop: N0/N0 has_call has_sfpt Loop: N7453/N7460 limit_check profile_predicated predicated counted [0,int),+1 (4 iters) pre rc has_sfpt Loop: N7126/N7125 limit_check sfpts={ 7128 } Loop: 
N8146/N4014 limit_check counted [int,int),+4 (2147483648 iters) main has_sfpt strip_mined Loop: N7409/N7416 counted [int,int),+1 (4 iters) post rc has_sfpt ... # Internal Error (/oracle-work/jdk-fork7/open/src/hotspot/share/opto/vectorization.cpp:101), pid=1381339, tid=1381348 # assert(false) failed: found no pre-loop The pre-loop node is not dead actually. The issue is with the main-loop in `CountedLoopNode::is_canonical_loop_entry`. We skip through some predicates, but then we cannot find the ZeroTripGuard, rather I'm seeing this: (rr) p ctrl->dump_bfs(2,0,"#cd") dist dump --------------------------------------------- 2 974 ConI === 0 [[ ... ]] #int:1 2 8060 IfTrue === 8056 [[ 8073 ]] #1 1 8073 If === 8060 974 [[ 8074 8077 ]] #Last Value Assertion Predicate P=0.999999, C=-1.000000 0 8077 IfTrue === 8073 [[ 8103 ]] #1 The pre-loop is further up though: (rr) p this->dump_bfs(26,0,"#c") dist dump --------------------------------------------- 26 7453 CountedLoop === 7453 4015 7460 [[ 7452 7453 7454 7455 ]] inner stride: 1 pre of N7127 !orig=[7127],[7118],[2645] !jvms: StackMapGenerator::processBlock @ bci:2677 (line 671) 25 7455 If === 7453 7441 [[ 7456 7464 ]] P=0.000001, C=-1.000000 !orig=[2686] !jvms: StackMapGenerator$Frame::popStack @ bci:5 (line 1001) StackMapGenerator::processBlock @ bci:2681 (line 671) 24 7456 IfFalse === 7455 [[ 7448 7457 ]] #0 !orig=[2631],[2628] !jvms: StackMapGenerator$Frame::popStack @ bci:5 (line 1001) StackMapGenerator::processBlock @ bci:2681 (line 671) 23 7457 RangeCheck === 7456 7446 [[ 7458 7467 ]] P=0.999999, C=-1.000000 !orig=[1189] !jvms: StackMapGenerator$Frame::popStack @ bci:33 (line 1002) StackMapGenerator::processBlock @ bci:2681 (line 671) 22 7458 IfTrue === 7457 [[ 7459 ]] #1 !orig=[777],385 !jvms: StackMapGenerator$Frame::popStack @ bci:33 (line 1002) StackMapGenerator::processBlock @ bci:2681 (line 671) 21 7459 CountedLoopEnd === 7458 7443 [[ 7460 7482 ]] [lt] P=0.900000, C=-1.000000 !orig=7122,[5398] !jvms: 
StackMapGenerator::processBlock @ bci:2674 (line 670) 20 7482 IfFalse === 7459 [[ 7486 ]] #0 19 7486 If === 7482 7485 [[ 7461 7487 ]] P=0.999999, C=-1.000000 18 7487 IfTrue === 7486 [[ 7977 ]] #1 17 7977 If === 7487 974 [[ 7978 7981 ]] #Init Value Assertion Predicate P=0.999999, C=-1.000000 16 7981 IfTrue === 7977 [[ 7994 ]] #1 15 7994 If === 7981 974 [[ 7995 7998 ]] #Last Value Assertion Predicate P=0.999999, C=-1.000000 14 7998 IfTrue === 7994 [[ 8118 ]] #1 13 8118 If === 7998 8117 [[ 8119 8122 ]] #Last Value Assertion Predicate P=0.999999, C=-1.000000 12 8122 IfTrue === 8118 [[ 8007 ]] #1 11 8007 If === 8122 8006 [[ 8008 8011 ]] #Init Value Assertion Predicate P=0.999999, C=-1.000000 10 8011 IfTrue === 8007 [[ 8056 ]] #1 9 8056 If === 8011 974 [[ 8057 8060 ]] #Init Value Assertion Predicate P=0.999999, C=-1.000000 8 8060 IfTrue === 8056 [[ 8073 ]] #1 7 8073 If === 8060 974 [[ 8074 8077 ]] #Last Value Assertion Predicate P=0.999999, C=-1.000000 6 8077 IfTrue === 8073 [[ 8103 ]] #1 5 8173 IfFalse === 7122 [[ 7128 7129 ]] #0 !orig=[7524],[7123],[5442] !jvms: StackMapGenerator::processBlock @ bci:2674 (line 670) 5 8103 If === 8077 8102 [[ 8104 8107 ]] #Last Value Assertion Predicate P=0.999999, C=-1.000000 4 7128 SafePoint === 8173 1 778 1 1 7129 780 1 1 781 781 782 783 784 1 1 1 785 786 [[ 7124 ]] SafePoint !orig=385 !jvms: StackMapGenerator::processBlock @ bci:2688 (line 670) 4 8107 IfTrue === 8103 [[ 8086 ]] #1 3 7124 OuterStripMinedLoopEnd === 7128 781 [[ 7125 7471 ]] P=0.900000, C=-1.000000 3 8086 If === 8107 8085 [[ 8087 8090 ]] #Init Value Assertion Predicate P=0.999999, C=-1.000000 2 7122 CountedLoopEnd === 8146 7121 [[ 8173 4014 ]] [lt] P=0.900000, C=-1.000000 !orig=[5398] !jvms: StackMapGenerator::processBlock @ bci:2674 (line 670) 2 7125 IfTrue === 7124 [[ 7126 ]] #1 2 8090 IfTrue === 8086 [[ 7126 ]] #1 1 4014 IfTrue === 7122 [[ 8146 ]] #1 !jvms: StackMapGenerator::processBlock @ bci:2674 (line 670) 1 7126 OuterStripMinedLoop === 7126 8090 7125 [[ 7126 
8146 ]] 0 8146 CountedLoop === 8146 7126 4014 [[ 8146 1191 8157 8158 7122 7503 ]] inner stride: 4 main of N8146 strip mined !orig=[7508],[7127],[7118],[2645] !jvms: StackMapGenerator::processBlock @ bci:2677 (line 671) It looks like we are skipping some predicates, but maybe not enough of them? In `AssertionPredicates::find_entry` we see: - `8090 IfTrue === 8086 [[ 7126 ]] #1`: `is_predicate` returns `true`. - `8107 IfTrue === 8103 [[ 8086 ]] #1`: `is_predicate` returns `true`. - `8077 IfTrue === 8073 [[ 8103 ]] #1`: `is_predicate` returns `false`. The reason is that the assertion predicate Opaque nodes have already disappeared. I talked with @chhagedorn and he says that there are some "dying" initialized assertion predicates from unrolling that can be in the way. They would be cleaned out by IGVN later, and then we can see through them. But at this point they are in the way, and we cannot see through them to find the ZeroTripGuard; the predicate iterator is not good enough yet. But @chhagedorn is working on that. https://bugs.openjdk.org/browse/JDK-8350579 The implication is that the ZeroTripGuard may temporarily not be found, so we cannot even find the pre-loop, nor the multiversion-if. So I cannot really add an assert now. And who knows, there may be other blocking reasons on top of that. @rwestrel Does that make sense? What do you think we should do? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2678602660 From rcastanedalo at openjdk.org Mon Feb 24 14:51:07 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 24 Feb 2025 14:51:07 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v4] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 13:14:34 GMT, Thomas Stuefe wrote: >> Greetings, >> >> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases.
This will allow for a much faster, more thorough analysis of footprint issues. >> >> Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. >> >> I wanted to track that information correctly and display it clearly in a way that is easy to understand. >> >> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). >> >> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (e.g., the inside of a loop): drop a TracePhase in there with a descriptive name, and you can see the allocations inside that phase. >> >> The statistic gives us two new forms of output: >> >> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: >> >>
>> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
>> Phase                 Total      ra     node     comp  type index reglive regsplit    cienv   other
>> none                1205512  155104   982984    33712     0     0       0        0        0   33712
>> parse              11685376  720016  6578728  1899064     0     0       0        0  1832888  654680
>> optimizer            916584       0   556416        0     0     0       0        0        0  360168
>> escapeAnalysis      1983400       0  1276392   707008     0     0       0        0        0       0
>> connectionGraph      720016       0        0   621832     0     0       0        0    98184       0
>> macroEliminate       196448       0   196448        0     0     0       0        0        0       0
>> iterGVN              327440       0   196368   131072     0     0       0        0        0       0
>> incrementalInline   3992816       0  3043704  62...
> > Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: > > avoid Thread::current in high traffic chunk alloc path I reviewed the C2-specific code and have a couple of comments, otherwise looks good. While reviewing, I found a few more C2 arenas that could be tagged for higher accuracy: - matcher states arena, - superword (auto-vectorizer) arenas, - `Compile::_Compile_types`, and - `OptoRegScheduling` liveness arena. Here is a patch that adds tags for these: https://github.com/openjdk/jdk/commit/d501bd8a674229904358fb168a9c347004efeea3. I think these changes are within the scope of this RFE, because the original changeset includes similar ones. If you agree, feel free to merge the patch into this RFE. src/hotspot/share/memory/arena.hpp line 99: > 97: FN(comp, C2 Compile arena) \ > 98: FN(type, C2 Type arena) \ > 99: FN(index, C2 Index arena) \ `tag_index` is not used and can be removed, it seems to be subsumed by `tag_reglive`. src/hotspot/share/opto/chaitin.cpp line 370: > 368: > 369: ResourceArea split_arena(mtCompiler, Arena::Tag::tag_reglive); // Arena for Split local resources > 370: ResourceArea live_arena(mtCompiler, Arena::Tag::tag_regsplit); // Arena for liveness & IFG info Suggestion: ResourceArea split_arena(mtCompiler, Arena::Tag::tag_regsplit); // Arena for Split local resources ResourceArea live_arena(mtCompiler, Arena::Tag::tag_reglive); // Arena for liveness & IFG info ------------- Changes requested by rcastanedalo (Reviewer). 
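The per-phase accounting described in the quoted PR text has three parts: a net allocation per phase that can go negative when a phase frees arenas carried over from earlier phases, temporary peaks, and nested phases. It can be sketched with a RAII scope object in the spirit of `TracePhase`. This is a minimal, hypothetical C++ model, not HotSpot's implementation. `ArenaStats`, `PhaseScope`, `enter`/`leave` and the byte counts are illustrative, and nested phases simply charge each allocation to every enclosing phase.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-phase arena accounting; not HotSpot's classes.
class ArenaStats {
 public:
  struct PhaseInfo {
    long long net = 0;           // bytes allocated minus bytes released in the phase
    std::size_t peak_delta = 0;  // highest usage seen above the phase's starting usage
  };

  void allocate(std::size_t bytes) {
    current_ += bytes;
    // Every active (possibly nested) phase observes the new usage level.
    for (Frame& f : stack_) f.peak = std::max(f.peak, current_);
  }
  void release(std::size_t bytes) { current_ -= bytes; }  // assumes bytes <= current_

  void enter(const std::string& name) { stack_.push_back({name, current_, current_}); }
  void leave() {
    Frame f = stack_.back();
    stack_.pop_back();
    // Repeated phases with the same name accumulate net and keep the max peak.
    PhaseInfo& info = phases_[f.name];
    info.net += static_cast<long long>(current_) - static_cast<long long>(f.start);
    info.peak_delta = std::max(info.peak_delta, f.peak - f.start);
  }
  const PhaseInfo& phase(const std::string& name) { return phases_[name]; }

 private:
  struct Frame {
    std::string name;
    std::size_t start;
    std::size_t peak;
  };
  std::size_t current_ = 0;
  std::vector<Frame> stack_;
  std::map<std::string, PhaseInfo> phases_;
};

// RAII scope in the spirit of TracePhase: construction enters, destruction leaves.
class PhaseScope {
 public:
  PhaseScope(ArenaStats& stats, const std::string& name) : stats_(stats) { stats_.enter(name); }
  ~PhaseScope() { stats_.leave(); }
  PhaseScope(const PhaseScope&) = delete;
  PhaseScope& operator=(const PhaseScope&) = delete;

 private:
  ArenaStats& stats_;
};
```

In this model, a phase that allocates 50 bytes, runs a nested phase with a temporary 30-byte spike, and then frees 120 bytes inherited from before reports a net of -70 and a peak of 80 above its starting usage.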
PR Review: https://git.openjdk.org/jdk/pull/23530#pullrequestreview-2637314247 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1967765880 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1967761886 From adinn at openjdk.org Mon Feb 24 14:58:57 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 24 Feb 2025 14:58:57 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v6] In-Reply-To: References: Message-ID: <_ApJlty8yCwyY8FiRhczpoKGf1G83hvMuXvOWeKHb90=.5758138f-b03b-49be-ab7a-3b4b56cbe7a6@github.com> On Thu, 20 Feb 2025 17:33:18 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with four additional commits since the last revision: > > - Accepting suggested change from Andrew Dinn > - Added comments suggested by Andrew Dinn > - Fixed copyright years > - renaming a couple of functions src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4654: > 4652: > 4653: void dilithium_add_sub32() { > 4654: __ addv(v24, __ T4S, v0, v16); __ addv(v24, __ T4S, v0, v16); // a0 = b + c src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4663: > 4661: __ addv(v31, __ T4S, v7, v23); > 4662: > 4663: __ subv(v0, __ T4S, v0, v16); __ subv(v0, __ T4S, v0, v16); // a1 = b - c src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4674: > 4672: > 4673: void dilithium_montmul_sub_add16() { > 4674: __ sqdmulh(v24, __ T4S, v1, v16); __ mulv(v16, __ T4S, v16, v30); // m = aLow * qinv ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967809436 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967809840 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967811299 From cnorrbin at openjdk.org Mon Feb 24 15:20:55 2025 From: cnorrbin at 
openjdk.org (Casper Norrbin) Date: Mon, 24 Feb 2025 15:20:55 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow In-Reply-To: References: Message-ID: On Sun, 23 Feb 2025 09:59:51 GMT, Kim Barrett wrote: > > I don't see where we check the return value of align_up_or_min for the changes in src/hotspot/share/gc/shared/gcArguments.cpp. If tests fail because of align_up, maybe the test should be fixed? > > I share @dean-long concerns. This PR doesn't seem like the right direction. I think the earlier PR was correct, and just uncovered problems elsewhere that should be fixed. For `gcArguments.cpp`, we only check in debug builds with `DEBUG_ONLY(assert_flags();)`. I thought (wrongly) that we always checked. I'll look further into the tests related to that. Besides that, I'm curious to hear what you feel is the "right direction". The other 3 cases here all account for the potential overflow. If it is possible to recover, I don't see the need to assert and crash. All other ~170 uses of `align_up` do crash, as they don't account for the overflow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23711#issuecomment-2678775577 From epeter at openjdk.org Mon Feb 24 15:30:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 24 Feb 2025 15:30:07 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v3] In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> Message-ID: On Mon, 24 Feb 2025 12:52:42 GMT, Roland Westrelin wrote: >>> @rwestrel I think I had tried some verifications above, but I could not even get it to work in all cases in `SuperWord`. >>> >>> In `VLoop::check_preconditions_helper`, I try to find either the predicate or the multiversioning if. 
But I cannot always find it, and I think that one reason was that the pre-loop can be lost. At least that is what I remember from 4+ weeks ago. >> >> Do you understand when that happens? It doesn't feel right that the pre loop can be lost. > >> @rwestrel Do you want me to find examples for the pre-loop disappearing? I suppose I can find some easily by adding an assert in SuperWord, where we bail out, as I showed above. > > Yes, if not too much work. @rwestrel I think we should just file an RFE to keep track of these assertions we would like to add once those issues are fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2678803600 From adinn at openjdk.org Mon Feb 24 15:33:08 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 24 Feb 2025 15:33:08 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v6] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 17:33:18 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. 
> > Ferenc Rakoczi has updated the pull request incrementally with four additional commits since the last revision: > > - Accepting suggested change from Andrew Dinn > - Added comments suggested by Andrew Dinn > - Fixed copyright years > - renaming a couple of functions src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4683: > 4681: __ mulv(v19, __ T4S, v7, v19); > 4682: > 4683: __ mulv(v16, __ T4S, v16, v30); __ mulv(v16, __ T4S, v16, v30); // m = aLow * qinv src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4688: > 4686: __ mulv(v19, __ T4S, v19, v30); > 4687: > 4688: __ sqdmulh(v16, __ T4S, v16, v31); __ sqdmulh(v16, __ T4S, v16, v31); // n = hi32(2 * m * q) src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4693: > 4691: __ sqdmulh(v19, __ T4S, v19, v31); > 4692: > 4693: __ shsubv(v16, __ T4S, v24, v16); __ shsubv(v16, __ T4S, v24, v16); // a = (aHigh - n) / 2 src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4698: > 4696: __ shsubv(v19, __ T4S, v27, v19); > 4697: > 4698: __ subv(v1, __ T4S, v0, v16); __ subv(v1, __ T4S, v0, v16); // x1 = x - a src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4703: > 4701: __ subv(v7, __ T4S, v6, v19); > 4702: > 4703: __ addv(v0, __ T4S, v0, v16); __ addv(v0, __ T4S, v0, v16); // x0 = x + a src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4742: > 4740: > 4741: for (int i = 0; i < 4; i++) { > 4742: __ ldpq(v30, v31, Address(dilithiumConsts, 0)); __ ldpq(v30, v31, Address(dilithiumConsts, 0)); // qinv, q src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4813: > 4811: // level 5 > 4812: for (int i = 0; i < 1024; i += 256) { > 4813: __ ldpq(v30, v31, Address(dilithiumConsts, 0)); __ ldpq(v30, v31, Address(dilithiumConsts, 0)); // qinv, q src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4853: > 4851: // level 6 > 4852: for (int i = 0; i < 1024; i += 128) { > 4853: __ ldpq(v30, v31, Address(dilithiumConsts, 0)); __ ldpq(v30, v31, Address(dilithiumConsts, 0)); // qinv, q 
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4876: > 4874: // level 7 > 4875: for (int i = 0; i < 1024; i += 128) { > 4876: __ ldpq(v30, v31, Address(dilithiumConsts, 0)); __ ldpq(v30, v31, Address(dilithiumConsts, 0)); // qinv, q src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4905: > 4903: > 4904: void dilithium_sub_add_montmul16() { > 4905: __ subv(v20, __ T4S, v0, v1); __ subv(v20, __ T4S, v0, v1); // b = x0 - x1 src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4910: > 4908: __ subv(v23, __ T4S, v6, v7); > 4909: > 4910: __ addv(v0, __ T4S, v0, v1); __ addv(v0, __ T4S, v0, v1); // a0 = x0 + x1 src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4915: > 4913: __ addv(v6, __ T4S, v6, v7); > 4914: > 4915: __ sqdmulh(v24, __ T4S, v20, v16); __ sqdmulh(v24, __ T4S, v20, v16); // aHigh = hi32(2 * b * c) __ mulv(v1, __ T4S, v20, v16); // aLow = lo32(b * c) src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4924: > 4922: __ mulv(v7, __ T4S, v23, v19); > 4923: > 4924: __ mulv(v1, __ T4S, v1, v30); __ mulv(v1, __ T4S, v1, v30); // m = (aLow * q) src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4929: > 4927: __ mulv(v7, __ T4S, v7, v30); > 4928: > 4929: __ sqdmulh(v1, __ T4S, v1, v31); __ sqdmulh(v1, __ T4S, v1, v31); // n = hi32(2 * m * q) src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4934: > 4932: __ sqdmulh(v7, __ T4S, v7, v31); > 4933: > 4934: __ shsubv(v1, __ T4S, v24, v1); __ shsubv(v1, __ T4S, v24, v1); // a1 = (aHigh - n) / 2 src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5044: > 5042: // level0 > 5043: for (int i = 0; i < 1024; i += 128) { > 5044: __ ldpq(v30, v31, Address(dilithiumConsts, 0)); __ ldpq(v30, v31, Address(dilithiumConsts, 0)); //qinv, q src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5115: > 5113: __ str(v31, __ Q, Address(coeffs, i + 224)); > 5114: dilithium_load32zetas(zetas); > 5115: __ ldpq(v30, v31, Address(dilithiumConsts, 0)); __ ldpq(v30, v31, Address(dilithiumConsts, 0)); //qinv, q 
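The comments suggested for `dilithium_montmul_sub_add16` and `dilithium_sub_add_montmul16` describe a lane-wise Montgomery multiplication. It is followed either by a Cooley-Tukey butterfly (`x0 = x + a`, `x1 = x - a`) or by a Gentleman-Sande butterfly (`a0 = x0 + x1`, `a1 = montmul(x0 - x1, zeta)`). The scalar C++ sketch below models one 32-bit lane with the same steps. It treats `sqdmulh` as a saturating doubling high multiply, `mulv` as a low 32-bit multiply, and `shsubv` as a halving subtract. It assumes the ML-DSA modulus q = 8380417 and qinv = q^-1 mod 2^32 = 58728449, as in the Dilithium reference code. The function names are hypothetical, and this is not the stub generator's code. Because m * q agrees with b * c in the low 32 bits, the halved difference is exactly (b * c - m * q) / 2^32, which is congruent to b * c * 2^-32 mod q. Arithmetic right shifts of negative values are assumed; C++20 guarantees this.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using std::int32_t;
using std::int64_t;
using std::uint32_t;

constexpr int32_t kQ = 8380417;      // ML-DSA modulus
constexpr int32_t kQInv = 58728449;  // q^-1 mod 2^32

// sqdmulh on one lane: high 32 bits of 2 * x * y, saturating the one overflow case.
int32_t sqdmulh(int32_t x, int32_t y) {
  if (x == std::numeric_limits<int32_t>::min() && y == x) {
    return std::numeric_limits<int32_t>::max();
  }
  return static_cast<int32_t>((static_cast<int64_t>(x) * y * 2) >> 32);
}

// mulv on one lane: low 32 bits of x * y (wrapping).
int32_t mul_lo(int32_t x, int32_t y) {
  return static_cast<int32_t>(static_cast<uint32_t>(x) * static_cast<uint32_t>(y));
}

int32_t montmul(int32_t b, int32_t c) {
  const int32_t a_high = sqdmulh(b, c);    // aHigh = hi32(2 * b * c)
  const int32_t a_low = mul_lo(b, c);      // aLow = lo32(b * c)
  const int32_t m = mul_lo(a_low, kQInv);  // m = aLow * qinv
  const int32_t n = sqdmulh(m, kQ);        // n = hi32(2 * m * q)
  // shsub: (aHigh - n) / 2 without intermediate overflow
  return static_cast<int32_t>((static_cast<int64_t>(a_high) - n) >> 1);
}

// Cooley-Tukey (forward NTT) butterfly: x0 = x + a, x1 = x - a.
void ct_butterfly(int32_t& x, int32_t& y, int32_t zeta) {
  const int32_t a = montmul(y, zeta);
  y = x - a;
  x = x + a;
}

// Gentleman-Sande (inverse NTT) butterfly: a0 = x0 + x1, a1 = montmul(x0 - x1, zeta).
void gs_butterfly(int32_t& x0, int32_t& x1, int32_t zeta) {
  const int32_t b = x0 - x1;
  x0 = x0 + x1;
  x1 = montmul(b, zeta);
}
```

The sketch performs no intermediate reductions, so it assumes coefficients stay small enough that the additions cannot overflow. A full NTT has to bound that growth separately.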
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5166: > 5164: __ lea(dilithiumConsts, ExternalAddress((address) StubRoutines::aarch64::_dilithiumConsts)); > 5165: > 5166: __ ldpq(v30, v31, Address(dilithiumConsts, 0)); __ ldpq(v30, v31, Address(dilithiumConsts, 0)); // qinv, q __ ldr(v29, __ Q, Address(dilithiumConsts, 48)); // rsquare src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5228: > 5226: __ lea(dilithiumConsts, ExternalAddress((address) StubRoutines::aarch64::_dilithiumConsts)); > 5227: > 5228: __ ldpq(v30, v31, Address(dilithiumConsts, 0)); __ ldpq(v30, v31, Address(dilithiumConsts, 0)); // qinv, q ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967863821 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967864748 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967865658 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967866379 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967866822 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967867752 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967869143 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967870036 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967870373 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967871386 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967871949 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967872681 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967873281 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967873918 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967874418 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967875655 PR Review Comment: 
https://git.openjdk.org/jdk/pull/23300#discussion_r1967876745 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967877717 PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1967878884 From roland at openjdk.org Mon Feb 24 15:49:01 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 24 Feb 2025 15:49:01 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v3] In-Reply-To: References: <-9c7vyB-BTXBPy8qurDSvPUzcAv9LY_d8g8Xj5wnhi4=.7bac2991-37d1-40f5-be3e-bb7a9bdb9f26@github.com> <5hd7BMjze01r6SZOvQ_Ogf_XV1UekB_mYQbpR5_Wzms=.a911ee76-094f-477c-8d24-564c4f0c39d3@github.com> Message-ID: On Mon, 24 Feb 2025 12:52:42 GMT, Roland Westrelin wrote: >>> @rwestrel I think I had tried some verifications above, but I could not even get it to work in all cases in `SuperWord`. >>> >>> In `VLoop::check_preconditions_helper`, I try to find either the predicate or the multiversioning if. But I cannot always find it, and I think that one reason was that the pre-loop can be lost. At least that is what I remember from 4+ weeks ago. >> >> Do you understand when that happens? It doesn't feel right that the pre loop can be lost. > >> @rwestrel Do you want me to find examples for the pre-loop disappearing? I suppose I can find some easily by adding an assert in SuperWord, where we bail out, as I showed above. > > Yes, if not too much work. > @rwestrel I think we should just file an RFE to keep track of these assertions we would like to add once those issues are fixed. That sounds reasonable to me. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2678873056 From coleenp at openjdk.org Mon Feb 24 15:59:58 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 24 Feb 2025 15:59:58 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v7] In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 05:12:38 GMT, David Holmes wrote: > Does the SA not need any updates in relation to this? No, the SA doesn't know about these compiler intrinsics. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23572#issuecomment-2678913119 From coleenp at openjdk.org Mon Feb 24 15:59:59 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 24 Feb 2025 15:59:59 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v7] In-Reply-To: References: <_j9Wkg21aBltyVrbO4wxGFKmmLDy0T-eorRL4epfS4k=.5a453b6b-d673-4cc6-b29f-192fa74e290c@github.com> <3qpqR3PC8PFmdgaIoSYA3jDWdl-oon0-AcIzXcI76rY=.38635503-c067-4f6e-a4f1-92c1b6d991d1@github.com> Message-ID: <4eQr952WCBhGqlLqX0q2TCDLuFrwh_UmxgJcb2BOs_s=.8e7f55a7-60ec-4cc8-9a8b-cca84ccbba10@github.com> On Thu, 20 Feb 2025 23:23:08 GMT, Coleen Phillimore wrote: >>> ... but not in the return since the caller likely will fetch the klass pointer next. >> >> I notice that too. Callers are using is_primitive() to short-circuit calls to as_Klass(), which means they seem to be aware of this implementation detail when maybe they shouldn't. > > There are 70 callers so yes, it might be something that shouldn't be known in this many places. Definitely out of the scope of this PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1967943222 From adinn at openjdk.org Mon Feb 24 16:21:58 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 24 Feb 2025 16:21:58 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v6] In-Reply-To: References: Message-ID: <6B25PDNMw8dDUm8r5rX4heL3cfvbsPVKqnVg7e1Ax84=.43b91704-15fa-4445-b8be-216fffcf12d4@github.com> On Thu, 20 Feb 2025 17:33:18 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with four additional commits since the last revision: > > - Accepting suggested change from Andrew Dinn > - Added comments suggested by Andrew Dinn > - Fixed copyright years > - renaming a couple of functions Please add comments as indicated to relate generated code to original Java source. Otherwise good to go. ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23300#pullrequestreview-2637711807 From adinn at openjdk.org Mon Feb 24 16:21:59 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 24 Feb 2025 16:21:59 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: Message-ID: <36J5kPTCknNCBjMx56e9JmLK2vFbvxBXXXOvTmv5pDs=.6aaa25e2-4cd9-4217-8da3-3280c1d3c4db@github.com> On Fri, 21 Feb 2025 10:23:37 GMT, Ferenc Rakoczi wrote: >> Hi. Here is the test result of our CI. >> >> ### copyright year >> >> the following files should update the copyright year to 2025. 
>> >> >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp >> src/hotspot/cpu/aarch64/stubRoutines_aarch64.hpp >> src/hotspot/share/runtime/globals.hpp >> src/java.base/share/classes/sun/security/provider/ML_DSA.java >> src/java.base/share/classes/sun/security/provider/SHA3Parallel.java >> test/micro/org/openjdk/bench/java/security/MLDSA.java >> >> >> ### cross-build failure >> >> Cross build for riscv64/s390/ppc64 failed. >> >> Here shows the error msg for ppc64 >> >> >> === Output from failing command(s) repeated here === >> * For target support_interim-jmods_support__create_java.base.jmod_exec: >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/tmp/jdk-src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=72752, tid=72769 >> # assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000e85cb03dc620 <= 0x0000e85cb03e8ab4 <= 0x0000e85cb03e8ab0 >> # >> # JRE version: OpenJDK Runtime Environment (25.0) (fastdebug build 25-internal-git-1e01c6deec3) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-git-1e01c6deec3, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) >> # Problematic frame: >> # V [libjvm.so+0x3b391c] Instruction_aarch64::~Instruction_aarch64()+0xbc >> # >> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /tmp/ci-scripts/jdk-src/make/ >> # >> # An error report file with more information is saved as: >> # /tmp/jdk-src/make/hs_err_pid72752.log >> ... (rest of output omitted) >> >> * All command lines available in /sysroot/ppc64el/tmp/build-ppc64el/make-support/failure-logs. 
>> === End of repeated output === >> >> >> I suppose we should make the similar update at file `src/hotspot/cpu/aarch64/stubDeclarations_aarch64.hpp` to other platforms > > @shqking, I changed the copyright years, but I don't really understand how the aarch64-specific code can overflow buffers on other architectures. As far as I understand, Instruction_aarch64 should not have been there in a ppc build. > Was this a build attempted on an aarch64 for the other architectures? @ferakocz I have indicated a few places where I think you should add comments to clarify the relationship to the original Java code or just clarify what data is being used. I think the code is ok to go in as it is but I would really like to investigate a better structuring of the generator code. This can be done as a follow-up rather than delay getting this version committed. There are two things I still see as problematic with the current code. 1) There are lots of places in your auxiliary generator methods and also in their client methods where you generate distinct sequences of calls to the assembler sharing essentially the same code shape i.e. the same instructions but with different vector register arguments. For example, in `dilithium_montmul32` you generate the multiply sequence to montgomery multiply 4x4s registers in v0..v3 by 4x4s registers in v16..v19 and then repeat exactly the same code in exactly the same sequence to multiply the 4x4s registers in v4..v7 by 4x4s registers in v20..v23. Likewise, `dilithium_sub_add_montmul16` generates that same shape code but uses the montmul sequence with odd registers v1..v7 paired against the compact sequence v16..19. As another example, you generate various 4 or 8 long sequences of subv and addv operations at various points, including in some of the top level methods. I appreciate that you have folded one of the montmult cases into the other by adding the `bool by_constant` parameter to `dilithium_montmul32`. 
However, I think it would be worth investigating an alternative that would allow more, and more systematic, use of auxiliary methods. 2) Your current auxiliary generator methods rely on a fixed mapping of input, output and scratch registers to specific registers. This is part of the reason why you cannot always call your auxiliaries (or smaller pieces of them) from other locations where the same code shape is generated -- the input and output mappings of data to registers expected by the auxiliary do not match the register sequences in which the relevant data are (transiently) located. This same fact also means that the repeated code sections heavily depend on naming exactly the right register on each generator line. That makes it harder for a maintainer to recognize how, essentially, what is really just one common, abstract operation is, at each different occurrence, consuming, combining and updating several input sequences of related registers to generate one or more output sequences. That also means that it would be very easy to introduce an error if the code ever needed to be changed. I would like to investigate an alternative approach where your auxiliary generator methods and their callers pass arguments that identify the vector register sequences to be consumed as inputs, used as temporaries and written as outputs. In cases where the routines operate on sequences of 4 or 8 successive vectors, that would, at the very least, involve specifying the first register for each input, temporary or output, e.g. for `dilithium_montmul32`: multiply v0+ by v16+ using v24+ as temporaries and v30+ as constants, and output the results to v16+. However, that leaves it implicit that the first two inputs involve 8 registers while the temporaries involve 4 and the constants 2. The more general requirement is not just to specify the vector sequence length (2, 4 or 8) but also to allow the default stride of one (e.g. v0, v1, ...) to be varied to allow for skip sequences (e.g.
v0, v2, ...) or constant sequences (v28, v28, ... as would be needed for multiply-by-constant). I have prototyped a simple vector sequence type `VRSeq` that models an indexable sequence of FloatRegisters and allows many of your higher-level routines to simply declare the register sets they operate on and then pass them as arguments to a range of simple auxiliary generator functions that can be used in many places where you currently have a lot of inline calls to the assembler -- see attachment: [vseq.zip](https://github.com/user-attachments/files/18946470/vseq.zip) I'll raise a JIRA to cover recoding the current implementation using this type and post a follow-up PR that uses it to see how far it helps simplify the code. I believe it will make it easier for maintainers to understand the structure of the generated code and observe/verify the use of registers to store specific values. It should also allow assertions about the use of registers to be added to the code to ensure that values are not being overwritten (except in circumstances where that is legitimate). Meanwhile I'll approve this PR modulo the commenting I suggested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2678977770 From gziemski at openjdk.org Mon Feb 24 16:27:28 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 24 Feb 2025 16:27:28 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v52] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism.
>
> Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf`
>
> Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!):
>
>
> malloc summary:
>
> time:8,951,473[ns] [samples:117,717]
> memory requested:28,474,918 bytes, allocated:29,904,416 bytes,
> malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%]
>
> NMT type:               objects:      bytes:      time:  count%:  bytes%:   time%:  overhead:
> -------------------------------------------------------------------------------------------------------------------------
> Java Heap:                     0           0          0     0.0%     0.0%     0.0%       0.0%
> Class:                     8,598     727,856    607,047     7.3%     2.4%     6.8%      18.2%
> Thread:                      196      68,256     64,875     0.2%     0.2%     0.7%       7.0%
> Thread Stack:                  0           0          0     0.0%     0.0%     0.0%       0.0%
> Code:                     10,094   2,036,528    916,348     8.6%     6.8%    10.2%       9.9%
> GC:                        1,813  20,372,160  1,214,642     1.5%    68.1%    13.6%       3.7%
> GCCardSet:                   299      28,736     13,174     0.3%     0.1%     0.1%      11.6%
> Compiler:                     55      13,728    171,364     0.0%     0.0%     1.9%       6.9%
> JVMCI:                         0           0          0     0.0%     0.0%     0.0%       0.0%
> Internal:                  5,066     339,184  1,418,578     4.3%     1.1%    15.8%      18.0%
> Other:                         6     244,736     21,303     0.0%     0.8%     0.2%      37.9%
> Symbol:                    9,844   1,493,280    752,665     8.4%     5.0%     8.4%      14.1%
> Native Memory Tracking:      367      30,736     17,654     0.3%     0.1%     0.2%       7...
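The derived fields in that malloc summary can be re-checked from the raw counts. The sketch below is a hedged illustration: the struct and function names are hypothetical, not part of NMTBenchmark. It assumes the malloc overhead is allocated minus requested bytes, and that both bracketed percentages are taken relative to the requested bytes. The NMT header total also divides evenly into 18 bytes per sample; that is only what the numbers imply, not something checked against NMT's header layout.

```cpp
#include <cassert>
#include <cstdint>

// Raw counts copied from the "malloc summary" in the sample output.
struct MallocSummary {
  int64_t samples;
  int64_t requested_bytes;
  int64_t allocated_bytes;
  int64_t nmt_header_bytes;
};

constexpr MallocSummary kSample{117717, 28474918, 29904416, 2118906};

// Bytes the allocator handed out beyond what callers asked for.
constexpr int64_t malloc_overhead(const MallocSummary& s) {
  return s.allocated_bytes - s.requested_bytes;
}

// A byte count as a percentage of the requested bytes.
double percent_of_requested(const MallocSummary& s, int64_t bytes) {
  return 100.0 * static_cast<double>(bytes) / static_cast<double>(s.requested_bytes);
}

// Average NMT header cost per sampled allocation.
double header_bytes_per_sample(const MallocSummary& s) {
  return static_cast<double>(s.nmt_header_bytes) / static_cast<double>(s.samples);
}
```

With these definitions the overhead comes out at 1,429,498 bytes, about 5.02% and 7.44% of the requested bytes, matching the printed summary.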
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: code active only on macOS and Linux ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/da6d4997..3892ef16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=51 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=50-51 Stats: 8 lines in 2 files changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From adinn at openjdk.org Mon Feb 24 16:33:54 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 24 Feb 2025 16:33:54 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v6] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 17:33:18 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with four additional commits since the last revision: > > - Accepting suggested change from Andrew Dinn > - Added comments suggested by Andrew Dinn > - Fixed copyright years > - renaming a couple of functions I raised [JDK-8350589](https://bugs.openjdk.org/browse/JDK-8350589) to cover investigation of an alternative implementation. 
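The register-sequence idea Andrew describes (an indexable run of registers with a start, a length and a stride of 1, 2 or 0) can be illustrated with a small standalone sketch. This is not the code in the vseq.zip attachment and uses neither HotSpot's `FloatRegister` nor its assembler. `VReg`, `VSeq` and `vs_subv` are hypothetical names, and the "assembler" just records instruction text. The sketch regenerates the `subv` block at the start of `dilithium_sub_add_montmul16` (quoted above as `subv(v20, __ T4S, v0, v1)` through `subv(v23, __ T4S, v6, v7)`) from one generic helper, assuming that the two elided middle lines follow the same even/odd pattern.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a SIMD register: just its number, v0..v31.
struct VReg {
  int enc;
};

// Hypothetical VRSeq-like type: N registers starting at `base`, spaced by `stride`.
//   stride 1 -> v0, v1, v2, v3      (successive registers)
//   stride 2 -> v1, v3, v5, v7      (skip sequence)
//   stride 0 -> v28, v28, v28, v28  (constant sequence)
template <int N>
class VSeq {
 public:
  explicit VSeq(int base, int stride = 1) : _base(base), _stride(stride) {
    static_assert(N > 0, "sequence must be non-empty");
    assert(base >= 0 && stride >= 0);
    assert(base + (N - 1) * stride < 32);  // every register in the sequence must exist
  }
  VReg operator[](int i) const {
    assert(0 <= i && i < N);
    return VReg{_base + i * _stride};
  }

 private:
  int _base;
  int _stride;
};

// Render one register in AArch64 syntax with the 4S arrangement.
static std::string reg4s(VReg r) {
  return "v" + std::to_string(r.enc) + ".4S";
}

// Hypothetical generator helper: one instruction per element, d[i] = a[i] - b[i].
// A real version would call the assembler instead of recording text.
template <int N>
std::vector<std::string> vs_subv(const VSeq<N>& d, const VSeq<N>& a, const VSeq<N>& b) {
  std::vector<std::string> code;
  for (int i = 0; i < N; i++) {
    code.push_back("sub " + reg4s(d[i]) + ", " + reg4s(a[i]) + ", " + reg4s(b[i]));
  }
  return code;
}
```

For example, `vs_subv(VSeq<4>(20), VSeq<4>(0, 2), VSeq<4>(1, 2))` yields `sub v20.4S, v0.4S, v1.4S` through `sub v23.4S, v6.4S, v7.4S`. In this design the register mapping becomes data, so the same helper could also serve the second montmul half or, with a stride-0 sequence, a multiply-by-constant. Whether that pays off in the real stub is what JDK-8350589 is meant to find out.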
------------- PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2679012108 From aph at openjdk.org Mon Feb 24 17:09:54 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 24 Feb 2025 17:09:54 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 12:09:57 GMT, Bhavana Kilambi wrote: > This patch adds aarch64 backend for scalar FP16 operations namely - add, subtract, multiply, divide, fma, sqrt, min and max. src/hotspot/cpu/aarch64/aarch64.ad line 17275: > 17273: > 17274: // This pattern would result in the following instructions (the first two are for ConvF2HF > 17275: // and the last instruction is for ReinterpretS2HF) - Suggestion: // Without this pattern, (ReinterpretS2HF (ConvF2HF src)) would result in the following instructions (the first two for ConvF2HF // and the last instruction for ReinterpretS2HF) - Reads a little better, I think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23748#discussion_r1968070079 From adinn at openjdk.org Mon Feb 24 17:15:59 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 24 Feb 2025 17:15:59 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v6] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 17:33:18 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with four additional commits since the last revision: > > - Accepting suggested change from Andrew Dinn > - Added comments suggested by Andrew Dinn > - Fixed copyright years > - renaming a couple of functions Marked as reviewed by adinn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23300#pullrequestreview-2637878768 From adinn at openjdk.org Mon Feb 24 17:16:00 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 24 Feb 2025 17:16:00 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: <1yB95sOajuS5ptFI0GQWLepii5JsZ9DOsje-TEFyFYs=.a325ad18-17ed-4e77-b1e3-0bad2cf55c67@github.com> Message-ID: On Thu, 20 Feb 2025 17:22:25 GMT, Ferenc Rakoczi wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2618: >> >>> 2616: INSN(smaxp, 0, 0b101001, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S >>> 2617: INSN(sminp, 0, 0b101011, false); // accepted arrangements: T8B, T16B, T4H, T8H, T2S, T4S >>> 2618: INSN(sqdmulh,0, 0b101101, false); // accepted arrangements: T4H, T8H, T2S, T4S >> >> Hi, not a comment on the algorithm itself but you might have to add these new instructions in the gtest for aarch64 here - test/hotspot/gtest/aarch64/aarch64-asmtest.py and use this file to generate test/hotspot/gtest/aarch64/asmtest.out.h which would contain these newly added instructions. > > I have tried that, but the python script (actually the as command that it started) threw error messages: > > aarch64ops.s:338:24: error: index must be a multiple of 8 in range [0, 32760]. 
> prfm PLDL1KEEP, [x15, 43] > ^ > aarch64ops.s:357:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] > sub x1, x10, x23, sxth #2 > ^ > aarch64ops.s:359:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] > add x11, x21, x5, uxtb #3 > ^ > aarch64ops.s:360:22: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] > adds x11, x17, x17, uxtw #1 > ^ > aarch64ops.s:361:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] > sub x11, x0, x15, uxtb #1 > ^ > aarch64ops.s:362:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] > subs x7, x1, x0, sxth #2 > ^ > This is without any modifications from what is in the master branch currently. @ferakocz This also really needs addressing before committing the patch. Perhaps @theRealAph can advise on how to circumvent the problems you found when trying to update the python script? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1968076559 From never at openjdk.org Mon Feb 24 17:31:06 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 24 Feb 2025 17:31:06 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v3] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Wed, 19 Feb 2025 00:37:14 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. 
>> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > Stricter assertion on ppc64 src/hotspot/share/runtime/deoptimization.cpp line 650: > 648: // would need to get the size from the resolved method entry. Another exception would > 649: // be an invokedynamic with an adapter that is really a MethodHandle linker. > 650: caller_was_method_handle = true; This flag also controls the code at 711 that controls the computation of caller_adjustment. Is the new answer also correct for that code? This code might be a bit clearer if the computations of caller_was_method_handle, caller_adjustment and the new caller_actual_parameters were all closer together, though that might complicate a backport so maybe it should be deferred to some later cleanup. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1968100587 From aph at openjdk.org Mon Feb 24 17:31:52 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 24 Feb 2025 17:31:52 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 12:09:57 GMT, Bhavana Kilambi wrote: > This patch adds aarch64 backend for scalar FP16 operations namely - add, subtract, multiply, divide, fma, sqrt, min and max. src/hotspot/cpu/aarch64/aarch64.ad line 6978: > 6976: // ldr instruction has 32/64/128 bit variants but not a 16-bit variant. This > 6977: // loads the 16-bit value from constant pool into a 32-bit register but only > 6978: // the bottom half will be populated. 
Surely what actually happens here is that it loads a 32-bit word from the constant pool. The bottom 16 bits of this word contain the half-precision constant, the top 16 bits are zero. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23748#discussion_r1968101418 From bkilambi at openjdk.org Mon Feb 24 17:44:52 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 24 Feb 2025 17:44:52 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 17:28:43 GMT, Andrew Haley wrote: >> This patch adds aarch64 backend for scalar FP16 operations namely - add, subtract, multiply, divide, fma, sqrt, min and max. > > src/hotspot/cpu/aarch64/aarch64.ad line 6978: > >> 6976: // ldr instruction has 32/64/128 bit variants but not a 16-bit variant. This >> 6977: // loads the 16-bit value from constant pool into a 32-bit register but only >> 6978: // the bottom half will be populated. > > Surely what actually happens here is that it loads a 32-bit word from the constant pool. The bottom 16 bits of this word contain the half-precision constant, the top 16 bits are zero. I agree. The wording didn't quite convey that. I will change it in my next PS. Thank you for looking into the patch! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23748#discussion_r1968120239 From gziemski at openjdk.org Mon Feb 24 17:46:37 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 24 Feb 2025 17:46:37 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v53] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. 
>
> Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf`
>
> Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!):
>
>
> malloc summary:
>
> time:8,951,473[ns] [samples:117,717]
> memory requested:28,474,918 bytes, allocated:29,904,416 bytes,
> malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%]
>
> NMT type:               objects:      bytes:      time:  count%:  bytes%:   time%:  overhead:
> -------------------------------------------------------------------------------------------------------------------------
> Java Heap:                     0           0          0     0.0%     0.0%     0.0%       0.0%
> Class:                     8,598     727,856    607,047     7.3%     2.4%     6.8%      18.2%
> Thread:                      196      68,256     64,875     0.2%     0.2%     0.7%       7.0%
> Thread Stack:                  0           0          0     0.0%     0.0%     0.0%       0.0%
> Code:                     10,094   2,036,528    916,348     8.6%     6.8%    10.2%       9.9%
> GC:                        1,813  20,372,160  1,214,642     1.5%    68.1%    13.6%       3.7%
> GCCardSet:                   299      28,736     13,174     0.3%     0.1%     0.1%      11.6%
> Compiler:                     55      13,728    171,364     0.0%     0.0%     1.9%       6.9%
> JVMCI:                         0           0          0     0.0%     0.0%     0.0%       0.0%
> Internal:                  5,066     339,184  1,418,578     4.3%     1.1%    15.8%      18.0%
> Other:                         6     244,736     21,303     0.0%     0.8%     0.2%      37.9%
> Symbol:                    9,844   1,493,280    752,665     8.4%     5.0%     8.4%      14.1%
> Native Memory Tracking:      367      30,736     17,654     0.3%     0.1%     0.2%       7...
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: code active only on macOS and Linux ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/3892ef16..850ec167 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=52 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=51-52 Stats: 49 lines in 2 files changed: 44 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From liach at openjdk.org Mon Feb 24 17:52:00 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 24 Feb 2025 17:52:00 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v7] In-Reply-To: References: Message-ID: On Sat, 22 Feb 2025 14:49:38 GMT, Coleen Phillimore wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Use modifiers field directly in isInterface. The limited changes to the Java codebase looks reasonable. We should probably get a double check from Alan or some other architect. ------------- Marked as reviewed by liach (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23572#pullrequestreview-2637961573 From shade at openjdk.org Mon Feb 24 18:15:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 24 Feb 2025 18:15:55 GMT Subject: RFR: 8328473: StringTable and SymbolTable statistics delay time to safepoint In-Reply-To: References: Message-ID: <20glKNAt7ZaAbKB31m8XqZwLDTzGdjXs4dXpax4D6iw=.3fb33f39-1e44-449a-97d2-d81202981a3c@github.com> On Mon, 24 Feb 2025 14:27:01 GMT, Coleen Phillimore wrote: > This change adds a safepoint poll to gathering statistics for the Symbol and String tables, using the ConcurrentHashTableTasks to chunk up the walk. The stringTable and symbolTable is similar, like the GrowTask and DeleteTask code. Maybe this can be cleaned up but I don't have a good idea about that yet that doesn't involve yet another level of templated functions and code. This is already pretty highly templatized. > Tested with tier1-4 and runThese internal test with JFR and failure injection to verify that we do try to safepoint while gathering statistics. Looks reasonable. Any perf data on TTSP under stress conditions? This would tell us whether the current claiming strategy is effective at mitigating TTSP overheads. I think something simple like "intern $M strings, do `while(true) System.gc()`, and then bash the process with `while true; do jcmd $PID VM.stringtable; done`" would do. (Should probably amend `VM_DumpHashtable::evaluate_at_safepoint() { return false; }` as well...) src/hotspot/share/utilities/concurrentHashTable.hpp line 59: > 57: } > 58: } > 59: // Calculate statistics. Item sizes are calculated with VALUE_SIZE_FUNC, and accumloated in summary and literal_size. "accumulated" ------------- Marked as reviewed by shade (Reviewer).
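The idea behind this change is to walk the table in chunks and give the VM a chance to reach a safepoint between chunks, instead of holding it off for the whole walk. A generic sketch of that idea follows. It is not HotSpot's `ConcurrentHashTable` or its task API: the bucket array, `TableStats`, and the `yield` callback standing in for a safepoint poll are all hypothetical simplifications. The chunk size is the knob that trades time-to-safepoint against per-chunk overhead, which is what the question about the claiming strategy is probing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical summary; the real table statistics carry more fields.
struct TableStats {
  size_t entries = 0;
  size_t max_chain = 0;
  size_t chunks = 0;  // how many times the walk paused
};

// `chain_lengths[b]` is the number of entries in bucket b.
// Buckets are processed `chunk_size` at a time. After each chunk,
// `yield()` is called, standing in for a safepoint poll.
TableStats chunked_stats(const std::vector<size_t>& chain_lengths,
                         size_t chunk_size,
                         const std::function<void()>& yield) {
  assert(chunk_size > 0);
  TableStats stats;
  for (size_t start = 0; start < chain_lengths.size(); start += chunk_size) {
    size_t end = std::min(start + chunk_size, chain_lengths.size());
    for (size_t b = start; b < end; b++) {
      stats.entries += chain_lengths[b];
      stats.max_chain = std::max(stats.max_chain, chain_lengths[b]);
    }
    stats.chunks++;
    yield();  // a real implementation would check for a pending safepoint here
  }
  return stats;
}
```

With five buckets and a chunk size of 2, the walk pauses three times. A larger chunk means fewer pauses but a longer worst-case delay before a safepoint can start.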
PR Review: https://git.openjdk.org/jdk/pull/23750#pullrequestreview-2637946179 PR Review Comment: https://git.openjdk.org/jdk/pull/23750#discussion_r1968120376 From coleenp at openjdk.org Mon Feb 24 18:41:28 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 24 Feb 2025 18:41:28 GMT Subject: RFR: 8328473: StringTable and SymbolTable statistics delay time to safepoint [v2] In-Reply-To: References: Message-ID: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> > This change adds a safepoint poll to gathering statistics for the Symbol and String tables, using the ConcurrentHashTableTasks to chunk up the walk. The stringTable and symbolTable is similar, like the GrowTask and DeleteTask code. Maybe this can be cleaned up but I don't have a good idea about that yet that doesn't involve yet another level of templated functions and code. This is already pretty highly templatized. > Tested with tier1-4 and runThese internal test with JFR and failure injection to verify that we do try to safepoint while gathering statistics. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fxi typo. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23750/files - new: https://git.openjdk.org/jdk/pull/23750/files/c5d089f3..82a9e247 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23750&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23750&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23750/head:pull/23750 PR: https://git.openjdk.org/jdk/pull/23750 From ccheung at openjdk.org Mon Feb 24 18:43:27 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Mon, 24 Feb 2025 18:43:27 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v6] In-Reply-To: References: Message-ID: > This changset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. > > Passed tiers 1 - 5 testing. 
Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: @dholmes-ora comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23476/files - new: https://git.openjdk.org/jdk/pull/23476/files/9e4e33dd..70f3efb2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23476&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23476&range=04-05 Stats: 7 lines in 4 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23476/head:pull/23476 PR: https://git.openjdk.org/jdk/pull/23476 From ccheung at openjdk.org Mon Feb 24 18:43:28 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Mon, 24 Feb 2025 18:43:28 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v5] In-Reply-To: References: Message-ID: <7cqil6gxusrEIf8DOQJDncbJSbYd7uzgd0i92GX-EUY=.b446ab76-3e0a-4aab-8eb5-0e028a940a44@github.com> On Sun, 23 Feb 2025 23:52:59 GMT, David Holmes wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> rename classes and add vm_exit_during_initialization call > > src/hotspot/share/cds/aotClassLocation.cpp line 53: > >> 51: const AOTClassLocationConfig* AOTClassLocationConfig::_runtime_instance = nullptr; >> 52: >> 53: // A ClassLocationStream represents a list of code sources, which can be iterated using > > Suggestion: > > // A ClassLocationStream represents a list of code locations, which can be iterated using Fixed > src/hotspot/share/cds/aotClassLocation.cpp line 133: > >> 131: }; >> 132: >> 133: // AllClassLocationStreams is used to iterate over all the code sources that > > Suggestion: > > // AllClassLocationStreams is used to iterate over all the code locations that Fixed > src/hotspot/share/cds/aotClassLocation.hpp line 122: > >> 120: // AOTClassLocations (subjected to AOTClassLocationConfig::validate()). 
>> 121: // >> 122: // In general, validation is performed on the AOTClassLocations to ensure the code sources used > > Suggestion: > > // In general, validation is performed on the AOTClassLocations to ensure the code locations used Fixed > src/hotspot/share/classfile/classLoaderDataShared.cpp line 157: > >> 155: } >> 156: >> 157: void ClassLoaderDataShared::ensure_module_entry_table_exist(oop class_loader) { > > Suggestion: > > void ClassLoaderDataShared::ensure_module_entry_table_exists(oop class_loader) { > > Tables exist, but a single table exists. Fixed > src/hotspot/share/classfile/classLoaderDataShared.hpp line 37: > >> 35: class ClassLoaderDataShared : AllStatic { >> 36: static bool _full_module_graph_loaded; >> 37: static void ensure_module_entry_table_exist(oop class_loader); > > Suggestion: > > static void ensure_module_entry_table_exists(oop class_loader); Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1968212961 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1968213029 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1968212846 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1968213106 PR Review Comment: https://git.openjdk.org/jdk/pull/23476#discussion_r1968213193 From coleenp at openjdk.org Mon Feb 24 18:59:57 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 24 Feb 2025 18:59:57 GMT Subject: RFR: 8328473: StringTable and SymbolTable statistics delay time to safepoint [v2] In-Reply-To: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> References: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> Message-ID: <7UgnlkUw7VbP8PJ4LsOGHquIFklt1cbC0dY6wrsRZ4Q=.9919c9c4-1675-4fd2-ae73-b66c279711cb@github.com> On Mon, 24 Feb 2025 18:41:28 GMT, Coleen Phillimore wrote: >> This change adds a safepoint poll to gathering statistics for the Symbol and 
String tables, using the ConcurrentHashTableTasks to chunk up the walk. The stringTable and symbolTable is similar, like the GrowTask and DeleteTask code. Maybe this can be cleaned up but I don't have a good idea about that yet that doesn't involve yet another level of templated functions and code. This is already pretty highly templatized. >> Tested with tier1-4 and runThese internal test with JFR and failure injection to verify that we do try to safepoint while gathering statistics. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fxi typo. This patch doesn't change the behavior for the dcmd path, because that's done in a safepoint. I suppose it could be changed to not dump in a safepoint, but are there expectations that running jcmd doesn't block the process? This might be a good RFE though. This patch may improve the granularity of the global counter while dumping. This fix was for JFR sampling. Maybe a good measurement would be creating N interned strings, system.gc() in a counted loop and another thread doing JFR samples? In one of the microbenchmarks. Maybe the before/after times would be meaningful? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23750#issuecomment-2679382084 From rriggs at openjdk.org Mon Feb 24 19:10:02 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 24 Feb 2025 19:10:02 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v7] In-Reply-To: References: Message-ID: On Sat, 22 Feb 2025 14:49:38 GMT, Coleen Phillimore wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. 
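The PR description above says each of the three predicates can be answered from state already held in the mirror instead of calling into native code. A minimal Java sketch of that idea follows. `MiniClass` and its fields are hypothetical stand-ins (in the real change the VM initializes the corresponding `java.lang.Class` fields directly), and `main` only checks that the sketch agrees with the real `Class` answers for `Runnable`, `int` and `int[]`.

```java
import java.lang.reflect.Modifier;

/**
 * Sketch: isInterface/isArray/isPrimitive computed from plain fields.
 * MiniClass is a hypothetical stand-in, not java.lang.Class itself.
 */
public class ClassPredicatesSketch {
    static final class MiniClass {
        final char modifiers;          // access flags, held as a char like the 'mods' parameter
        final MiniClass componentType; // non-null only for array classes
        final boolean primitive;       // fixed when the mirror is created

        MiniClass(int modifiers, MiniClass componentType, boolean primitive) {
            this.modifiers = (char) modifiers;
            this.componentType = componentType;
            this.primitive = primitive;
        }

        boolean isInterface() { return (modifiers & Modifier.INTERFACE) != 0; }
        boolean isArray()     { return componentType != null; }
        boolean isPrimitive() { return primitive; }
    }

    public static void main(String[] args) {
        MiniClass runnable = new MiniClass(Modifier.PUBLIC | Modifier.INTERFACE | Modifier.ABSTRACT, null, false);
        MiniClass intType  = new MiniClass(Modifier.PUBLIC | Modifier.FINAL | Modifier.ABSTRACT, null, true);
        MiniClass intArray = new MiniClass(Modifier.PUBLIC | Modifier.FINAL | Modifier.ABSTRACT, intType, false);
        // Compare against what the real java.lang.Class reports for the same types.
        System.out.println(runnable.isInterface() == Runnable.class.isInterface()); // true
        System.out.println(intType.isPrimitive() == int.class.isPrimitive());       // true
        System.out.println(intArray.isArray() == int[].class.isArray());            // true
    }
}
```

The design point is that all three answers become simple field reads, which the JIT can inline, rather than native calls.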
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Use modifiers field directly in isInterface. A nice simplification. src/java.base/share/classes/java/lang/Class.java line 241: > 239: private Class(ClassLoader loader, Class arrayComponentType, char mods, ProtectionDomain pd, boolean isPrim) { > 240: // Initialize final field for classLoader. The initialization value of non-null > 241: // prevents future JIT optimizations from assuming this final field is null. To add a bit more depth to this comment, I'd add. "The following assignments are done directly by the VM without calling this constructor." Or something to that effect. ------------- Marked as reviewed by rriggs (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23572#pullrequestreview-2638174546 PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1968254793 From shade at openjdk.org Mon Feb 24 19:10:57 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 24 Feb 2025 19:10:57 GMT Subject: RFR: 8328473: StringTable and SymbolTable statistics delay time to safepoint [v2] In-Reply-To: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> References: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> Message-ID: On Mon, 24 Feb 2025 18:41:28 GMT, Coleen Phillimore wrote: >> This change adds a safepoint poll to gathering statistics for the Symbol and String tables, using the ConcurrentHashTableTasks to chunk up the walk. The stringTable and symbolTable is similar, like the GrowTask and DeleteTask code. Maybe this can be cleaned up but I don't have a good idea about that yet that doesn't involve yet another level of templated functions and code. This is already pretty highly templatized. >> Tested with tier1-4 and runThese internal test with JFR and failure injection to verify that we do try to safepoint while gathering statistics. 
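The chunked walk described above can be illustrated with a small Java sketch, assuming a table modeled as a list of buckets. `shouldYield` and `yieldNow` are hypothetical hooks standing in for a safepoint poll; this shows the general pattern, not HotSpot's `ConcurrentHashTable` task API.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

/**
 * Sketch: gather bucket statistics in fixed-size chunks and offer to yield
 * between chunks. shouldYield/yieldNow are hypothetical stand-ins for a
 * safepoint poll, not the HotSpot ConcurrentHashTable task API.
 */
public class ChunkedTableStats {
    record Stats(long entries, int maxBucketLength, int chunks, int yields) {}

    static Stats gather(List<List<String>> buckets, int chunkSize,
                        BooleanSupplier shouldYield, Runnable yieldNow) {
        long entries = 0;
        int max = 0, chunks = 0, yields = 0;
        for (int from = 0; from < buckets.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, buckets.size());
            for (int i = from; i < to; i++) {
                int len = buckets.get(i).size();
                entries += len;
                max = Math.max(max, len);
            }
            chunks++;
            // Pause only if more work remains. The table may change while paused,
            // so the totals are a best-effort snapshot rather than an exact count.
            if (to < buckets.size() && shouldYield.getAsBoolean()) {
                yieldNow.run();
                yields++;
            }
        }
        return new Stats(entries, max, chunks, yields);
    }

    public static void main(String[] args) {
        List<List<String>> buckets = List.of(
            List.of("a"), List.of(), List.of("b", "c", "d"), List.of("e", "f"), List.of());
        Stats s = gather(buckets, 2, () -> true, () -> { /* block for the safepoint here */ });
        System.out.println(s); // Stats[entries=6, maxBucketLength=3, chunks=3, yields=2]
    }
}
```

Polling between chunks bounds the work done before a pending safepoint is noticed to a single chunk; the cost is that the statistics are no longer an atomic snapshot of the table.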
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fxi typo. Sure, if you can trigger JFR events often programmatically, that would also work. Otherwise, the experimental change in `VM_DumpHashtable` is okay to prove out the statistics dumping code. We can do it _properly_ in a separate PR, but a quick hack is okay for performance tests. I am really interested in TTSP figures from `-Xlog:safepoint` before/after this patch under heavy give-me-statistics requests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23750#issuecomment-2679404944 From coleenp at openjdk.org Mon Feb 24 19:22:57 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 24 Feb 2025 19:22:57 GMT Subject: RFR: 8328473: StringTable and SymbolTable statistics delay time to safepoint [v2] In-Reply-To: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> References: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> Message-ID: On Mon, 24 Feb 2025 18:41:28 GMT, Coleen Phillimore wrote: >> This change adds a safepoint poll to gathering statistics for the Symbol and String tables, using the ConcurrentHashTableTasks to chunk up the walk. The stringTable and symbolTable is similar, like the GrowTask and DeleteTask code. Maybe this can be cleaned up but I don't have a good idea about that yet that doesn't involve yet another level of templated functions and code. This is already pretty highly templatized. >> Tested with tier1-4 and runThese internal test with JFR and failure injection to verify that we do try to safepoint while gathering statistics. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fxi typo. Ok, I'll try your experiment then and get the -Xlog:safepoint results. Actually we have a runThese30M test that calls JFR a lot that I used to test this.
Let me see if I can get meaningful before/after -Xlog:safepoint results. Then I'll write your experiment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23750#issuecomment-2679432457 From coleenp at openjdk.org Mon Feb 24 19:30:41 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 24 Feb 2025 19:30:41 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v7] In-Reply-To: References: Message-ID: <5i_vwoj0oivW08tMAX5Bp2m7yK_pgQOy0b7_MizQ-uM=.0f54046e-8972-4d05-89d6-aee42b079b48@github.com> On Sat, 22 Feb 2025 14:49:38 GMT, Coleen Phillimore wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Use modifiers field directly in isInterface. Thanks for reviewing Roger. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23572#issuecomment-2679447427 From coleenp at openjdk.org Mon Feb 24 19:30:41 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 24 Feb 2025 19:30:41 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v8] In-Reply-To: References: Message-ID: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add a comment about Class constructor. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23572/files - new: https://git.openjdk.org/jdk/pull/23572/files/db7c9782..591abdda Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23572&range=06-07 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23572.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23572/head:pull/23572 PR: https://git.openjdk.org/jdk/pull/23572 From coleenp at openjdk.org Mon Feb 24 19:30:41 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 24 Feb 2025 19:30:41 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v7] In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 19:06:30 GMT, Roger Riggs wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Use modifiers field directly in isInterface. > > src/java.base/share/classes/java/lang/Class.java line 241: > >> 239: private Class(ClassLoader loader, Class arrayComponentType, char mods, ProtectionDomain pd, boolean isPrim) { >> 240: // Initialize final field for classLoader. The initialization value of non-null >> 241: // prevents future JIT optimizations from assuming this final field is null. > > To add a bit more depth to this comment, I'd add. > > "The following assignments are done directly by the VM without calling this constructor." > Or something to that effect. Okay, that's a good comment. I'll add it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23572#discussion_r1968297499 From gziemski at openjdk.org Mon Feb 24 19:49:17 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 24 Feb 2025 19:49:17 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v54] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7...
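The two summary percentages can be re-derived from the byte counts. Below is a short Java check, which assumes (as the printed numbers suggest) that both overheads are expressed relative to the requested bytes.

```java
import java.util.Locale;

/**
 * Re-derives the two overhead percentages in the sample summary.
 * Assumption: both percentages are relative to the requested bytes,
 * which is what the printed numbers are consistent with.
 */
public class NmtOverheadCheck {
    static double percentOf(long part, long whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        long requested  = 28_474_918L;
        long allocated  = 29_904_416L;
        long nmtHeaders = 2_118_906L;
        // Difference between what malloc actually handed out and what was asked for.
        long mallocOverhead = allocated - requested;

        System.out.println(String.format(Locale.ROOT, "malloc overhead=%,d bytes [%.2f%%]",
                mallocOverhead, percentOf(mallocOverhead, requested)));
        System.out.println(String.format(Locale.ROOT, "NMT headers overhead=%,d bytes [%.2f%%]",
                nmtHeaders, percentOf(nmtHeaders, requested)));
    }
}
```

This prints `malloc overhead=1,429,498 bytes [5.02%]` and `NMT headers overhead=2,118,906 bytes [7.44%]`, matching the sample. Measured against the allocated bytes instead, the malloc overhead would come out at about 4.78%, so requested bytes is the denominator that reproduces the output.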
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: code active only on macOS and Linux ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/850ec167..94335a85 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=53 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=52-53 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From iklam at openjdk.org Mon Feb 24 19:52:03 2025 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 24 Feb 2025 19:52:03 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v6] In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 18:43:27 GMT, Calvin Cheung wrote: >> This changeset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. >> >> Passed tiers 1 - 5 testing.
------------- PR Review: https://git.openjdk.org/jdk/pull/23476#pullrequestreview-2638290020 From ccheung at openjdk.org Mon Feb 24 19:58:13 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Mon, 24 Feb 2025 19:58:13 GMT Subject: RFR: 8280682: Refactor AOT code source validation checks [v5] In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 00:04:21 GMT, David Holmes wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> rename classes and add vm_exit_during_initialization call > > A couple of minor suggestions, but otherwise nothing further from me. > > Thanks @dholmes-ora, @ashu-mehra, @vnkozlov, @iklam Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23476#issuecomment-2679500022 From ccheung at openjdk.org Mon Feb 24 19:58:14 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Mon, 24 Feb 2025 19:58:14 GMT Subject: Integrated: 8280682: Refactor AOT code source validation checks In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 22:32:58 GMT, Calvin Cheung wrote: > This changeset refactors CDS class paths and module paths validation code into a new class `AOTCodeSource` and related class `AOTCodeSourceConfig`. Code has been moved from filemap.[c|h]pp, classLoader.[c|h]pp, and classLoaderExt.[c|h]pp to aotCodeSource.[c|h]pp. CDS dependencies have been removed from `classLoader.cpp`. More refactoring could be done, such as removing `classLoaderExt.cpp`, in a future RFE. > > Passed tiers 1 - 5 testing. This pull request has now been integrated.
Changeset: ddb25691 Author: Calvin Cheung URL: https://git.openjdk.org/jdk/commit/ddb256911032cd7e6fae17c342261276066d8d25 Stats: 3153 lines in 40 files changed: 1359 ins; 1615 del; 179 mod 8280682: Refactor AOT code source validation checks Co-authored-by: Ioi Lam Reviewed-by: iklam, asmehra, dholmes, kvn ------------- PR: https://git.openjdk.org/jdk/pull/23476 From dlong at openjdk.org Mon Feb 24 21:09:57 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 24 Feb 2025 21:09:57 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v8] In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 19:30:41 GMT, Coleen Phillimore wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add a comment about Class constructor. Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23572#pullrequestreview-2638441924 From gziemski at openjdk.org Mon Feb 24 21:20:12 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 24 Feb 2025 21:20:12 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v30] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: <2fTUz_oDUK6cNc-DY5AATuXDaqjQZ6XUa-t1VF_UzaI=.6fb21f2f-eccc-4950-9a31-98f467be48e6@github.com> On Mon, 24 Feb 2025 12:45:51 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTrackerWithTree`. >> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. 
>> - Both old and new versions exist in the code and can be selected via `MemTracker::set_version()` >> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. >> - All tier1 tests pass except one ~that expects a 50% increase in committed memory but it does not happen~ https://bugs.openjdk.org/browse/JDK-8335167. >> - Adding a runtime flag for selecting the old or new version can be added later. >> - Some performance tests are added for new version, VMATree and Treap, to show the idea and should be improved later. Based on the results of comparing speed of VMATree and VMT, VMATree shows ~40x faster response time. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > test file got back, fixed coding style My feedback so far, more tomorrow. Is the description out of date? It says > Both old and new versions exist in the code and can be selected via MemTracker::set_version() but that's not true right? src/hotspot/share/nmt/virtualMemoryTracker.hpp line 381: > 379: bool remove_uncommitted_region (address base_addr, size_t size); > 380: bool remove_released_region (address base_addr, size_t size); > 381: bool remove_released_region (ReservedMemoryRegion* rgn); Why are they returning `bool` ? I don't see the return value being used anywhere? 
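To make the `visit_committed_regions(position start, size size)` question concrete, here is a simplified Java sketch of a boundary-keyed region map in the spirit of VMATree. Each key is an address where the state changes, and a visitor takes a plain (start, size) range instead of a region object. All names (`RegionTreeSketch`, `setState`, `visitCommittedRegions`) are hypothetical. This is not the HotSpot implementation and it omits NMT memory tags and call stacks.

```java
import java.util.TreeMap;
import java.util.function.BiConsumer;

public class RegionTreeSketch {
    enum State { RELEASED, RESERVED, COMMITTED }

    // Each key is an address where the state changes; the value applies
    // from that key up to (but not including) the next key.
    private final TreeMap<Long, State> boundaries = new TreeMap<>();

    public RegionTreeSketch() {
        boundaries.put(0L, State.RELEASED); // the whole address space starts released
    }

    private State stateAt(long addr) {
        return boundaries.floorEntry(addr).getValue();
    }

    /** Sets [start, start + size) to the given state. */
    public void setState(long start, long size, State s) {
        long end = start + size;
        State after = stateAt(end);                         // state that resumes at 'end'
        boundaries.subMap(start, true, end, false).clear(); // drop boundaries inside the range
        boundaries.put(start, s);
        boundaries.put(end, after);
    }

    /** Calls visitor(base, size) once per maximal committed run inside [start, start + size). */
    public void visitCommittedRegions(long start, long size, BiConsumer<Long, Long> visitor) {
        long end = start + size;
        long runStart = -1;  // start of the committed run being built, or -1 if none
        long cursor = start; // end of the last segment examined
        Long key = boundaries.floorKey(start);
        while (key != null && key < end) {
            Long next = boundaries.higherKey(key);
            long segStart = Math.max(key, start);
            long segEnd = (next == null) ? end : Math.min(next, end);
            if (boundaries.get(key) == State.COMMITTED) {
                if (runStart < 0) runStart = segStart;
            } else if (runStart >= 0) {
                visitor.accept(runStart, segStart - runStart);
                runStart = -1;
            }
            cursor = segEnd;
            key = next;
        }
        if (runStart >= 0) visitor.accept(runStart, cursor - runStart);
    }

    public static void main(String[] args) {
        RegionTreeSketch tree = new RegionTreeSketch();
        tree.setState(0x1000, 0x4000, State.RESERVED);
        tree.setState(0x2000, 0x1000, State.COMMITTED);
        tree.setState(0x3000, 0x1000, State.COMMITTED); // adjacent commit
        tree.visitCommittedRegions(0x1000, 0x4000, (base, len) ->
            System.out.printf("committed [0x%x, +0x%x)%n", base, len));
        // prints: committed [0x2000, +0x2000)
    }
}
```

Because the visitor only needs a start and a size, callers can query any window of the address space, not just one that exactly matches a reserved region. Adjacent committed runs are coalesced during the walk.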
------------- PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2637959728 PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2679666750 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1968340892 From gziemski at openjdk.org Mon Feb 24 21:20:13 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 24 Feb 2025 21:20:13 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v30] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Mon, 14 Oct 2024 08:52:50 GMT, Afshin Zafari wrote: >> src/hotspot/share/nmt/memReporter.cpp line 442: >> >>> 440: if (all_committed) { >>> 441: bool reserved_and_committed = false; >>> 442: VirtualMemoryTracker::Instance::tree()->visit_committed_regions(*reserved_rgn, >> >> Change the signature of `visit_committed_regions` to taking `(position start, size size)` instead of the `ReservedMemoryRegion`. > > Done. 2 questions: 1st, I must be misunderstanding something here. Johan asked to change the API from: `visit_committed_regions(ReservedMemoryRegion& committed_rgn)` to `visit_committed_regions(position start, size size)` but I still see the old way. 2nd, why are we asking for this change? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1968128713 From gziemski at openjdk.org Mon Feb 24 21:27:58 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 24 Feb 2025 21:27:58 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v55] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. 
> > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7...
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix build on arm linux ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/94335a85..4a7edefb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=54 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=53-54 Stats: 10 lines in 2 files changed: 2 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From dlong at openjdk.org Mon Feb 24 22:36:56 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 24 Feb 2025 22:36:56 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v3] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: <6I2PyXMG5jSH3dmfnmUvUOrtZ9ntwjkZEw2GqFFCsNg=.de6024a5-5c60-4842-afe5-3d878b65bb6c@github.com> On Mon, 24 Feb 2025 17:28:03 GMT, Tom Rodriguez wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> Stricter assertion on ppc64 > > src/hotspot/share/runtime/deoptimization.cpp line 650: > >> 648: // would need to get the size from the resolved method entry. Another exception would >> 649: // be an invokedynamic with an adapter that is really a MethodHandle linker. >> 650: caller_was_method_handle = true; > > This flag also controls the code at 711 that controls the computation of caller_adjustment. Is the new answer also correct for that code? > > This code might be a bit clearer if the computations of caller_was_method_handle, caller_adjustment and the new caller_actual_parameters were all closer together, though that might complicate a backport so maybe it should be deferred to some later cleanup. 
Yes, I have further cleanup that I want to do later, but I want to minimize changes in this one to simplify backports. Good catch about line 711. I left it in on purpose, again to simplify backports, but it could be safely removed. All it does here is over-estimate the adjustment, which is harmless. In future cleanups, I hope to make the adjustment exact rather than a possibly over-estimated increment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1968511371 From iklam at openjdk.org Tue Feb 25 00:23:55 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 25 Feb 2025 00:23:55 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v8] In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 07:08:35 GMT, Thomas Stuefe wrote: >> If we wrongly decode an nKlass of `0`, and the nKlass encoding base is not NULL (typical for most cases that run with CDS enabled), the resulting pointer points to the start of the Klass encoding range. That area is readable. If CDS is enabled, it will be at the start of the CDS metadata archive. If CDS is off, it is at the start of the class space. >> >> Now, both CDS and class space allocate a safety buffer at the start to prevent Klass structures from being located there. However, that memory is still readable, so we can read garbage data from that area. In the case of CDS, that area is just 16 bytes, after that come real data. Since Klass is large, most accesses will read beyond the 16-byte zone. >> >> We should protect the first page in the narrow Klass encoding range to make analysis of errors like this easier. Especially in release builds where decode_not_null does not assert. We already use a similar technique in the heap, and most OSes protect the zero page for the same reason. >> >> This patch does that. Now, decoding an `0` nKlass and then using the result `Klass` - calling virtual functions or accessing members - crashes right away. 
>> >> Additionally, the patch provides a helpful output in the register/stack section, e.g: >> >> >> RDI=0x0000000800000000 points into nKlass protection zone >> >> >> >> Testing: >> - GHAs. >> - I tested the patch manually on x64 Linux for both CDS on, CDS off and zero-based encoding, CDS off and non-zero-based encoding. >> - I tested manually on Windows x64 >> - I also prepared an automatic gtest, but that needs some preparatory work on the gtest suite first to work (see https://bugs.openjdk.org/browse/JDK-8348029) >> >> -- Update 2024-01-22 -- >> I added a jtreg test that is more thorough than a gtest (also scans the produced hs-err file) > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > remove test coding Marked as reviewed by iklam (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23190#pullrequestreview-2638783342 From kvn at openjdk.org Tue Feb 25 00:37:00 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Feb 2025 00:37:00 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID: <9mXRl7rScxJwxNNlV_H1gxndtzZ6g-gE8cMsc6VsTJQ=.b5a77c13-6e7e-4203-898a-3318e298d30f@github.com> On Mon, 24 Feb 2025 08:00:24 GMT, Emanuel Peter wrote: > But if we do not optimize the slow path loop, then we would get performance regressions in aliasing cases because we have no unrolling for them any more. Okay, we are back to our previous conversation - we will wait for your aliasing-analysis runtime-checks implementation and do performance runs to see if the "slow" path affects performance. Okay. PS: a "slow" path implies that it is not taken frequently, so it should not affect the general performance of the application.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2680031423 From vaivanov at openjdk.org Tue Feb 25 01:08:53 2025 From: vaivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 25 Feb 2025 01:08:53 GMT Subject: RFR: 8350516: Update model numbers for ECore-based cpus In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 21:47:45 GMT, Volodymyr Paprotski wrote: > Add two more models to the list The PTL and CWF codes looks OK. ------------- Marked as reviewed by vaivanov (Author). PR Review: https://git.openjdk.org/jdk/pull/23731#pullrequestreview-2638845956 From iklam at openjdk.org Tue Feb 25 01:11:25 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 25 Feb 2025 01:11:25 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v8] In-Reply-To: References: Message-ID: > Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. > > With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. > > To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). 
This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): > >> the format of the configuration and cache files is not specified and is subject to change without notice. > > **Notes for reviewers:** > > - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. > - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. > - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. > - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. > - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. > > **Misc Note** > - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccba6dcce4a3) will be integrated separ... Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
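The "different magic number" safeguard in the reviewer notes comes down to a header check when the file is read. Below is a minimal Java sketch. It assumes a hypothetical 8-byte header (magic, then version) and made-up magic values, since the real CDS/AOT header layout is not shown in this thread.

```java
import java.nio.ByteBuffer;

/**
 * Sketch: two file kinds that share a layout but not a magic number, so a
 * reader expecting one kind rejects the other. The magic values and the
 * 8-byte header are hypothetical, not the real CDS/AOT header.
 */
public class MagicCheckSketch {
    static final int STATIC_ARCHIVE_MAGIC = 0xC0DE0001; // hypothetical value
    static final int AOT_CONFIG_MAGIC     = 0xC0DE0002; // hypothetical value

    static byte[] writeHeader(int magic, int version) {
        return ByteBuffer.allocate(8).putInt(magic).putInt(version).array();
    }

    /** True only if the file is long enough and starts with the expected magic. */
    static boolean hasMagic(byte[] file, int expectedMagic) {
        return file.length >= 8 && ByteBuffer.wrap(file).getInt() == expectedMagic;
    }

    public static void main(String[] args) {
        byte[] config = writeHeader(AOT_CONFIG_MAGIC, 1);
        System.out.println(hasMagic(config, AOT_CONFIG_MAGIC));     // true
        System.out.println(hasMagic(config, STATIC_ARCHIVE_MAGIC)); // false: not usable as an archive
    }
}
```

Keeping the rest of the layout identical lets both kinds share the same writer and reader code. The magic number alone decides which kind a file is accepted as.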
The pull request now contains 15 commits: - all tests in runtime/cds/appcds/aotClassLinking should be excluded for hotspot_appcds_dynamic testing - @ashu-mehra comment - simplified _archived_cpp_vtptrs; also fixed old comments near by - Merge branch 'master' into 8348426-binary-aot-config-file - Merge branch 'master' into 8348426-binary-aot-config-file - @ashu-mehra comments - @calvinccheung comments - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself - Improve error message when AOTMode=create has an incompatible classpath - Fixed test cases @vnkozlov - Update "make test JTREG_AOT_JDK=true ..." to work with binary AOT configuration - ... and 5 more: https://git.openjdk.org/jdk/compare/990d40e9...1ec67c11 ------------- Changes: https://git.openjdk.org/jdk/pull/23484/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23484&range=07 Stats: 1231 lines in 42 files changed: 1014 ins; 47 del; 170 mod Patch: https://git.openjdk.org/jdk/pull/23484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23484/head:pull/23484 PR: https://git.openjdk.org/jdk/pull/23484 From iklam at openjdk.org Tue Feb 25 01:17:04 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 25 Feb 2025 01:17:04 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v7] In-Reply-To: <2VgYzYhshPAAdh1bdBsJLvcN0kQ3X3NNeizoahDzsR0=.f778568d-bbfd-47ec-9d48-3969cb861cb5@github.com> References: <2VgYzYhshPAAdh1bdBsJLvcN0kQ3X3NNeizoahDzsR0=.f778568d-bbfd-47ec-9d48-3969cb861cb5@github.com> Message-ID: On Sun, 23 Feb 2025 17:24:21 GMT, Ashutosh Mehra wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 12 commits: >> >> - Merge branch 'master' into 8348426-binary-aot-config-file >> - @ashu-mehra comments >> - @calvinccheung comments >> - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself >> - Improve error message when AOTMode=create has an incompatible classpath >> - Fixed test cases @vnkozlov >> - Update "make test JTREG_AOT_JDK=true ..." to work with binary AOT configuration >> - Fixed test failures >> - Added comments; fixed FIXMEs >> - Added more test cases >> - ... and 2 more: https://git.openjdk.org/jdk/compare/00d4e4a9...21f140e7 > > src/hotspot/share/cds/cppVtables.cpp line 200: > >> 198: // _orig_cpp_vtptrs[ConstantPool_Kind] == ((intptr_t**)cp)[0] >> 199: // >> 200: // _archived_cpp_vtptrs is a map of all the vptprs used by classes in a preimage. E.g., for > > Thanks for adding these comments. I think I now understand why we need `_archived_cpp_vtptrs`. > I am wondering if we really need to store this table in the preimage. > When the control enters `CppVtables::dumptime_init`, if we are dumping the final archive, then the `_index[kind].cloned_vtable()` would be pointing to the vtable in the preimage. So we can initialize the `_archived_cpp_vtptrs` at that time before `_index[kind]` is overwritten by the runtime vtable data. > Wouldn't that work? > > Something like this: > > ```@@ -231,13 +231,15 @@ char* CppVtables::_vtables_serialized_base = nullptr; > void CppVtables::dumptime_init(ArchiveBuilder* builder) { > assert(CDSConfig::is_dumping_static_archive(), "cpp tables are only dumped into static archive"); > > - CPP_VTABLE_TYPES_DO(ALLOCATE_AND_INITIALIZE_VTABLE); > - > - if (!CDSConfig::is_dumping_final_static_archive()) { > + // When dumping final archive, _index[kind] at this point is in the preimage. > + // Store the vtable pointers present in the preimage as _index[kind] will now be rewritten > + // to point to the runtime vtable data. 
> + if (CDSConfig::is_dumping_final_static_archive()) { > for (int kind = 0; kind < _num_cloned_vtable_kinds; kind++) { > _archived_cpp_vtptrs[kind] = _index[kind]->cloned_vtable(); > } > } > + CPP_VTABLE_TYPES_DO(ALLOCATE_AND_INITIALIZE_VTABLE); > > size_t cpp_tables_size = builder->rw_region()->top() - builder->rw_region()->base(); > builder->alloc_stats()->record_cpp_vtables((int)cpp_tables_size); > @@ -253,16 +255,6 @@ void CppVtables::serialize(SerializeClosure* soc) { > if (soc->reading()) { > CPP_VTABLE_TYPES_DO(INITIALIZE_VTABLE); > } > - > - if (soc->writing() && !CDSConfig::is_dumping_preimage_static_archive()) { > - // This table is written only when creating the preimage. It will be used > - // only when writing the final static archive. > - memset(_archived_cpp_vtptrs, 0, sizeof(_archived_cpp_vtptrs)); > - } > - > - for (int kind = 0; kind < _num_cloned_vtable_kinds; kind++) { > - soc->do_ptr(&_archived_cpp_vtptrs[kind]); > - } > } Thanks for the suggestion. I've incorporated your patch and added an assert. 
I also fixed `CppVtables::is_valid_shared_method()` to use `_archived_cpp_vtptrs`. While fixing the code, I found some typos in nearby comments that are also unclear, so I rewrote the comment about `_vtables_serialized_base`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1968658522 From never at openjdk.org Tue Feb 25 02:03:57 2025 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 25 Feb 2025 02:03:57 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v3] In-Reply-To: <6I2PyXMG5jSH3dmfnmUvUOrtZ9ntwjkZEw2GqFFCsNg=.de6024a5-5c60-4842-afe5-3d878b65bb6c@github.com> References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> <6I2PyXMG5jSH3dmfnmUvUOrtZ9ntwjkZEw2GqFFCsNg=.de6024a5-5c60-4842-afe5-3d878b65bb6c@github.com> Message-ID: On Mon, 24 Feb 2025 22:34:01 GMT, Dean Long wrote: >> src/hotspot/share/runtime/deoptimization.cpp line 650: >> >>> 648: // would need to get the size from the resolved method entry. Another exception would >>> 649: // be an invokedynamic with an adapter that is really a MethodHandle linker. >>> 650: caller_was_method_handle = true; >> >> This flag also controls the code at 711 that controls the computation of caller_adjustment. Is the new answer also correct for that code? >> >> This code might be a bit clearer if the computations of caller_was_method_handle, caller_adjustment and the new caller_actual_parameters were all closer together, though that might complicate a backport so maybe it should be deferred to some later cleanup. > > Yes, I have further cleanup that I want to do later, but I want to minimize changes in this one to simplify backports. > Good catch about line 711. I left it in on purpose, again to simplify backports, but it could be safely removed. All it does here is over-estimate the adjustment, which is harmless.
In future cleanups, I hope to make the adjustment exact rather than a possibly over-estimated increment. Sounds good. I kind of assumed it was a benign oversizing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1968699203 From never at openjdk.org Tue Feb 25 02:11:54 2025 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 25 Feb 2025 02:11:54 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v3] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Wed, 19 Feb 2025 00:37:14 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > Stricter assertion on ppc64 The new asserts look good and the logic seems right. ------------- Marked as reviewed by never (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23557#pullrequestreview-2638933792 From dlong at openjdk.org Tue Feb 25 04:05:05 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Feb 2025 04:05:05 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v3] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Wed, 19 Feb 2025 00:37:14 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > Stricter assertion on ppc64 Thanks, Tom, for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23557#issuecomment-2680380837 From qpzhang at openjdk.org Tue Feb 25 05:00:50 2025 From: qpzhang at openjdk.org (Patrick Zhang) Date: Tue, 25 Feb 2025 05:00:50 GMT Subject: RFR: 8350483: AArch64: turn on signum intrinsics by default on Ampere CPUs In-Reply-To: <7XQsAZxrIwrsL3gPazBVzWnfQmfH3R6Xwnadg-9Jd34=.34b8e435-1d9f-4486-948e-70079238e3fd@github.com> References: <7XQsAZxrIwrsL3gPazBVzWnfQmfH3R6Xwnadg-9Jd34=.34b8e435-1d9f-4486-948e-70079238e3fd@github.com> Message-ID: On Mon, 24 Feb 2025 12:52:14 GMT, Andrew Haley wrote: > > Maybe we should think about removing the `UseSignumIntrinsic` flag altogether. > > Ah, the flag is also used by other ports. But it doesn't make much sense for us not to use the intrinsic. Thanks for your review @theRealAph. Yes, I agree that we should turn on the signum intrinsics (`-XX:+UseSignumIntrinsic`), and probably `-XX:+UseCopySignIntrinsic` too. I ran a larger test set on Neoverse-N1 and Ampere CPUs, comparing runs with and without each of these two flags, and have seen no obvious performance regression so far. `-XX:+UseSignumIntrinsic` produces clear benefits in the `fmov` cases, while `-XX:+UseCopySignIntrinsic` brings a further +15% improvement to the `*MathBench.signum*` tests when the signum intrinsics are disabled. In Math.java, signum calls copySign, so the benefit of the copySign intrinsic is somewhat hidden behind signum. Therefore, I think the two could be turned on together. A JBS ticket is required to track this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23735#issuecomment-2680514896 From duke at openjdk.org Tue Feb 25 05:00:51 2025 From: duke at openjdk.org (duke) Date: Tue, 25 Feb 2025 05:00:51 GMT Subject: RFR: 8350483: AArch64: turn on signum intrinsics by default on Ampere CPUs In-Reply-To: References: Message-ID: On Sat, 22 Feb 2025 15:27:41 GMT, Patrick Zhang wrote: > Set -XX:+UseSignumIntrinsic by default for Ampere CPUs.
It fixes a performance problem found in the JMH cases `vm.compiler.Signum|java.lang.*MathBench.sig[nN]um*`, where fmov is used to move data between GPRs and FPRs at a significant time cost. > > Verified on Ampere-1A and found the scores (thrpt, ops/s) of `java.lang.*MathBench.sig[nN]um*` improved 40~50%, while `vm.compiler.Signum._1_signumFloatTest` and `vm.compiler.Signum._3_signumDoubleTest` results gained exponential increases. Also passed GHA sanity checks, and Jtreg tier1 on Ampere-1A as functional smoke tests. @cnqpzhang Your change (at version 41a8917f38d1236cebfcbe9896cf1627cf29d29a) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23735#issuecomment-2680515990 From asmehra at openjdk.org Tue Feb 25 05:48:56 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 25 Feb 2025 05:48:56 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v8] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 01:11:25 GMT, Ioi Lam wrote: >> Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pool entries, etc, that are easily represented in text. >> >> With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge.
>> >> To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): >> >>> the format of the configuration and cache files is not specified and is subject to change without notice. >> >> **Notes for reviewers:** >> >> - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. >> - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. >> - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. >> - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. >> - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. >> >> **Misc Note** >> - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccb... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 15 commits: > > - all tests in runtime/cds/appcds/aotClassLinking should be excluded for hotspot_appcds_dynamic testing > - @ashu-mehra comment - simplified _archived_cpp_vtptrs; also fixed old comments near by > - Merge branch 'master' into 8348426-binary-aot-config-file > - Merge branch 'master' into 8348426-binary-aot-config-file > - @ashu-mehra comments > - @calvinccheung comments > - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself > - Improve error message when AOTMode=create has an incompatible classpath > - Fixed test cases @vnkozlov > - Update "make test JTREG_AOT_JDK=true ..." to work with binary AOT configuration > - ... and 5 more: https://git.openjdk.org/jdk/compare/990d40e9...1ec67c11 src/hotspot/share/cds/cppVtables.cpp line 322: > 320: assert(MetaspaceShared::is_in_shared_metaspace(m), "must be"); > 321: return vtable_of(m) == _index[Method_Kind]->cloned_vtable() || > 322: vtable_of(m) == _archived_cpp_vtptrs[Method_Kind]; I am not sure if this needs any fixing. If `m` is in the archive (as the above assert says), then its vtable should always be `_index[Method_Kind]->cloned_vtable()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1968979794 From asmehra at openjdk.org Tue Feb 25 05:59:56 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 25 Feb 2025 05:59:56 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v8] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 01:11:25 GMT, Ioi Lam wrote: >> Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pool entries, etc, that are easily represented in text.
>> >> With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. >> >> To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): >> >>> the format of the configuration and cache files is not specified and is subject to change without notice. >> >> **Notes for reviewers:** >> >> - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. >> - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. >> - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. >> - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes.
>> - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. >> >> **Misc Note** >> - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccb... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - all tests in runtime/cds/appcds/aotClassLinking should be excluded for hotspot_appcds_dynamic testing > - @ashu-mehra comment - simplified _archived_cpp_vtptrs; also fixed old comments near by > - Merge branch 'master' into 8348426-binary-aot-config-file > - Merge branch 'master' into 8348426-binary-aot-config-file > - @ashu-mehra comments > - @calvinccheung comments > - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself > - Improve error message when AOTMode=create has an incompatible classpath > - Fixed test cases @vnkozlov > - Update "make test JTREG_AOT_JDK=true ..." to work with binary AOT configuration > - ... and 5 more: https://git.openjdk.org/jdk/compare/990d40e9...1ec67c11 src/hotspot/share/cds/cppVtables.cpp line 293: > 291: for (kind = 0; kind < _num_cloned_vtable_kinds; kind ++) { > 292: if (vtable_of((Metadata*)obj) == _orig_cpp_vtptrs[kind] || > 293: vtable_of((Metadata*)obj) == _archived_cpp_vtptrs[kind]) { I think we should check these conditions only in the mode where applicable. That would make it easier to understand the code in future. 
So my suggestion is to update it as: for (kind = 0; kind < _num_cloned_vtable_kinds; kind ++) { intptr_t* vtable_ptr; if (CDSConfig::is_dumping_final_static_archive()) { vtable_ptr = _archived_cpp_vtptrs[kind]; } else { vtable_ptr = _orig_cpp_vtptrs[kind]; } if (vtable_of((Metadata*)obj) == vtable_ptr) { break; } } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1968992973 From iklam at openjdk.org Tue Feb 25 06:06:57 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 25 Feb 2025 06:06:57 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v8] In-Reply-To: References: Message-ID: <16Tcr_cKWF0RycjWIFZFvyf5jMqLksor4YaWEKHWc7c=.7d3d00f7-84f1-4464-8555-dcbaecaedc26@github.com> On Tue, 25 Feb 2025 05:56:59 GMT, Ashutosh Mehra wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - all tests in runtime/cds/appcds/aotClassLinking should be excluded for hotspot_appcds_dynamic testing >> - @ashu-mehra comment - simplified _archived_cpp_vtptrs; also fixed old comments near by >> - Merge branch 'master' into 8348426-binary-aot-config-file >> - Merge branch 'master' into 8348426-binary-aot-config-file >> - @ashu-mehra comments >> - @calvinccheung comments >> - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself >> - Improve error message when AOTMode=create has an incompatible classpath >> - Fixed test cases @vnkozlov >> - Update "make test JTREG_AOT_JDK=true ..." to work with binary AOT configuration >> - ... 
and 5 more: https://git.openjdk.org/jdk/compare/990d40e9...1ec67c11 > > src/hotspot/share/cds/cppVtables.cpp line 293: > >> 291: for (kind = 0; kind < _num_cloned_vtable_kinds; kind ++) { >> 292: if (vtable_of((Metadata*)obj) == _orig_cpp_vtptrs[kind] || >> 293: vtable_of((Metadata*)obj) == _archived_cpp_vtptrs[kind]) { > > I think we should check these conditions only in the mode where applicable. That would make it easier to understand the code in future. So my suggestion is to update it as: > > > for (kind = 0; kind < _num_cloned_vtable_kinds; kind ++) { > intptr_t* vtable_ptr; > if (CDSConfig::is_dumping_final_static_archive()) { > vtable_ptr = _archived_cpp_vtptrs[kind]; > } else { > vtable_ptr = _orig_cpp_vtptrs[kind]; > } > if (vtable_of((Metadata*)obj) == vtable_ptr) { > break; > } > } During the final archive dump, we can have both archived classes from the preimage as well as dynamically loaded classes. For example, hidden classes are generated when linking lambdas. > src/hotspot/share/cds/cppVtables.cpp line 322: > >> 320: assert(MetaspaceShared::is_in_shared_metaspace(m), "must be"); >> 321: return vtable_of(m) == _index[Method_Kind]->cloned_vtable() || >> 322: vtable_of(m) == _archived_cpp_vtptrs[Method_Kind]; > > I am not sure if this needs any fixing. If `m` is in the archive (as the above assert says), then its vtable should always be `_index[Method_Kind]->cloned_vtable()`. `_index` is updated to use the runtime vtables after `CppVtables::dumptime_init()`, so we must check `_archived_cpp_vtptrs` as well. This is a new condition that can happen when dumping the final archive. 
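A minimal C++ model of why the kind lookup during a final dump has to accept two vtable pointers per kind: objects mapped from the preimage still carry the vtable addresses recorded from the preimage (what `_archived_cpp_vtptrs` holds in the patch discussed above), while classes created during the dump itself, such as lambda hidden classes, carry the running JVM's vtables. `FakeMetadata`, `VtableTables`, `kNumKinds`, and `kind_of` are hypothetical names for illustration, not HotSpot code, and a metadata object is reduced to the vtable pointer in its first word.

```cpp
#include <array>

// Hypothetical model: a "metadata object" is reduced to its vtable pointer.
struct FakeMetadata {
  const void* vptr;
};

constexpr int kNumKinds = 3;  // e.g. ConstantPool, InstanceKlass, Method

struct VtableTables {
  // vtables of objects created by the running (dumping) JVM
  std::array<const void*, kNumKinds> orig{};
  // vtable addresses that objects copied from the preimage still carry
  std::array<const void*, kNumKinds> archived{};
};

// Returns the kind index, or -1 if the vptr is not recognized. In a final
// dump both tables matter, because preimage objects and freshly loaded
// classes coexist.
int kind_of(const VtableTables& t, const FakeMetadata& m) {
  if (m.vptr == nullptr) {
    return -1;  // guard so unset (null) table entries never match
  }
  for (int kind = 0; kind < kNumKinds; kind++) {
    if (m.vptr == t.orig[kind] || m.vptr == t.archived[kind]) {
      return kind;
    }
  }
  return -1;
}
```

Checking only one of the two tables would misclassify one of the two populations, which is the situation the `|| ... _archived_cpp_vtptrs[...]` conditions in the patch guard against.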
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1969000603 PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1969000378 From asmehra at openjdk.org Tue Feb 25 06:37:57 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 25 Feb 2025 06:37:57 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v4] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 13:14:34 GMT, Thomas Stuefe wrote: >> Greetings, >> >> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. >> >> Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than it allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. >> >> I wanted to track that information correctly and display it clearly in a way that is easy to understand. >> >> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). >> >> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (e.g., the inside of a loop): drop a TracePhase in there with a descriptive name, and you can see the allocations inside that phase.
>> >> The statistic gives us two new forms of output: >> >> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: >> >> >> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: >> Phase Total ra node comp type index reglive regsplit cienv other >> none 1205512 155104 982984 33712 0 0 0 0 0 33712 >> parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 >> optimizer 916584 0 556416 0 0 0 0 0 0 360168 >> escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 >> connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 >> macroEliminate 196448 0 196448 0 0 0 0 0 0 0 >> iterGVN 327440 0 196368 131072 0 0 0 0 0 0 >> incrementalInline 3992816 0 3043704 62... > > Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > avoid Thread::current in high traffic chunk alloc path src/hotspot/share/compiler/compilationMemStatInternals.hpp line 92: > 90: > 91: // A very simple fixed-width FIFO buffer, used for the phase timeline > 92: template Would `size` be a better name than `max`? src/hotspot/share/compiler/compilationMemStatInternals.hpp line 160: > 158: void init(T v) { start = cur = peak = v; } > 159: void update(T v) { cur = v; if (v > peak) peak = v; } > 160: dT end_delta() const { return (dT)cur - (dT)start; } Should it be `return (dT)(cur - start); }`? src/hotspot/share/compiler/compilationMemStatInternals.hpp line 161: > 159: void update(T v) { cur = v; if (v > peak) peak = v; } > 160: dT end_delta() const { return (dT)cur - (dT)start; } > 161: size_t temporary_peak_size() const { return MIN2(peak - cur, peak - start); } Shouldn't it be `MAX2(peak - cur, peak - start)`? Why not just `peak - start`?
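To make the `temporary_peak_size()` question concrete, here is a standalone C++ sketch of a start/cur/peak counter modelled on the snippet above; `PhaseCounter` and `ByteCounter` are hypothetical stand-ins (not the PR's code) and `std::min` plays the role of HotSpot's `MIN2`. Since `peak` is never below `start` or `cur`, `MIN2(peak - cur, peak - start)` equals `peak - MAX2(start, cur)`: the part of the peak that rose above both the starting and the ending level, i.e. memory allocated and released again within the phase. `peak - start` alone would also count growth the phase kept. Casting each operand before subtracting in `end_delta()` avoids forming a wrapped-around unsigned difference when a phase shrinks.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the counter under review (not the HotSpot code).
// T is the absolute unsigned measure, dT a signed type for deltas.
template <typename T, typename dT>
struct PhaseCounter {
  T start = 0;
  T cur = 0;
  T peak = 0;

  void init(T v) { start = cur = peak = v; }
  void update(T v) {
    cur = v;
    if (v > peak) peak = v;
  }
  // Net change over the phase; negative if the phase released more than it allocated.
  dT end_delta() const { return (dT)cur - (dT)start; }
  // Height of the peak above both the starting and the ending level,
  // i.e. peak - max(start, cur): memory that came and went within the phase.
  T temporary_peak_size() const { return std::min(peak - cur, peak - start); }
};

using ByteCounter = PhaseCounter<std::size_t, std::int64_t>;
```

For example, a phase that starts at 100 bytes, peaks at 300 and ends at 250 has `end_delta() == 150` and `temporary_peak_size() == 50`; a phase that starts at 400 and drops to 120 has `end_delta() == -280` and a temporary peak of 0.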
src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 149: > 147: int col = start_indent; > 148: check_phase_trace_id(e.info.id); > 149: if (omit_empty_phases && e._bytes.end_delta() == 0 && e._bytes.temporary_peak_size() == 0) { `omit_empty_phases` is always false. Can it just be removed? src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 205: > 203: // seed current entry > 204: Entry& e = _fifo.current(); > 205: e._bytes.start = e._bytes.cur = e._bytes.peak = cur_abs; This can be replaced by `e._bytes.init(cur_abs)`. Same for the next statement. Along the same lines, I would suggest adding `Entry::init()` and calling it here. src/hotspot/share/memory/arena.hpp line 48: > 46: const size_t _len; // Size of this Chunk > 47: // Used for Compilation Memory Statistic > 48: uint64_t _stamp; This is wasted space if compilation memory stats are not enabled. One way to avoid this is to subclass `Chunk` as a `StampedChunk` and use that if compilation memory stats are enabled. Is this complexity worth the space saving? src/hotspot/share/opto/phase.hpp line 125: > 123: f( _t_temporaryTimer1, "tempTimer1") \ > 124: f( _t_temporaryTimer2, "tempTimer2") \ > 125: f( _t_testTimer1, "testTimer1") \ Would `_t_testPhase1` and `_t_testPhase2` be better names?
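The PR description quoted above attributes per-phase tracking to instrumenting the `TracePhase` stack object. A rough C++ model of that idea, with hypothetical names (`ArenaModel`, `PhaseRecord`, `PhaseScope`) and a single byte counter standing in for the real arena accounting: each scope snapshots the footprint on entry and logs the net change on exit, so nested phases report their own deltas and an enclosing phase's delta includes theirs. This is a sketch of the technique, not the PR's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model: one running byte count stands in for the
// compiler's combined arena footprint.
struct ArenaModel {
  std::size_t bytes = 0;
  void alloc(std::size_t n) { bytes += n; }
  void release(std::size_t n) { bytes -= n; }
};

// Per-phase summary: net footprint change between phase entry and exit.
struct PhaseRecord {
  std::string name;
  std::int64_t delta;
};

// RAII scope in the spirit of TracePhase: snapshot on construction, log on
// destruction. Inner scopes are destroyed first, so they are logged first.
class PhaseScope {
 public:
  PhaseScope(const char* name, ArenaModel& arena, std::vector<PhaseRecord>& log)
      : _name(name), _arena(arena), _log(log), _start(arena.bytes) {}
  ~PhaseScope() {
    _log.push_back({_name, (std::int64_t)_arena.bytes - (std::int64_t)_start});
  }

 private:
  const char* _name;
  ArenaModel& _arena;
  std::vector<PhaseRecord>& _log;
  std::size_t _start;
};
```

Dropping an extra scope around a suspected hot spot, as the description suggests for `TracePhase`, then yields one more log entry with that region's own delta.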
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968993551 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968993720 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968993803 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968994072 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968994121 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1968999519 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1969041777 From epeter at openjdk.org Tue Feb 25 07:11:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Feb 2025 07:11:55 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: <9mXRl7rScxJwxNNlV_H1gxndtzZ6g-gE8cMsc6VsTJQ=.b5a77c13-6e7e-4203-898a-3318e298d30f@github.com> References: <9mXRl7rScxJwxNNlV_H1gxndtzZ6g-gE8cMsc6VsTJQ=.b5a77c13-6e7e-4203-898a-3318e298d30f@github.com> Message-ID: On Tue, 25 Feb 2025 00:34:14 GMT, Vladimir Kozlov wrote: > > But if we do not optimize the slow path loop, then we would get performance regressions in aliasing cases because we have no unrolling for them any more. > > Okay, we are back to our previous conversation - we will wait for your aliasing-analysis runtime-checks implementation and do performance runs to see if "slow" path affects performance. > > Okay. Sounds good, we will revisit and write more benchmarks there. > > PS: "slow" path implies that it is not taken frequently and it should not affect general performance of the application. For me "slow" just means less optimized, because some assumption does not hold. The "fast" path is faster, because it has more assumptions and can optimize more (i.e. vectorize in this case, or vectorize more instructions). Do you have a better name than "fast/slow"?
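A scalar C++ model of the loop multiversioning discussed in this thread, under stated assumptions: a runtime check selects a "fast" version that relies on 16-byte alignment (its 4-element body stands in for one aligned vector operation) or a "slow" version that assumes nothing. Both compute the same result; "slow" only means fewer assumptions, and in C2 terms that version could still be unrolled, just not vectorized with aligned accesses. `is_aligned` and `add_arrays` are hypothetical names for illustration, not C2 code, and aliasing between the arrays is ignored here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is a multiple of `alignment` bytes.
static bool is_aligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Computes dst[i] = a[i] + b[i] for i < n. Returns true if the "fast"
// (alignment-assuming) version ran, false if the "slow" one did.
bool add_arrays(std::int32_t* dst, const std::int32_t* a, const std::int32_t* b,
                std::size_t n) {
  if (is_aligned(dst, 16) && is_aligned(a, 16) && is_aligned(b, 16)) {
    // Fast version: with 4-byte elements, every i that is a multiple of 4
    // starts a 16-byte aligned group, so each group could be a single
    // aligned vector load/add/store.
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
      for (std::size_t lane = 0; lane < 4; lane++) {
        dst[i + lane] = a[i + lane] + b[i + lane];
      }
    }
    for (; i < n; i++) {  // scalar tail, like a post-loop
      dst[i] = a[i] + b[i];
    }
    return true;
  }
  // Slow version: no alignment assumption, same semantics.
  for (std::size_t i = 0; i < n; i++) {
    dst[i] = a[i] + b[i];
  }
  return false;
}
```

Passing `dst + 1` for a 16-byte aligned `int32_t` buffer shifts it by 4 bytes, which fails the check and takes the slow version while still producing the same sums.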
------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2680885496 From epeter at openjdk.org Tue Feb 25 07:15:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Feb 2025 07:15:56 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: <9mXRl7rScxJwxNNlV_H1gxndtzZ6g-gE8cMsc6VsTJQ=.b5a77c13-6e7e-4203-898a-3318e298d30f@github.com> References: <9mXRl7rScxJwxNNlV_H1gxndtzZ6g-gE8cMsc6VsTJQ=.b5a77c13-6e7e-4203-898a-3318e298d30f@github.com> Message-ID: On Tue, 25 Feb 2025 00:34:14 GMT, Vladimir Kozlov wrote: >> @vnkozlov I mean the issue is this: once I implement aliasing-analysis runtime-checks with this multiversion approach, then we'd get regressions if we do not optimize the slow path loop. Currently, we would not vectorize (because we have to be ready for aliasing cases), but we at least unroll, and whatever else we can except vectorization. But if we do not optimize the slow path loop, then we would get performance regressions in aliasing cases because we have no unrolling for them any more. I think we need to avoid that - would you agree? > >> But if we do not optimize the slow path loop, then we would get performance regressions in aliasing cases because we have no unrolling for them any more. > > Okay, we are back to our previous conversation - we will wait for your aliasing-analysis runtime-checks implementation and do performance runs to see if "slow" path affects performance. > > Okay. > > PS: "slow" path implies that it is not taken frequently and it should not affect general performance of the application. @vnkozlov @rwestrel Let me summarize the tasks left to do here: - Rename `stalled` -> `delayed`. And `unstall` -> `resume_optimizations` or similar. Improve some comments. - File a follow-up RFE for more verification (must find multiversion-if from multiversioned loop) - currently blocked by predicate traversal issue.
Maybe we can also assert that we can always find the pre-loop from the main-loop, at least during loop-opts. - When working on aliasing-analysis runtime-check, we have to do more performance analysis, and show the need of both the fast and slow path loops. Let me know if there is more ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2680894298 From asmehra at openjdk.org Tue Feb 25 08:28:59 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 25 Feb 2025 08:28:59 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v8] In-Reply-To: <16Tcr_cKWF0RycjWIFZFvyf5jMqLksor4YaWEKHWc7c=.7d3d00f7-84f1-4464-8555-dcbaecaedc26@github.com> References: <16Tcr_cKWF0RycjWIFZFvyf5jMqLksor4YaWEKHWc7c=.7d3d00f7-84f1-4464-8555-dcbaecaedc26@github.com> Message-ID: On Tue, 25 Feb 2025 06:04:15 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/cppVtables.cpp line 293: >> >>> 291: for (kind = 0; kind < _num_cloned_vtable_kinds; kind ++) { >>> 292: if (vtable_of((Metadata*)obj) == _orig_cpp_vtptrs[kind] || >>> 293: vtable_of((Metadata*)obj) == _archived_cpp_vtptrs[kind]) { >> >> I think we should check these conditions only in the mode where applicable. That would make it easier to understand the code in future. So my suggestion is to update it as: >> >> >> for (kind = 0; kind < _num_cloned_vtable_kinds; kind ++) { >> intptr_t* vtable_ptr; >> if (CDSConfig::is_dumping_final_static_archive()) { >> vtable_ptr = _archived_cpp_vtptrs[kind]; >> } else { >> vtable_ptr = _orig_cpp_vtptrs[kind]; >> } >> if (vtable_of((Metadata*)obj) == vtable_ptr) { >> break; >> } >> } > > During the final archive dump, we can have both archived classes from the preimage as well as dynamically loaded classes. For example, hidden classes are generated when linking lambdas. Right, I forgot about the lambdas. Thanks for pointing it out. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1969245039 From lucy at openjdk.org Tue Feb 25 08:43:53 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 25 Feb 2025 08:43:53 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames In-Reply-To: References: Message-ID: On Mon, 17 Feb 2025 09:53:37 GMT, Amit Kumar wrote: > Port for [JDK-8299795](https://bugs.openjdk.org/browse/JDK-8299795): relativize Z_locals in interpreter frames for s390x. > > Tier1 tests with fastdebug VM are clean. Looks good overall. One "hazy" performance concern. src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp line 1138: > 1136: // z_ijava_state->locals = Z_esp + parameter_count bytes > 1137: > 1138: __ z_ldgr(Z_F1, Z_R1); // preserve Z_R1, holding cache offset How expensive is this? I remember those GPR <-> FPR transfers to be convenient, but inefficient. But that may have changed. Would Z_R0 be available or is it occupied as well? ------------- PR Review: https://git.openjdk.org/jdk/pull/23660#pullrequestreview-2640023331 PR Review Comment: https://git.openjdk.org/jdk/pull/23660#discussion_r1969221075 From asmehra at openjdk.org Tue Feb 25 08:55:57 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 25 Feb 2025 08:55:57 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v8] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 01:11:25 GMT, Ioi Lam wrote: >> Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pool entries, etc., that are easily represented in text.
>> >> With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. >> >> To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): >> >>> the format of the configuration and cache files is not specified and is subject to change without notice. >> >> **Notes for reviewers:** >> >> - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. >> - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. >> - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. >> - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes.
>> - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. >> >> **Misc Note** >> - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccb... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - all tests in runtime/cds/appcds/aotClassLinking should be excluded for hotspot_appcds_dynamic testing > - @ashu-mehra comment - simplified _archived_cpp_vtptrs; also fixed old comments near by > - Merge branch 'master' into 8348426-binary-aot-config-file > - Merge branch 'master' into 8348426-binary-aot-config-file > - @ashu-mehra comments > - @calvinccheung comments > - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself > - Improve error message when AOTMode=create has an incompatible classpath > - Fixed test cases @vnkozlov > - Update "make test JTREG_AOT_JDK=true ..." to work with binary AOT configuration > - ... and 5 more: https://git.openjdk.org/jdk/compare/990d40e9...1ec67c11 Marked as reviewed by asmehra (Committer). 
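Here is a minimal sketch of why a distinct magic number keeps the two files apart even though they share a layout. A reader classifies the header before it trusts the rest of the file. The constants, byte order, and class names below are invented for illustration. They do not reflect HotSpot's actual header format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: two file kinds share one layout and differ only in
// their leading magic number. Both constants are invented for illustration
// and are NOT the values HotSpot writes.
final class MagicCheck {
    static final int STATIC_ARCHIVE_MAGIC = 0xCAFE0001; // hypothetical
    static final int AOT_CONFIG_MAGIC     = 0xCAFE0002; // hypothetical

    enum Kind { STATIC_ARCHIVE, AOT_CONFIG, UNKNOWN }

    static Kind classify(byte[] header) {
        if (header.length < 4) {
            return Kind.UNKNOWN;
        }
        // Little-endian byte order is an assumption of this sketch.
        int magic = ByteBuffer.wrap(header, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        if (magic == STATIC_ARCHIVE_MAGIC) return Kind.STATIC_ARCHIVE;
        if (magic == AOT_CONFIG_MAGIC) return Kind.AOT_CONFIG;
        return Kind.UNKNOWN;
    }

    /** Rejects anything that is not a static archive, e.g. a config file passed by mistake. */
    static void requireStaticArchive(byte[] header) {
        Kind k = classify(header);
        if (k != Kind.STATIC_ARCHIVE) {
            throw new IllegalArgumentException("not a static archive: " + k);
        }
    }
}
```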
lgtm ------------- PR Review: https://git.openjdk.org/jdk/pull/23484#pullrequestreview-2640158319 PR Comment: https://git.openjdk.org/jdk/pull/23484#issuecomment-2681180532 From asmehra at openjdk.org Tue Feb 25 08:55:58 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 25 Feb 2025 08:55:58 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v8] In-Reply-To: <16Tcr_cKWF0RycjWIFZFvyf5jMqLksor4YaWEKHWc7c=.7d3d00f7-84f1-4464-8555-dcbaecaedc26@github.com> References: <16Tcr_cKWF0RycjWIFZFvyf5jMqLksor4YaWEKHWc7c=.7d3d00f7-84f1-4464-8555-dcbaecaedc26@github.com> Message-ID: On Tue, 25 Feb 2025 06:04:01 GMT, Ioi Lam wrote: > _index is updated to use the runtime vtables after CppVtables::dumptime_init(), so we must check _archived_cpp_vtptrs as well. That's right. I agree with your point, but I think the method `is_valid_shared_method` is not used during dump time. One of the caller is `Method::restore_unshareable_info` which is invoked during archive loading, and the other caller is related to JNI APIs to call Java methods. So the context of the call makes me think the current condition is sufficient. But again, it can be argued that this function can break if it gets used in some other context where the current condition is not sufficient. I think I am fine with this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23484#discussion_r1969293214 From epeter at openjdk.org Tue Feb 25 09:27:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Feb 2025 09:27:13 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v4] In-Reply-To: References: Message-ID: > Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below. > > **Background** > > With `-XX:+AlignVector`, all vector loads/stores must be aligned. 
We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer. > > **Problem** > > So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code. > > > MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1); > MemorySegment nativeUnaligned = nativeAligned.asSlice(1); > test3(nativeUnaligned); > > > When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not! > > static void test3(MemorySegment ms) { > for (int i = 0; i < RANGE; i++) { > long adr = i * 4L; > int v = ms.get(ELEMENT_LAYOUT, adr); > ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1)); > } > } > > > **Solution: Runtime Checks - Predicate and Multiversioning** > > Of course we could just forbid cases where we have a `native` base from vectorizing. But that would lead to regressions currently - in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine if the `base` is aligned, we need a runtime check. > > I came up with 2 options where to place the runtime checks: > - A new "auto vectorization" Parse Predicate: > - This only works when predicates are available. > - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop. > - Multiversion the loop: > - Create 2 copies of the loop (fast and slow loops). 
> - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if` which decides which loop we take > - In the `slow_loop`, we make no assumption which means we can not vectorize, but we still compile - so even unaligned `base`s would end up with reasonably fast code. > - We "stall" the `... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 66 commits: - Merge branch 'master' into JDK-8323582-SW-native-alignment - stall -> delay, plus some more comments - adjust selector if probability - Merge branch 'master' into JDK-8323582-SW-native-alignment - remove multiversion mark if we break the structure - register opaque with igvn - copyright and rm CFG check - IR rules for all cases - 3 test versions - test changed to unaligned ints - ... and 56 more: https://git.openjdk.org/jdk/compare/d551daca...8eb52292 ------------- Changes: https://git.openjdk.org/jdk/pull/22016/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22016&range=03 Stats: 1089 lines in 27 files changed: 966 ins; 28 del; 95 mod Patch: https://git.openjdk.org/jdk/pull/22016.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22016/head:pull/22016 PR: https://git.openjdk.org/jdk/pull/22016 From epeter at openjdk.org Tue Feb 25 09:36:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 25 Feb 2025 09:36:58 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: <9mXRl7rScxJwxNNlV_H1gxndtzZ6g-gE8cMsc6VsTJQ=.b5a77c13-6e7e-4203-898a-3318e298d30f@github.com> References: <9mXRl7rScxJwxNNlV_H1gxndtzZ6g-gE8cMsc6VsTJQ=.b5a77c13-6e7e-4203-898a-3318e298d30f@github.com> Message-ID: On Tue, 25 Feb 2025 00:34:14 GMT, Vladimir Kozlov wrote: >> @vnkozlov I mean the issue this: once I implement aliasing-analysis runtime-checks with this multiversion approach, then we'd get regressions if we do not optimize 
the slow path loop. Currently, we would not vectorize (because we have to be ready for aliasing cases), but we at least unroll, and whatever else we can except vectorization. But if we do not optimize the slow path loop, then we would get performance regressions in aliasing cases because we have no unrolling for them any more. I think we need to avoid that - would you agree? > >> But if we do not optimize the slow path loop, then we would get performance regressions in aliasing cases because we have no unrolling for them any more. > > Okay, we are back to our previous conversation - we will wait your aliasing-analysis runtime-checks implementation and do performance runs to see if "slow" path affects performance. > > Okay. > > PS: "slow" path implies that it is not taking frequently and it should not affect general performance of application. @vnkozlov @rwestrel - I did the `stall` -> `delay` renaming, and added some more comments in places you asked for it. Let me know if that looks better. - Filed: [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637): C2: verify that main_loop finds pre_loop and that multiversion loops find the multiversion_if - I added a comment to [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) C2 SuperWord: Aliasing Analysis runtime check, to check performance around slow_loop. 
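To make "unroll but don't vectorize" concrete, here is a hand-unrolled scalar loop of the kind the slow loop can still become. It consists of a main loop that handles four elements per iteration, followed by a post-loop for the remainder. This is a hypothetical illustration, not C2 output. C2 also creates a pre-loop, which this sketch omits.

```java
// Hypothetical sketch of an unrolled-but-scalar loop: the same work as
// "for (i...) a[i]++", reshaped into a 4x main loop plus a post-loop.
final class UnrollSketch {
    static void incrementUnrolled(int[] a) {
        int i = 0;
        int limit = a.length - (a.length % 4); // largest multiple of 4 <= length
        for (; i < limit; i += 4) {            // main loop, unrolled by 4
            a[i]++;
            a[i + 1]++;
            a[i + 2]++;
            a[i + 3]++;
        }
        for (; i < a.length; i++) {            // post-loop for the leftovers
            a[i]++;
        }
    }
}
```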
Let me know what more I can do ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2681315131 From aph at openjdk.org Tue Feb 25 09:40:57 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 25 Feb 2025 09:40:57 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: <1yB95sOajuS5ptFI0GQWLepii5JsZ9DOsje-TEFyFYs=.a325ad18-17ed-4e77-b1e3-0bad2cf55c67@github.com> Message-ID: On Mon, 24 Feb 2025 17:11:24 GMT, Andrew Dinn wrote: >> I have tried that, but the python script (actually the as command that it started) threw error messages: >> >> aarch64ops.s:338:24: error: index must be a multiple of 8 in range [0, 32760]. >> prfm PLDL1KEEP, [x15, 43] >> ^ >> aarch64ops.s:357:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] >> sub x1, x10, x23, sxth #2 >> ^ >> aarch64ops.s:359:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] >> add x11, x21, x5, uxtb #3 >> ^ >> aarch64ops.s:360:22: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] >> adds x11, x17, x17, uxtw #1 >> ^ >> aarch64ops.s:361:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] >> sub x11, x0, x15, uxtb #1 >> ^ >> aarch64ops.s:362:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] >> subs x7, x1, x0, sxth #2 >> ^ >> This is without any modifications from what is in the master branch currently. > > @ferakocz This also really needs addressing before committing the patch. Perhaps @theRealAph can advise on how to circumvent the problems you found when trying to update the python script? > You might have to use an assembler from the latest binutils build (if the system default isn't the latest) and add the path to the assembler in the "AS" variable. Also you can run it something like - `python aarch64-asmtest.py | expand > asmtest.out.h`. Please let me know if you still face problems. 
People have been running this script for a decade now. Let's look at just one of these: aarch64ops.s:357:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] sub x1, x10, x23, sxth #2 From the AArch64 manual: SUB (extended register) SUB <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}} It thinks this is a SUB (shifted register), but it's really a SUB (extended register). fedora:aarch64 $ cat t.s sub x1, x10, x23, sxth #2 fedora:aarch64 $ as t.s fedora:aarch64 $ objdump -D a.out Disassembly of section .text: 0000000000000000 <.text>: 0: cb37a941 sub x1, x10, w23, sxth #2 So perhaps binutils expects w23 here, not x23. But the manual (ARM DDI 0487K.a) says x23 should be just fine, and, what's more, gives the x form preferred status. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1969374124 From azafari at openjdk.org Tue Feb 25 09:57:26 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 25 Feb 2025 09:57:26 GMT Subject: RFR: 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag Message-ID: With the `size` parameter there will be no need to traverse/go through the nodes between the base and end of the region. Tests: linux-x64-debug, gtest:NMT* and runtime/NMT* ------------- Commit messages: - applied also to VMT.
- 8350566: NMT: add size parameter to MemTracker::record_virtual_memory_tag Changes: https://git.openjdk.org/jdk/pull/23770/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23770&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350566 Stats: 23 lines in 14 files changed: 1 ins; 0 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/23770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23770/head:pull/23770 PR: https://git.openjdk.org/jdk/pull/23770 From stuefe at openjdk.org Tue Feb 25 10:03:09 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Feb 2025 10:03:09 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v4] In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 10:10:50 GMT, Roberto Castañeda Lozano wrote: >> Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> avoid Thread::current in high traffic chunk alloc path > > src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 255: > >> 253: char tmp[1024]; >> 254: _k->as_C_string(tmp, sizeof(tmp)); >> 255: if (UseNewCode){ printf("%s\n",tmp); fflush(stdout);} > > I guess this use of `UseNewCode` is not meant to be integrated? Yes, that was an error. I'll remove it.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1969430129 From stuefe at openjdk.org Tue Feb 25 10:23:00 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Feb 2025 10:23:00 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v4] In-Reply-To: <_oNuzx4YepRchoguAnBbXw-31T14WgK8oQpC47FAJOc=.6edd8fcc-6757-448b-992d-b13b94af7c59@github.com> References: <_oNuzx4YepRchoguAnBbXw-31T14WgK8oQpC47FAJOc=.6edd8fcc-6757-448b-992d-b13b94af7c59@github.com> Message-ID: <_DyA72zJiCDj4xhjrKurD_l533AuXJk4fsLac4KR6Ww=.e8b40eac-da62-48ee-b2c5-c74a15d358c0@github.com> On Mon, 24 Feb 2025 10:19:42 GMT, Roberto Castañeda Lozano wrote: >> Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> avoid Thread::current in high traffic chunk alloc path > > src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 1093: > >> 1091: Compile::TracePhase tp(Phase::_t_testTimer1); >> 1092: Arena ar(MemTag::mtCompiler, Arena::Tag::tag_reglive); >> 1093: ar.Amalloc(2 * M); // phase-local peak, should show up at MY-TESTPHASE-2 > > The reference to `MY-TESTPHASE-2` seems obsolete. Removed > test/hotspot/jtreg/compiler/print/CompileCommandMemLimit.java line 105: > >> 103: // by phase end. So, in the phase timeline these 2MB must show up as "significant temporary peak". >> 104: // In testPhase2, we allocate 32MB from resource area, which is leaked until the end of the compilation. This >> 105: // means that these 32MB will show up as permanent memory increase in the phasetimeline. > > The references to `testPhase` seem obsolete, do you mean `Phase::_t_testTimer1` and `Phase::_t_testTimer2`? Right you are
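The test comments quoted above distinguish two cases. In one, a phase-local "temporary peak" arises from 2MB that is allocated and freed within the phase. In the other, a "permanent increase" comes from 32MB that is still held at phase end. Below is a hypothetical sketch of that bookkeeping. It is not the actual CompilationMemoryStatistic code, and all names are made up.

```java
// Hypothetical sketch (not HotSpot's CompilationMemoryStatistic): per-phase
// bookkeeping that separates a phase-local temporary peak from memory that
// is still held when the phase ends.
final class PhaseMemorySketch {
    private long current;    // bytes currently allocated
    private long phaseStart; // value of 'current' when the phase began
    private long phasePeak;  // highest value of 'current' during the phase

    void beginPhase() {
        phaseStart = current;
        phasePeak = current;
    }

    void allocate(long bytes) {
        current += bytes;
        phasePeak = Math.max(phasePeak, current);
    }

    void free(long bytes) {
        current -= bytes;
    }

    /** Returns {permanentIncrease, temporaryPeak} for the phase that just ended. */
    long[] endPhase() {
        long permanentIncrease = current - phaseStart; // still held at phase end
        long temporaryPeak = phasePeak - current;      // peak that was given back
        return new long[] { permanentIncrease, temporaryPeak };
    }
}
```

Replaying the two quoted scenarios yields `{0, 2MB}` for the first phase and `{32MB, 0}` for the second.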
Right you are ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1969475279 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1969475077 From azafari at openjdk.org Tue Feb 25 11:06:26 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 25 Feb 2025 11:06:26 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v31] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: reviews applied. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/5f4bc6dd..5aa4556a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=29-30 Stats: 48 lines in 5 files changed: 5 ins; 11 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From azafari at openjdk.org Tue Feb 25 11:06:26 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 25 Feb 2025 11:06:26 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v30] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: <8jsJSagIyDrTIC0DRnJtOjNaYQI3UaKEn4TC42lXqcU=.8e76aefd-5b48-48bd-982f-aceb9bd12d19@github.com> On Mon, 24 Feb 2025 12:45:51 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. >> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. >> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. >> - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > test file got back, fixed coding style Description is up to date now. 
Other related PRs are: https://github.com/openjdk/jdk/pull/23769 https://github.com/openjdk/jdk/pull/23770 https://github.com/openjdk/jdk/pull/23771 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2681581172 From azafari at openjdk.org Tue Feb 25 11:06:27 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 25 Feb 2025 11:06:27 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v29] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Mon, 24 Feb 2025 08:24:06 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> once more. > > src/hotspot/share/nmt/memoryFileTracker.cpp line 183: > >> 181: // Only account the committed memory. >> 182: snap->commit_memory(current->committed()); >> 183: });} > > Style: Restore to what it was before. Done. > src/hotspot/share/nmt/virtualMemoryTracker.hpp line 404: > >> 402: friend class VirtualMemoryTrackerTest; >> 403: friend class CommittedVirtualMemoryTest; >> 404: > > These two classes don't exist anymore. The first got back to life. The second is removed. > src/hotspot/share/nmt/vmatree.cpp line 80: > >> 78: MemTag tag = leqA_n->val().out.mem_tag(); >> 79: stA.out.set_tag(tag); >> 80: LEQ_A.state.out.set_tag(tag); > > This also seems like a bug fix that must be separated out into a separate PR along with test cases. Moved to this PR: https://github.com/openjdk/jdk/pull/23771 > src/hotspot/share/nmt/vmatree.cpp line 211: > >> 209: // Finally, we can register the new region [A, B)'s summary data. >> 210: MemTag tag_to_change = use_tag_inplace ? stA.out.mem_tag() : metadata.mem_tag; >> 211: SingleDiff& rescom = diff.tag[NMTUtil::tag_to_index(tag_to_change)]; > > This seems to be a bug fix to 8335091. You should open a separate mainline PR with this fix and add a testcase for it.
Your fix should be integrated before this PR is. Moved to this PR: https://github.com/openjdk/jdk/pull/23771 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1969542371 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1969543273 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1969538538 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1969539202 From azafari at openjdk.org Tue Feb 25 11:06:27 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 25 Feb 2025 11:06:27 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v30] In-Reply-To: <2fTUz_oDUK6cNc-DY5AATuXDaqjQZ6XUa-t1VF_UzaI=.6fb21f2f-eccc-4950-9a31-98f467be48e6@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <2fTUz_oDUK6cNc-DY5AATuXDaqjQZ6XUa-t1VF_UzaI=.6fb21f2f-eccc-4950-9a31-98f467be48e6@github.com> Message-ID: <4vEqzxQM4EWODHCEEEU4HsEB9n11OY0B9askJZwUCaQ=.03a1652b-c793-4a57-9c44-3f55ae1f2a41@github.com> On Mon, 24 Feb 2025 20:01:18 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> test file got back, fixed coding style > > src/hotspot/share/nmt/virtualMemoryTracker.hpp line 381: > >> 379: bool remove_uncommitted_region (address base_addr, size_t size); >> 380: bool remove_released_region (address base_addr, size_t size); >> 381: bool remove_released_region (ReservedMemoryRegion* rgn); > > Why are they returning `bool` ? I don't see the return value being used anywhere? Good catch. They are removed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1969540784 From azafari at openjdk.org Tue Feb 25 11:06:27 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 25 Feb 2025 11:06:27 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v28] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Mon, 24 Feb 2025 08:09:46 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed remaining of the unrelated changes. > > test/hotspot/gtest/runtime/test_virtualMemoryTracker.cpp line 1: > >> (failed to retrieve contents of file, check the PR for context) > Why are these tests removed? Can they be adapted to the new implementation? They are now implemented. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1969543987 From ayang at openjdk.org Tue Feb 25 11:17:06 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 25 Feb 2025 11:17:06 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v4] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 18 Feb 2025 09:20:57 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker.
The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker Thanks for review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2681608369 From ayang at openjdk.org Tue Feb 25 11:17:07 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 25 Feb 2025 11:17:07 GMT Subject: Integrated: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME In-Reply-To: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Thu, 30 Jan 2025 12:12:29 GMT, Albert Mingkun Yang wrote: > Here is an attempt to simplify GCLocker implementation for Serial and Parallel. > > GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. > > The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. 
> > Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. > > Test: tier1-8 This pull request has now been integrated. Changeset: a9c9f7f0 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/a9c9f7f0cbb2f2395fef08348bf867ffa8875d73 Stats: 985 lines in 41 files changed: 50 ins; 822 del; 113 mod 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME Reviewed-by: tschatzl, iwalulya, egahlin ------------- PR: https://git.openjdk.org/jdk/pull/23367 From duke at openjdk.org Tue Feb 25 11:17:58 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 25 Feb 2025 11:17:58 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: <1yB95sOajuS5ptFI0GQWLepii5JsZ9DOsje-TEFyFYs=.a325ad18-17ed-4e77-b1e3-0bad2cf55c67@github.com> Message-ID: On Tue, 25 Feb 2025 09:36:49 GMT, Andrew Haley wrote: >> @ferakocz This also really needs addressing before committing the patch. Perhaps @theRealAph can advise on how to circumvent the problems you found when trying to update the python script? > >> You might have to use an assembler from the latest binutils build (if the system default isn't the latest) and add the path to the assembler in the "AS" variable. Also you can run it something like - `python aarch64-asmtest.py | expand > asmtest.out.h`. Please let me know if you still face problems. > > People have been running this script for a decade now. > > Let's look at just one of these: > > > aarch64ops.s:357:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] > sub x1, x10, x23, sxth #2 > > > From the AArch64 manual: > > SUB (extended register) > SUB <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}} > > It thinks this is a SUB (shifted register), but it's really a SUB (extended register).
> > > fedora:aarch64 $ cat t.s > sub x1, x10, x23, sxth #2 > fedora:aarch64 $ as t.s > fedora:aarch64 $ objdump -D a.out > Disassembly of section .text: > > 0000000000000000 <.text>: > 0: cb37a941 sub x1, x10, w23, sxth #2 > > > So perhaps binutils expects w23 here, not x23. But the manual (ARM DDI 0487K.a) says x23 should be just fine, and, what's more, gives the x form preferred status. @theRealAlph, maybe we are not reading the same manual (ARM DDI 0487K.a). In my copy: SUB (extended register) is defined as SUB <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}} and <R> should be W when <extend> is SXTH, and the 'as' I have enforces this: ferakocz at ferakocz-mac aarch64 % cat t.s sub x1, x10, w23, sxth #2 ferakocz at ferakocz-mac aarch64 % cat > t1.s sub x1, x10, x23, sxth #2 ferakocz at ferakocz-mac aarch64 % cat t.s sub x1, x10, w23, sxth #2 ferakocz at ferakocz-mac aarch64 % cat t1.s sub x1, x10, x23, sxth #2 ferakocz at ferakocz-mac aarch64 % as --version Apple clang version 16.0.0 (clang-1600.0.26.6) Target: arm64-apple-darwin24.3.0 Thread model: posix InstalledDir: /Library/Developer/CommandLineTools/usr/bin ferakocz at ferakocz-mac aarch64 % as t.s ferakocz at ferakocz-mac aarch64 % objdump -D t.o t.o: file format mach-o arm64 Disassembly of section __TEXT,__text: 0000000000000000 : 0: cb37a941 sub x1, x10, w23, sxth #2 ferakocz at ferakocz-mac aarch64 % as t1.s t1.s:1:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] sub x1, x10, x23, sxth #2 ^ I have not found the place in the manual where it allows/encourages the use of x instead of w, but I admit I haven't read through all of the 14568 pages. So I'm stuck for now. What 'as' are you using?
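The register-width rule under discussion fits in a small lookup table. The following Python sketch assumes the 64-bit syntax `SUB <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}`. Under that syntax, `<R>` is an X register only when the 3-bit `option` field is `x11` (UXTX, SXTX) and a W register otherwise. The helper names are hypothetical and are not part of aarch64-asmtest.py. The encoder reproduces the `cb37a941` word that both objdump runs printed. The Rm field holds only the register number, so choosing `w23` or `x23` is purely a question of assembler syntax.

```python
# 3-bit "option" field of the AArch64 extended-register forms.
EXTEND_OPTION = {
    "uxtb": 0b000, "uxth": 0b001, "uxtw": 0b010, "uxtx": 0b011,
    "sxtb": 0b100, "sxth": 0b101, "sxtw": 0b110, "sxtx": 0b111,
}


def rm_prefix(extend):
    """Width prefix for Rm in the 64-bit form: 'x' only when option == x11."""
    return "x" if (EXTEND_OPTION[extend] & 0b011) == 0b011 else "w"


def format_sub_extended(rd, rn, rm, extend, amount):
    """Assembler text for 64-bit SUB (extended register)."""
    if not 0 <= amount <= 4:
        raise ValueError("shift amount must be in [0, 4]")
    return f"sub x{rd}, x{rn}, {rm_prefix(extend)}{rm}, {extend} #{amount}"


def encode_sub_extended(rd, rn, rm, extend, amount):
    """Word layout: sf=1 op=1 S=0 01011 opt=00 1 | Rm | option | imm3 | Rn | Rd."""
    if not 0 <= amount <= 4:
        raise ValueError("shift amount must be in [0, 4]")
    return ((0b11001011001 << 21) | (rm << 16) | (EXTEND_OPTION[extend] << 13)
            | (amount << 10) | (rn << 5) | rd)


print(format_sub_extended(1, 10, 23, "sxth", 2))       # sub x1, x10, w23, sxth #2
print(hex(encode_sub_extended(1, 10, 23, "sxth", 2)))  # 0xcb37a941
```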
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1969561791 From fgao at openjdk.org Tue Feb 25 11:26:33 2025 From: fgao at openjdk.org (Fei Gao) Date: Tue, 25 Feb 2025 11:26:33 GMT Subject: RFR: 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands [v3] In-Reply-To: References: Message-ID: > `IndOffXX` types don't do us any good. It would be simpler and faster to match a general-purpose `IndOff` type and then let `legitimize_address()` fix any out-of-range operands. That'd reduce the size of the match rules and the time to run them. > > This patch simplifies the definitions of `immXOffset` with an estimated range. Whether an immediate can be encoded as an offset in an `LDR`/`STR` instruction will be determined during code emission. Meanwhile, we add the necessary `legitimize_address()` calls in the matcher phase for all `LDR`/`STR` instructions using the new `IndOff` memory operands (fix [JDK-8341437](https://bugs.openjdk.org/browse/JDK-8341437)). > > After this clean-up, memory operands matched with `IndOff` may require extra code emission (effectively a `lea`) before the address can be used. So we also modify the code that looks up the precise offset of the load/store instruction for implicit null checks (fix [JDK-8340646](https://bugs.openjdk.org/browse/JDK-8340646)). On the `aarch64` platform, we use the starting offset of the last instruction in the sequence emitted for a load/store machine node, because `LDR`/`STR` is always the last instruction emitted, no matter which addressing mode the load/store operation finally uses. > > Tier 1 - 3 passed on `Macos-aarch64` with or without the vm option `-XX:+UseZGC`. Fei Gao has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains four commits: - Merge branch 'master' into cleanup_indoff - Update the copyright year and code comments - Merge branch 'master' into cleanup_indoff - 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands IndOffXX types don't do us any good. It would be simpler and faster to match a general-purpose IndOff type and then let legitimize_address() fix any out-of-range operands. That'd reduce the size of the match rules and the time to run them. This patch simplifies the definitions of `immXOffset` with an estimated range. Whether an immediate can be encoded as an offset in an LDR/STR instruction will be determined during code emission. Meanwhile, we add the necessary `legitimize_address()` calls in the matcher phase for all LDR/STR instructions using the new `IndOff` memory operands (fix JDK-8341437). After this clean-up, memory operands matched with `IndOff` may require extra code emission (effectively a lea) before the address can be used. So we also modify the code that looks up the precise offset of the load/store instruction for implicit null checks (fix JDK-8340646). On the aarch64 platform, we use the starting offset of the last instruction in the sequence emitted for a load/store machine node, because LDR/STR is always the last instruction emitted, no matter which addressing mode the load/store operation finally uses.
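To make the encodability question concrete, two immediate forms matter here. `LDR`/`STR` (immediate, unsigned offset) take a 12-bit immediate scaled by the access size. The unscaled `LDUR`/`STUR` forms take a signed 9-bit byte offset (-256..255). The Python sketch below is illustrative only; its names and the scratch register `x8` are hypothetical, not HotSpot's. It checks both ranges. When neither applies, it emits an address computation before the load. Because the load is always emitted last, its code offset is the start of the final 4-byte instruction. The sketch assumes the out-of-range offset can be materialized with a single `mov`, although real code may need several instructions for that.

```python
INSN_BYTES = 4  # every AArch64 instruction is 4 bytes


def scaled_offset_ok(offset, size_log2):
    """LDR/STR (immediate, unsigned offset): imm12 scaled by the access size."""
    size = 1 << size_log2
    return offset >= 0 and offset % size == 0 and offset // size <= 4095


def offset_ok_for_immed(offset, size_log2):
    """True if an LDR/STR or LDUR/STUR form can encode `offset` directly."""
    return scaled_offset_ok(offset, size_log2) or -256 <= offset <= 255


def emit_load64(dst, base, offset, tmp="x8"):
    """Return (instructions, byte offset of the load) for an 8-byte load."""
    if scaled_offset_ok(offset, 3):
        insns = [f"ldr {dst}, [{base}, #{offset}]"]
    elif -256 <= offset <= 255:
        insns = [f"ldur {dst}, [{base}, #{offset}]"]  # signed, unscaled simm9
    else:
        # Out of range: compute the address first (effectively a lea).
        insns = [f"mov {tmp}, #{offset}",             # assumed to be one instruction
                 f"add {tmp}, {base}, {tmp}",
                 f"ldr {dst}, [{tmp}]"]
    # The memory access is always the last instruction emitted, so its offset
    # is where an implicit null check would have to point.
    return insns, INSN_BYTES * (len(insns) - 1)
```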
Tier 1 - 3 passed on Macos-aarch64 with or without the vm option "-XX:+UseZGC" ------------- Changes: https://git.openjdk.org/jdk/pull/22862/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22862&range=02 Stats: 8753 lines in 15 files changed: 8373 ins; 247 del; 133 mod Patch: https://git.openjdk.org/jdk/pull/22862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22862/head:pull/22862 PR: https://git.openjdk.org/jdk/pull/22862 From coleenp at openjdk.org Tue Feb 25 12:40:03 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Feb 2025 12:40:03 GMT Subject: RFR: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native [v8] In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 19:30:41 GMT, Coleen Phillimore wrote: >> Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. >> Tested with tier1-4 and performance tests. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add a comment about Class constructor. Thanks for reviewing Dean, Roger, Vladimir, Yudi and Chen, and comments David. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23572#issuecomment-2681823548 From coleenp at openjdk.org Tue Feb 25 12:40:04 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Feb 2025 12:40:04 GMT Subject: Integrated: 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 20:56:39 GMT, Coleen Phillimore wrote: > Class.isInterface() can check modifier flags, Class.isArray() can check whether component mirror is non-null and Class.isPrimitive() needs a new final transient boolean in java.lang.Class that the JVM code initializes. > Tested with tier1-4 and performance tests. 
This pull request has now been integrated. Changeset: c413549e Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/c413549eb775f4209416c718dc9aa0748144a6b4 Stats: 202 lines in 20 files changed: 43 ins; 128 del; 31 mod 8349860: Make Class.isArray(), Class.isInterface() and Class.isPrimitive() non-native Reviewed-by: dlong, rriggs, vlivanov, yzheng, liach ------------- PR: https://git.openjdk.org/jdk/pull/23572 From cnorrbin at openjdk.org Tue Feb 25 12:50:26 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 25 Feb 2025 12:50:26 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v12] In-Reply-To: References: Message-ID: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
> > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns! Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: separate intrusivenode and normal node classes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23416/files - new: https://git.openjdk.org/jdk/pull/23416/files/65892c4e..0ea10c19 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=10-11 Stats: 331 lines in 3 files changed: 89 ins; 65 del; 177 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From coleenp at openjdk.org Tue Feb 25 12:51:54 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Feb 2025 12:51:54 GMT Subject: RFR: 8328473: StringTable and SymbolTable statistics delay time to safepoint [v2] In-Reply-To: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> 
References: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> Message-ID: On Mon, 24 Feb 2025 18:41:28 GMT, Coleen Phillimore wrote: >> This change adds a safepoint poll to gathering statistics for the Symbol and String tables, using the ConcurrentHashTableTasks to chunk up the walk. The stringTable and symbolTable code is similar, like the GrowTask and DeleteTask code. Maybe this can be cleaned up but I don't have a good idea about that yet that doesn't involve yet another level of templated functions and code. This is already pretty highly templatized. >> Tested with tier1-4 and runThese internal test with JFR and failure injection to verify that we do try to safepoint while gathering statistics. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fxi typo.

We have these internal tests that run all the JCKs for 30 minutes with JFR and JVMTI on (runThese), so it safepoints a lot and JFR gathers these statistics every 10 seconds. Running with -Xlog:safepoint and extracting the "Reaching safepoint: " value from the log lines, e.g.:

[91.266s][info][safepoint ] Safepoint "ThreadDump", Time since last: 6550046 ns, Reaching safepoint: 199739 ns, At safepoint: 342759 ns, Total: 542498 ns

And ran this Python script:

    import re

    total = count = 0
    lines = open('out', 'r').read().splitlines()
    for line in lines:
        try:
            digit = lambda x: int(x)
            res = digit(line)
            total += res
            count += 1
        except ValueError:
            pass
    if count > 0:
        print ("Total evaluated numbers: " + str(count))
        print ("Total: " + str(total))
        print ("Average: " + str(total/count))

Before my change the results are:

    Total evaluated numbers: 7980
    Total: 51916917448
    Average: 6505879.3794486215

After this change:

    Total evaluated numbers: 8052
    Total: 36508335385
    Average: 4534070.465101838

I probably could write a more directed test to see if it's really the symbol/string table change, but I thought this was interesting.
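The script above reads a file that already holds one number per line. A variant could read the -Xlog:safepoint output directly and extract the value with a regular expression. This is a sketch under the assumption that every relevant line has the "Reaching safepoint: <N> ns" shape shown in the sample line. The function name is made up for illustration.

```python
import re

# Matches the "Reaching safepoint" field of a -Xlog:safepoint line.
REACHING = re.compile(r"Reaching safepoint: (\d+) ns")


def reaching_safepoint_stats(lines):
    """Return (count, total_ns, average_ns) over all lines that contain the field."""
    values = [int(m.group(1)) for m in map(REACHING.search, lines) if m]
    if not values:
        return 0, 0, 0.0
    total = sum(values)
    return len(values), total, total / len(values)
```

For example, passing an open handle to a log file (say, a hypothetical safepoint.log) yields the same three numbers the script prints, without a separate extraction step.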
------------- PR Comment: https://git.openjdk.org/jdk/pull/23750#issuecomment-2681875152 From cnorrbin at openjdk.org Tue Feb 25 13:04:11 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 25 Feb 2025 13:04:11 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v13] In-Reply-To: References: Message-ID: > Hi everyone, > > The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. > > Two key changes enable this feature: > 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. > 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. > > > Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
> > An example of how you could use the intrusive tree is found below: > > ```c++ > struct MyIntrusiveStructure { > Node node; // The tree node is part of an external structure > int data; > > MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} > Node* get_node() { return &node; } > static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } > }; > > Tree my_intrusive_tree; > > Cursor insert_cursor = my_intrusive_tree.cursor_find(0); > Node insert_node = Node(0); > > // Custom allocation here is just malloc > MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); > new (place) MyIntrusiveStructure(0, insert_node); > > my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); > > Cursor find_cursor = my_intrusive_tree.cursor_find(0); > int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; > > > > Please let me know any feedback or concerns! Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: insert node intrusive fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23416/files - new: https://git.openjdk.org/jdk/pull/23416/files/0ea10c19..9f471485 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23416&range=11-12 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23416/head:pull/23416 PR: https://git.openjdk.org/jdk/pull/23416 From amitkumar at openjdk.org Tue Feb 25 13:18:38 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 25 Feb 2025 13:18:38 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames [v2] In-Reply-To: References: Message-ID: > Port for [JDK-8299795](https://bugs.openjdk.org/browse/JDK-8299795) Relativize Z_locals in interpreter frame for 
s390x. > > Tier1 tests with fastdebug VM are clean. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: use Z_R0 as helper ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23660/files - new: https://git.openjdk.org/jdk/pull/23660/files/018d5bc0..46d6ae1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23660&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23660&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23660.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23660/head:pull/23660 PR: https://git.openjdk.org/jdk/pull/23660 From amitkumar at openjdk.org Tue Feb 25 13:18:39 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 25 Feb 2025 13:18:39 GMT Subject: RFR: 8350182: [s390x] Relativize locals in interpreter frames [v2] In-Reply-To: References: Message-ID: <6XYLt0n3FjHEgdc_-aHHlFAUEwvOUKgIPAU79pQfQ-c=.de4bd6da-f671-493a-9f65-8966880c8009@github.com> On Tue, 25 Feb 2025 08:14:29 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> use Z_R0 as helper > > src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp line 1138: > >> 1136: // z_ijava_state->locals = Z_esp + parameter_count bytes >> 1137: >> 1138: __ z_ldgr(Z_F1, Z_R1); // preserve Z_R1, holding cache offset > > How expensive is this? > I remember those GPR <-> FPR transfers being convenient, but inefficient. But that may have changed. > Would Z_R0 be available or is it occupied as well? updated.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23660#discussion_r1969760795 From coleenp at openjdk.org Tue Feb 25 13:19:11 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Feb 2025 13:19:11 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 16:29:25 GMT, Fredrik Bredberg wrote: > I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list. The `entry_list`. > > This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about whether it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. > > In the current multi-queue design, new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO chunks. > > The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. > > You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. > > The first contending thread that "pushes" itself onto `entry_list` will be the last thread in the list.
Each newly pushed thread in `entry_list` will be linked through its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. > > Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). > > Performance-wise the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation. > > However, the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fact that c2 no longer has to check b... src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 332: > 330: volatile_nonstatic_field(ObjectMonitor, _owner, int64_t) \ > 331: volatile_nonstatic_field(ObjectMonitor, _recursions, intptr_t) \ > 332: volatile_nonstatic_field(ObjectMonitor, _EntryListTail, ObjectWaiter*) \ You may need to coordinate with @mur47x111 to see what graal does with this field. I suspect the graal code also checks both cxq and EntryList in the unlock fast path and now only needs to check _EntryList, in which case we don't need to export EntryListTail.
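The list discipline described above works as follows: push at the head, walk from the head to find and cache the tail, fill in prev links along the way, and take the successor from the tail. A single-threaded Python model can illustrate it. It is only a sketch. The class and field names are hypothetical rather than ObjectMonitor's. The plain assignments to `head` stand in for the compare-and-exchange the VM would use.

```python
class Waiter:
    """Stand-in for a waiting thread's node (hypothetical, not ObjectWaiter)."""

    def __init__(self, name):
        self.name = name
        self.next = None   # toward older waiters; set when pushed
        self.prev = None   # toward newer waiters; filled in lazily


class EntryList:
    def __init__(self):
        self.head = None   # newest waiter (the VM updates this with a CAS)
        self.tail = None   # cached oldest waiter, i.e. the next successor

    def push(self, w):
        # Models: do { w.next = head } while (!CAS(&head, w.next, w))
        w.next, w.prev = self.head, None
        self.head = w

    def _walk_from_head(self):
        """Follow next pointers from the head, assigning prev pointers."""
        w = self.head
        while w.next is not None:
            w.next.prev = w
            w = w.next
        return w

    def dequeue_successor(self):
        """Remove and return the oldest waiter (strict FIFO), or None."""
        if self.head is None:
            return None
        if self.tail is None:
            self.tail = self._walk_from_head()
        t = self.tail
        if t is self.head:
            # Last element: the VM would CAS head from t to None here.
            self.head = self.tail = None
        else:
            if t.prev is None:            # waiters were pushed after the last walk
                self._walk_from_head()
            newer = t.prev
            newer.next = None             # newer becomes the new tail
            self.tail = newer
        t.next = t.prev = None
        return t
```

The walk costs O(n), but the cached tail means it only runs when the cache is empty or when the tail was the head at the time of the last walk.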
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1947058523 From cnorrbin at openjdk.org Tue Feb 25 13:19:01 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Tue, 25 Feb 2025 13:19:01 GMT Subject: RFR: 8349211: Add support for intrusive trees to the utilities red-black tree [v13] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 13:04:11 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The recently integrated red-black tree can be made more flexible by adding support for intrusive trees. In an intrusive tree, the user has full control over node allocation and placement instead of having the tree manage it internally. >> >> Two key changes enable this feature: >> 1. Nodes can now be created outside of the tree's internal allocation mechanism, enabling users to allocate and prepare nodes before inserting them into the tree. >> 2. Cursors have been added to simplify navigation and iteration over the tree. These cursors are used when inserting and removing nodes in an intrusive tree, where the internal tree allocator is not used. Additionally, cursors enable iteration over the tree and provide a convenient way to access node values. >> >> >> Many of the auxiliary tree functions have been updated to use these new features, resulting in simplified and cleaned-up code. More tests have also been added to cover both new and existing functionality.
>> >> An example of how you could use the intrusive tree is found below: >> >> ```c++ >> struct MyIntrusiveStructure { >> Node node; // The tree node is part of an external structure >> int data; >> >> MyIntrusiveStructure(int data, Node node) : node(node), data(data) {} >> Node* get_node() { return &node; } >> static MyIntrusiveStructure* cast_to_self(Node* node) { return (MyIntrusiveStructure*)node; } >> }; >> >> Tree my_intrusive_tree; >> >> Cursor insert_cursor = my_intrusive_tree.cursor_find(0); >> Node insert_node = Node(0); >> >> // Custom allocation here is just malloc >> MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest); >> new (place) MyIntrusiveStructure(0, insert_node); >> >> my_intrusive_tree.insert_at_cursor(place->get_node(), insert_cursor); >> >> Cursor find_cursor = my_intrusive_tree.cursor_find(0); >> int found_data = MyIntrusiveStructure::cast_to_self(find_cursor.node())->data; >> >> >> >> Please let me know any feedback or concerns! > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > insert node intrusive fix Just pushed another change, this time separating the node class into `IntrusiveRBNode` and `RBNode`. `IntrusiveRBNode` stores the tree pointers and is used for the intrusive tree, and no longer needs to be templated. `RBNode` is used for the normal tree and also stores a key and a value. This means that the key is now stored outside the node for intrusive trees. To allow for this, I changed the compare function for intrusive trees to `cmp(K a, const IntrusiveRBNode* b)`. For the normal tree, both `cmp(K a, const RBNode* b)` and `cmp(K a, K b)` can be used. 
With all the changes so far, the original example would now look something like this:

```c++
struct MyIntrusiveStructure {
  IntrusiveRBNode node; // The tree node is part of an external structure
  int key;
  int data;

  MyIntrusiveStructure(int key, int data) : key(key), data(data) {}
  IntrusiveRBNode* get_node() { return &node; }
  static MyIntrusiveStructure* cast_to_self(IntrusiveRBNode* node) { return (MyIntrusiveStructure*)node; }
  static const MyIntrusiveStructure* cast_to_self(const IntrusiveRBNode* node) { return (const MyIntrusiveStructure*)node; }
};

struct IntrusiveCmp {
  static int cmp(int a, const IntrusiveRBNode* b) { return a - MyIntrusiveStructure::cast_to_self(b)->key; }
};

IntrusiveRBTree my_intrusive_tree;
const int key = 0;

// Custom allocation here is just malloc
MyIntrusiveStructure* place = (MyIntrusiveStructure*)os::malloc(sizeof(MyIntrusiveStructure), mtTest);
new (place) MyIntrusiveStructure(key, 123);

my_intrusive_tree.insert(key, place->get_node());

IntrusiveRBNode* found_node = my_intrusive_tree.find_node(key);
int found_data = MyIntrusiveStructure::cast_to_self(found_node)->data;
```

The key changes are:
- The key is stored in `MyIntrusiveStructure` and can be modified/changed whenever.
- No node constructor is needed and you simply need to pass the node pointer to `insert`.
- No cursors are needed (although you could still use them) to insert/lookup/delete nodes.

------------- PR Comment: https://git.openjdk.org/jdk/pull/23416#issuecomment-2681950920 From aph at openjdk.org Tue Feb 25 13:19:02 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 25 Feb 2025 13:19:02 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: <1yB95sOajuS5ptFI0GQWLepii5JsZ9DOsje-TEFyFYs=.a325ad18-17ed-4e77-b1e3-0bad2cf55c67@github.com> Message-ID: On Tue, 25 Feb 2025 11:15:39 GMT, Ferenc Rakoczi wrote: >>> You might have to use an assembler from the latest binutils build (if the system default isn't the latest) and add the path to the assembler in the "AS" variable. Also you can run it something like - `python aarch64-asmtest.py | expand > asmtest.out.h`.
Please let me know if you still face problems. >> >> People have been running this script for a decade now. >> >> Let's look at just one of these: >> >> >> aarch64ops.s:357:20: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] >> sub x1, x10, x23, sxth #2 >> >> >> From the AArch64 manual: >> >> SUB (extended register) >> SUB <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}} >> >> It thinks this is a SUB (shifted register), but it's really a SUB (extended register). >> >> >> fedora:aarch64 $ cat t.s >> sub x1, x10, x23, sxth #2 >> fedora:aarch64 $ as t.s >> fedora:aarch64 $ objdump -D a.out >> Disassembly of section .text: >> >> 0000000000000000 <.text>: >> 0: cb37a941 sub x1, x10, w23, sxth #2 >> >> >> So perhaps binutils expects w23 here, not x23. But the manual (ARM DDI 0487K.a) says x23 should be just fine, and, what's more, gives the x form preferred status. > > @theRealAlph, maybe we are not reading the same manual (ARM DDI 0487K.a). In my copy: > SUB (extended register) is defined as > SUB <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}} > and <R> should be W when <extend> is SXTH > and the 'as' I have enforces this: > > ferakocz at ferakocz-mac aarch64 % cat t.s > sub x1, x10, w23, sxth #2 > ferakocz at ferakocz-mac aarch64 % cat > t1.s > sub x1, x10, x23, sxth #2 > ferakocz at ferakocz-mac aarch64 % cat t.s > sub x1, x10, w23, sxth #2 > ferakocz at ferakocz-mac aarch64 % cat t1.s > sub x1, x10, x23, sxth #2 > ferakocz at ferakocz-mac aarch64 % as --version > Apple clang version 16.0.0 (clang-1600.0.26.6) > Target: arm64-apple-darwin24.3.0 > Thread model: posix > InstalledDir: /Library/Developer/CommandLineTools/usr/bin > ferakocz at ferakocz-mac aarch64 % as t.s > ferakocz at ferakocz-mac aarch64 % objdump -D t.o > > t.o: file format mach-o arm64 > > Disassembly of section __TEXT,__text: > > 0000000000000000 : > 0: cb37a941 sub x1, x10, w23, sxth #2 > ferakocz at ferakocz-mac aarch64 % as t1.s > t1.s:1:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] > sub x1, x10, x23, sxth #2 > ^ > > I
have not found the place in the manual where it allows/encourages the use of x instead of w, but I admit I haven't read through all of the 14568 pages. > > So I'm stuck for now. What 'as' are you using? > I have not found the place in the manual where it allows/encourages the use of x instead of w, but I admit I > haven't read through all of the 14568 pages. Yes, you've got a point, but it's always worked. Is this a macos thing, maybe? > So I'm stuck for now. What 'as' are you using? Latest binutils, today. I checked it out half an hour ago. GNU assembler (GNU Binutils) 2.44.50.20250225 Copyright (C) 2025 Free Software Foundation, Inc. Try this: diff --git a/test/hotspot/gtest/aarch64/aarch64-asmtest.py b/test/hotspot/gtest/aarch64/aarch64-asmtest.py index 9c770632e25..b1674fff04d 100644 --- a/test/hotspot/gtest/aarch64/aarch64-asmtest.py +++ b/test/hotspot/gtest/aarch64/aarch64-asmtest.py @@ -476,8 +476,13 @@ class AddSubExtendedOp(ThreeRegInstruction): + ", " + str(self.amount) + ");")) def astr(self): - return (super(AddSubExtendedOp, self).astr() - + (", " + AddSubExtendedOp.optNames[self.option] + prefix = self.asmRegPrefix + return (super(ThreeRegInstruction, self).astr() + + ('%s, %s, %s' + % (self.reg[0].astr(prefix), + self.reg[1].astr(prefix), + self.reg[1].astr("w")) + + ", " + AddSubExtendedOp.optNames[self.option] + " #" + str(self.amount))) class AddSubImmOp(TwoRegImmedInstruction): ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1969760509 From fbredberg at openjdk.org Tue Feb 25 13:19:11 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 25 Feb 2025 13:19:11 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 19:17:24 GMT, Coleen Phillimore wrote: >> I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list. The `entry_list`. 
>> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about whether it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design, new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO chunks. >> >> The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list` will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked through its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list.
The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance wise the new design seems to be equal to the old design, even though c2 generates two less instructions per monitor unlock operation. >> >> However the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 332: > >> 330: volatile_nonstatic_field(ObjectMonitor, _owner, int64_t) \ >> 331: volatile_nonstatic_field(ObjectMonitor, _recursions, intptr_t) \ >> 332: volatile_nonstatic_field(ObjectMonitor, _EntryListTail, ObjectWaiter*) \ > > You may need to coordinate with @mur47x111 to see what graal does with this field. I suspect the graal code also checks both ctx and EntryList in the unlock fast path and now only needs to check _EntryList. In which case we don't need to export EntryListTail. Thanks for the heads-up, @coleenp. I was planning on contacting the Graal team when this PR gets closer to being integrated. I'll delete the `_EntryListTail` export, and make sure to ask for a review from @mur47x111 when that time comes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1949002357 From aph at openjdk.org (Andrew Haley) Date: Tue, 25 Feb 2025 13:19:03 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: <1yB95sOajuS5ptFI0GQWLepii5JsZ9DOsje-TEFyFYs=.a325ad18-17ed-4e77-b1e3-0bad2cf55c67@github.com> Message-ID: On Tue, 25 Feb 2025 13:14:52 GMT, Andrew Haley wrote: >> @theRealAlph, maybe we are not reading the same manual (ARM DDI 0487K.a).
In my copy:
>> SUB (extended register) is defined as
>> SUB <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}
>> and <R> should be W when <extend> is SXTH
>> and the as I have enforces this:
>>
>> ferakocz at ferakocz-mac aarch64 % cat t.s
>> sub x1, x10, w23, sxth #2
>> ferakocz at ferakocz-mac aarch64 % cat > t1.s
>> sub x1, x10, x23, sxth #2
>> ferakocz at ferakocz-mac aarch64 % cat t.s
>> sub x1, x10, w23, sxth #2
>> ferakocz at ferakocz-mac aarch64 % cat t1.s
>> sub x1, x10, x23, sxth #2
>> ferakocz at ferakocz-mac aarch64 % as --version
>> Apple clang version 16.0.0 (clang-1600.0.26.6)
>> Target: arm64-apple-darwin24.3.0
>> Thread model: posix
>> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>> ferakocz at ferakocz-mac aarch64 % as t.s
>> ferakocz at ferakocz-mac aarch64 % objdump -D t.o
>>
>> t.o: file format mach-o arm64
>>
>> Disassembly of section __TEXT,__text:
>>
>> 0000000000000000 :
>> 0: cb37a941 sub x1, x10, w23, sxth #2
>> ferakocz at ferakocz-mac aarch64 % as t1.s
>> t1.s:1:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
>> sub x1, x10, x23, sxth #2
>> ^
>>
>> I have not found the place in the manual where it allows/encourages the use of x instead of w, but I admit I haven't read through all of the 14568 pages.
>>
>> So I'm stuck for now. What 'as' are you using?
>
>> I have not found the place in the manual where it allows/encourages the use of x instead of w, but I
> haven't read through all of the 14568 pages.
>
> Yes, you've got a point, but it's always worked. Is this a macos thing, maybe?
>
>> So I'm stuck for now. What 'as' are you using?
>
> Latest binutils, today. I checked it out half an hour ago.
>
> GNU assembler (GNU Binutils) 2.44.50.20250225
> Copyright (C) 2025 Free Software Foundation, Inc.
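The width rule quoted above can be checked against the `objdump` output with a small encoder. What follows is a minimal C++ sketch, assuming the A64 field layout of the 64-bit `SUB (extended register)` form (sf=1, op=1, S=0); `Extend`, `rm_width` and `encode_sub_ext64` are hypothetical names for illustration only, not HotSpot's `Assembler` API. It reproduces the `cb37a941` word for `sub x1, x10, w23, sxth #2`.

```cpp
#include <cassert>
#include <cstdint>

// The `option` field (bits 15:13) of SUB (extended register).
enum class Extend : std::uint32_t {
  UXTB = 0, UXTH = 1, UXTW = 2, UXTX = 3,
  SXTB = 4, SXTH = 5, SXTW = 6, SXTX = 7
};

// Width specifier <R> of Rm in the 64-bit form: X when option == x11
// (UXTX/SXTX), W for every other extend, e.g. SXTH.
char rm_width(Extend e) {
  return (static_cast<std::uint32_t>(e) & 0x3u) == 0x3u ? 'x' : 'w';
}

// SUB <Xd|SP>, <Xn|SP>, <R><m>, <extend> #amount   (64-bit form, sf = 1)
std::uint32_t encode_sub_ext64(unsigned rd, unsigned rn, unsigned rm,
                               Extend e, unsigned amount) {
  assert(rd < 32 && rn < 32 && rm < 32 && amount <= 4);
  std::uint32_t insn = 0xCB200000u;  // sf=1 op=1 S=0 01011 opt=00 bit21=1
  insn |= static_cast<std::uint32_t>(rm) << 16;
  insn |= static_cast<std::uint32_t>(e) << 13;
  insn |= static_cast<std::uint32_t>(amount) << 10;
  insn |= static_cast<std::uint32_t>(rn) << 5;
  insn |= static_cast<std::uint32_t>(rd);
  return insn;
}
```

Note that the encoding has no separate width bit for `Rm`: the width is implied by `option`, so writing `x23` together with `sxth` names a register width the instruction does not actually read. This is consistent with the Apple `as` error above, which accepts an X register only with `sxtx`, `uxtx` or `lsl`.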
> > Try this: > > > diff --git a/test/hotspot/gtest/aarch64/aarch64-asmtest.py b/test/hotspot/gtest/aarch64/aarch64-asmtest.py > index 9c770632e25..b1674fff04d 100644 > --- a/test/hotspot/gtest/aarch64/aarch64-asmtest.py > +++ b/test/hotspot/gtest/aarch64/aarch64-asmtest.py > @@ -476,8 +476,13 @@ class AddSubExtendedOp(ThreeRegInstruction): > + ", " + str(self.amount) + ");")) > > def astr(self): > - return (super(AddSubExtendedOp, self).astr() > - + (", " + AddSubExtendedOp.optNames[self.option] > + prefix = self.asmRegPrefix > + return (super(ThreeRegInstruction, self).astr() > + + ('%s, %s, %s' > + % (self.reg[0].astr(prefix), > + self.reg[1].astr(prefix), > + self.reg[1].astr("w")) > + + ", " + AddSubExtendedOp.optNames[self.option] > + " #" + str(self.amount))) > > class AddSubImmOp(TwoRegImmedInstruction): I just tried it with top-of trunk latest binutils: fedora:aarch64 $ ~/binutils-gdb-install/bin/as -march=armv9-a+sha3+sve2-bitperm aarch64ops.s fedora:aarch64 $ ~/binutils-gdb-install/bin/as --version GNU assembler (GNU Binutils) 2.44.50.20250225 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1969761898 From dholmes at openjdk.org Tue Feb 25 13:19:16 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 25 Feb 2025 13:19:16 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 16:29:25 GMT, Fredrik Bredberg wrote: > I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list. The `entry_list`. > > This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. > > In the current multi-queue design new threads where always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. 
When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` whould detached the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` whould be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. > > The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. > > You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. > > The first contending thread that "pushes" itself onto `entry_list`, will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked trough its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. > > Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). > > Performance wise the new design seems to be equal to the old design, even though c2 generates two less instructions per monitor unlock operation. > > However the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fact that c2 no longer has to check b... 
src/hotspot/share/runtime/objectMonitor.cpp line 704: > 702: > 703: for (;;) { > 704: ObjectWaiter* front = Atomic::load_acquire(&_entry_list); Technically you don't need a `load_acquire` here because you do not access any members of `front` before hitting the cmpxchg that gives you a full fence. For good code hygiene `Atomic::load` would suffice. src/hotspot/share/runtime/objectMonitor.cpp line 723: > 721: > 722: for (;;) { > 723: ObjectWaiter* front = Atomic::load_acquire(&_entry_list); Technically you don't need a `load_acquire` here because you do not access any members of `front` before hitting the cmpxchg that gives you a full fence. For good code hygiene `Atomic::load` would suffice. src/hotspot/share/runtime/objectMonitor.cpp line 1264: > 1262: return w; > 1263: } > 1264: w = Atomic::load_acquire(&_entry_list); Suggestion: // Need acquire here to match the implicit release of the cmpxchg that updated _entry_list, so we // can access w->_next. w = Atomic::load_acquire(&_entry_list); src/hotspot/share/runtime/objectMonitor.cpp line 1303: > 1301: // Check if we are unlinking the last element in the _entry_list. > 1302: // This is by far the most common case. > 1303: if (currentNode->_next == nullptr) { The direct checks of `_next` and `_prev` for null/non-null do not work with your use of `set_bad_pointers`. If you actually intend to keep `set_bad_pointers` in the final code then you should be using accessors, e.g. ObjectWaiter* next() { assert (_next != 0xBAD, "corrupted list!"); return _next; } src/hotspot/share/runtime/objectMonitor.cpp line 1306: > 1304: assert(_entry_list_tail == nullptr || _entry_list_tail == currentNode, "invariant"); > 1305: > 1306: ObjectWaiter* v = Atomic::load_acquire(&_entry_list); Again technically you do not need `load_acquire` here because you do not access any fields of `v` when `v` could be other than the current node. `Atomic::load` will suffice.
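The distinction drawn in these comments can be sketched with C++11 `std::atomic`. A plain load is enough when the value is only fed to a CAS, while an acquire load is needed before dereferencing a node another thread published. This is an analogy under stated assumptions, not HotSpot code: `Waiter`, `push` and `count_waiters` are hypothetical, and the `release` ordering used here is the minimum C++ needs, whereas the comments above note that HotSpot's cmpxchg gives a full fence.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type; not HotSpot's ObjectWaiter.
struct Waiter {
  int id;
  Waiter* next = nullptr;
};

std::atomic<Waiter*> entry_list{nullptr};

// Contender: `front` is only stored into n->next and used as the CAS
// comparand, never dereferenced, so a relaxed load is enough. The
// successful CAS uses release ordering, which publishes n's fields.
void push(Waiter* n) {
  Waiter* front = entry_list.load(std::memory_order_relaxed);
  do {
    n->next = front;  // on CAS failure `front` is refreshed and we retry
  } while (!entry_list.compare_exchange_weak(front, n,
                                             std::memory_order_release,
                                             std::memory_order_relaxed));
}

// Reader: this load must be acquire because we go on to read w->next
// (and w->id); it pairs with the release of the CAS that published w.
int count_waiters() {
  int count = 0;
  for (Waiter* w = entry_list.load(std::memory_order_acquire); w != nullptr;
       w = w->next) {
    ++count;
  }
  return count;
}
```

Because every successful push is a read-modify-write on `entry_list`, the reader's single acquire load also makes the fields of older nodes visible, not just those of the newest one.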
src/hotspot/share/runtime/objectMonitor.cpp line 1315: > 1313: } > 1314: // The CAS above can fail from interference IFF a contending > 1315: // thread "pushed" itself onto entry_list. Suggestion: // The CAS above can fail from interference IFF a contending // thread "pushed" itself onto entry_list. So fall-through to // building the doubly-linked list. assert(currentNode->_prev == nullptr, "invariant"); src/hotspot/share/runtime/objectMonitor.cpp line 1334: > 1332: } > 1333: > 1334: assert(currentNode->_next != nullptr, "invariant"); Suggestion: else { // currentNode->_next != nullptr // If we get here it means the current thread enqueued itself on the EntryList but was then able to // "steal" the lock before the chosen successor was able to. Consequently currentNode must be an // interior node in the EntryList, or the head. src/hotspot/share/runtime/objectMonitor.cpp line 1337: > 1335: assert(currentNode != _entry_list_tail, "invariant"); > 1336: > 1337: if (currentNode->_prev == nullptr) { Suggestion: // Check if we are in the singly-linked portion of the EntryList. If we are the head then we try to remove // ourselves, else we convert to the doubly-linked list. if (currentNode->_prev == nullptr) { src/hotspot/share/runtime/objectMonitor.cpp line 1347: > 1345: // else we convert to the doubly-linked list. > 1346: if (currentNode->_prev == nullptr) { > 1347: ObjectWaiter* v = Atomic::load_acquire(&_entry_list); Again no `load_acquire` needed. src/hotspot/share/runtime/objectMonitor.cpp line 1352: > 1350: // The CAS above can fail from interference IFF a contending > 1351: // thread "pushed" itself onto entry_list, in which case > 1352: // currentNode must now be in the interior of the list. Suggestion: // currentNode must now be in the interior of the list. Fall-through // to building the doubly-linked list.
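The accessor suggested for line 1303 compares `_next` directly with `0xBAD`. In C++ a pointer cannot be compared with a non-zero integer literal, so real code needs a poison pointer obtained by a cast. Below is a minimal sketch, assuming a hypothetical `WaiterSketch` type; the actual `ObjectWaiter` accessors added in the "Moved set_bad_pointers() and added accessors" commit may well differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ObjectWaiter's list links.
struct WaiterSketch {
  WaiterSketch* _next = nullptr;
  WaiterSketch* _prev = nullptr;

  // Poison value stored into both links once the node is off the list.
  static WaiterSketch* bad_pointer() {
    return reinterpret_cast<WaiterSketch*>(static_cast<std::uintptr_t>(0xBAD));
  }

  void set_bad_pointers() {
    _next = bad_pointer();
    _prev = bad_pointer();
  }

  // Reads go through accessors, so (with asserts enabled) a stale node is
  // caught instead of a poisoned link being mistaken for a real non-null
  // neighbour by a raw `_next == nullptr` test.
  WaiterSketch* next() const {
    assert(_next != bad_pointer() && "corrupted list!");
    return _next;
  }
  WaiterSketch* prev() const {
    assert(_prev != bad_pointer() && "corrupted list!");
    return _prev;
  }
  bool is_poisoned() const {
    return _next == bad_pointer() && _prev == bad_pointer();
  }
};
```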
src/hotspot/share/runtime/objectMonitor.cpp line 1353: > 1351: // thread "pushed" itself onto entry_list, in which case > 1352: // currentNode must now be in the interior of the list. > 1353: assert(_entry_list != currentNode, "invariant"); Not sure you really need this. The fact the cmpxchg failed means we can't be the head of the list. Also by reading it again you are potentially finding a different head to that which existed when the cmpxchg failed. src/hotspot/share/runtime/objectMonitor.cpp line 1362: > 1360: } > 1361: > 1362: // We now assume we are unlinking currentNode from the interior of a Suggestion: // We now know we are unlinking currentNode from the interior of a src/hotspot/share/runtime/objectMonitor.cpp line 1534: > 1532: ObjectWaiter* w = nullptr; > 1533: > 1534: w = _entry_list; Use `Atomic::load` for consistency and good code hygiene. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1962360900 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1962359972 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1962364788 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1957707916 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1962368696 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1957692735 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1957696030 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1957698728 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1962370002 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1957699877 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1957701253 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1957701596 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1962372883 From yzheng at openjdk.org Tue Feb 25 13:19:11 2025 From: yzheng 
at openjdk.org (Yudi Zheng) Date: Tue, 25 Feb 2025 13:19:11 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 19:17:24 GMT, Coleen Phillimore wrote: >> I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list. The `entry_list`. >> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design new threads where always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` whould detached the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` whould be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. >> >> The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list`, will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked trough its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. 
>> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance wise the new design seems to be equal to the old design, even though c2 generates two less instructions per monitor unlock operation. >> >> However the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 332: > >> 330: volatile_nonstatic_field(ObjectMonitor, _owner, int64_t) \ >> 331: volatile_nonstatic_field(ObjectMonitor, _recursions, intptr_t) \ >> 332: volatile_nonstatic_field(ObjectMonitor, _EntryListTail, ObjectWaiter*) \ > > You may need to coordinate with @mur47x111 to see what graal does with this field. I suspect the graal code also checks both ctx and EntryList in the unlock fast path and now only needs to check _EntryList. In which case we don't need to export EntryListTail. Indeed. You may delete this export and I will make the Graal side changes accordingly at [MonitorSnippets.java#L680](https://github.com/oracle/graal/blob/3d543641b056fdaa8e7444f09615067f8d766f6e/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/hotspot/replacements/MonitorSnippets.java#L680) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1948809912 From fbredberg at openjdk.org Tue Feb 25 13:19:10 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 25 Feb 2025 13:19:10 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists Message-ID: I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list. The `entry_list`. 
This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the recurring question of whether it's safe to first check the `EntryList` and then `cxq` will be a thing of the past.

In the current multi-queue design, new threads were always added to the `cxq`, and `ObjectMonitor::exit` would then choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list and add its elements to the doubly linked `EntryList`. The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO chunks.

The new list design is as much a multi-queue as the current one. Conceptually, it can be viewed as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`.

You always add to the `entry_list` with a compare-and-exchange on the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable.

The first contending thread that "pushes" itself onto `entry_list` will be the last thread in the list. Each newly pushed thread in `entry_list` is linked through its next pointer and has its prev pointer set to null, so pushing new threads onto `entry_list` forms a singly linked list. The list is always in the right order (via the next pointers) and is never moved to another list.

Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list.
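The push-at-head, take-from-tail scheme described above can be modelled in a few dozen lines. This is a simplified C++ sketch under explicit assumptions: `Node` and `EntryList` are hypothetical stand-ins for `ObjectWaiter` and the monitor's list fields, only one thread (the lock owner) ever calls `find_tail`/`dequeue`, and only the tail is ever removed. HotSpot's real code must also unlink interior nodes, which is omitted here.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for ObjectWaiter.
struct Node {
  int id;
  Node* next = nullptr;  // toward older entries (the FIFO tail)
  Node* prev = nullptr;  // toward newer entries; null until the owner links it
};

struct EntryList {
  std::atomic<Node*> head{nullptr};
  Node* tail = nullptr;  // cached tail; touched only by the owner

  // Contending thread: CAS itself onto the head. Its prev stays null, so
  // the newest part of the list is singly linked.
  void push(Node* n) {
    Node* front = head.load(std::memory_order_relaxed);
    do {
      n->next = front;
      n->prev = nullptr;
    } while (!head.compare_exchange_weak(front, n, std::memory_order_release,
                                         std::memory_order_relaxed));
  }

  // Owner: follow next pointers from `from`, filling in prev pointers
  // (making that stretch doubly linked); returns the last node reached.
  static Node* link_prev(Node* from) {
    Node* w = from;
    for (Node* n = w->next; n != nullptr; w = n, n = n->next) {
      n->prev = w;
    }
    return w;
  }

  // Owner: the oldest waiter; walks from the head only if no tail is cached.
  Node* find_tail() {
    if (tail == nullptr) {
      Node* h = head.load(std::memory_order_acquire);
      if (h != nullptr) tail = link_prev(h);
    }
    return tail;
  }

  // Owner: unlink and return the FIFO successor (the tail), or nullptr.
  Node* dequeue() {
    Node* t = find_tail();
    if (t == nullptr) return nullptr;
    if (t->prev == nullptr) {
      // No newer neighbour has been linked, so t may still be the head.
      Node* expected = t;
      if (head.compare_exchange_strong(expected, nullptr,
                                       std::memory_order_acq_rel,
                                       std::memory_order_acquire)) {
        tail = nullptr;
        return t;
      }
      // The CAS fails only if contenders pushed in front of t: link their
      // prev pointers (the walk ends at t) and fall through.
      link_prev(expected);
    }
    Node* p = t->prev;
    p->next = nullptr;  // pushers never write an existing node's next
    tail = p;
    return t;
  }
};
```

The fallback after a failed CAS only links the newcomers' prev pointers and then unlinks `t` like any other tail, matching the "fall through to building the doubly-linked list" wording used in the review comments.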
The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor).

Performance-wise the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation.

However, the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there; this, and the fact that c2 no longer has to check both `EntryList` and `cxq`, makes this PR worthwhile, I think.

Tiers 1-7 pass, as do micro-benchmarks such as `vm.lang.LockUnlock`. The unsupported platforms (ppc, riscv, s390) have been tested with QEMU.

-------------

Commit messages:
 - Moved set_bad_pointers() and added accessors.
 - Merge branch 'master' into 8343840_rewrite_objectmonitor_lists
 - Atomic hygiene
 - Fixed a bug in UnlinkAfterAcquire
 - General cleanup
 - Updated theory of operations comment
 - 8343840: Rewrite the ObjectMonitor lists

Changes: https://git.openjdk.org/jdk/pull/23421/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23421&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8343840
  Stats: 594 lines in 9 files changed: 213 ins; 219 del; 162 mod
  Patch: https://git.openjdk.org/jdk/pull/23421.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23421/head:pull/23421

PR: https://git.openjdk.org/jdk/pull/23421

From fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 25 Feb 2025 13:19:16 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 20:55:28 GMT, David Holmes wrote: >> I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list. The `entry_list`.
>> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design new threads where always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` whould detached the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` whould be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. >> >> The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list`, will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked trough its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. 
The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance wise the new design seems to be equal to the old design, even though c2 generates two less instructions per monitor unlock operation. >> >> However the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > src/hotspot/share/runtime/objectMonitor.cpp line 704: > >> 702: >> 703: for (;;) { >> 704: ObjectWaiter* front = Atomic::load_acquire(&_entry_list); > > Technically you don't need a load_acquire here because you do not access any members of front before hitting the cmpxchg that gives you a full fence.. For good code hygiene Atomic::load would suffice. Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 723: > >> 721: >> 722: for (;;) { >> 723: ObjectWaiter* front = Atomic::load_acquire(&_entry_list); > > Technically you don't need a `load_acquire` here because you do not access any members of `front` before hitting the cmpxchg that gives you a full fence.. For good code hygiene `Atomic::load` would suffice. Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 1264: > >> 1262: return w; >> 1263: } >> 1264: w = Atomic::load_acquire(&_entry_list); > > Suggestion: > > // Need acquire here to match the implicit release of the cmpxchg that updated _entry_list, so we > // can access w->_next. > w = Atomic::load_acquire(&_entry_list); Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 1303: > >> 1301: // Check if we are unlinking the last element in the _entry_list. >> 1302: // This is by far the most common case. >> 1303: if (currentNode->_next == nullptr) { > > The direct checks of `_next` and _prev` for null/non-null do not work with your use of `set_bad_pointers`. If you actually intend to keep `set_bad_pointers` in the final code then you should be using accessors e.g. 
> > ObjectWaiter* next() { > assert (_next != 0xBAD, "corrupted list!"); > return _next; > } Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 1306: > >> 1304: assert(_entry_list_tail == nullptr || _entry_list_tail == currentNode, "invariant"); >> 1305: >> 1306: ObjectWaiter* v = Atomic::load_acquire(&_entry_list); > > Again technically you do not need `load_acquire` here because you do not access any fields of `v` when `v` could be other than the current node. `Atomic::load` will suffice. Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 1315: > >> 1313: } >> 1314: // The CAS above can fail from interference IFF a contending >> 1315: // thread "pushed" itself onto entry_list. > > Suggestion: > > // The CAS above can fail from interference IFF a contending > // thread "pushed" itself onto entry_list. So fall-through to > // building the doubly-linked list. > assert(currentNode->prev == nullptr, "invariant"); Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 1334: > >> 1332: } >> 1333: >> 1334: assert(currentNode->_next != nullptr, "invariant"); > > Suggestion: > > else { // currentNode->_next != nullptr > > // If we get here it means the current thread enqueued itself on the EntryList but was then able to > // "steal" the lock before the chosen successor was able to. Consequently currentNode must be an > // interior node in the EntryList, or the head. Added the comment but left out the suggested "else" and kept the assert. I know that the if statement above always ends in a return, but if that is changed this feels safer. > src/hotspot/share/runtime/objectMonitor.cpp line 1337: > >> 1335: assert(currentNode != _entry_list_tail, "invariant"); >> 1336: >> 1337: if (currentNode->_prev == nullptr) { > > Suggestion: > > // Check if we are in the singly-linked portion of the EntryList. If we are the head then we try to remove > // ourselves, else we convert to the doubly-linked list. 
> if (currentNode->_prev == nullptr) { Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 1347: > >> 1345: // else we convert to the doubly-linked list. >> 1346: if (currentNode->_prev == nullptr) { >> 1347: ObjectWaiter* v = Atomic::load_acquire(&_entry_list); > > Again no `load_acquire` needed. Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 1352: > >> 1350: // The CAS above can fail from interference IFF a contending >> 1351: // thread "pushed" itself onto entry_list, in which case >> 1352: // currentNode must now be in the interior of the list. > > Suggestion: > > // currentNode must now be in the interior of the list. Fall-through > // to building the doubly-linked list. Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 1353: > >> 1351: // thread "pushed" itself onto entry_list, in which case >> 1352: // currentNode must now be in the interior of the list. >> 1353: assert(_entry_list != currentNode, "invariant"); > > Not sure you really need this. The fact the cmpxchg failed means we can't be the head of the list. Also by reading it again you are potentially finding a different head to that which existed when the cmpxchg failed. You are right, I don't really need it, but sometimes I feel that comments can rot, while asserts can't. I guess I put this one in so that it's easier to see what state currentNode is in (not the head) without reading through the logic that ends up in the else-statement. > src/hotspot/share/runtime/objectMonitor.cpp line 1362: > >> 1360: } >> 1361: >> 1362: // We now assume we are unlinking currentNode from the interior of a > > Suggestion: > > // We now know we are unlinking currentNode from the interior of a Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 1534: > >> 1532: ObjectWaiter* w = nullptr; >> 1533: >> 1534: w = _entry_list; > > Use `Atomic::load` for consistency and good code hygiene.
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963144747 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963135003 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963050591 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1967825628 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963136473 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963132242 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1961646077 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1961647021 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963137807 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1969341568 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1961659147 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963133824 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963141844 From aph at openjdk.org Tue Feb 25 13:40:57 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 25 Feb 2025 13:40:57 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 12:09:57 GMT, Bhavana Kilambi wrote: > This patch adds aarch64 backend for scalar FP16 operations namely - add, subtract, multiply, divide, fma, sqrt, min and max. test/hotspot/gtest/aarch64/aarch64-asmtest.py line 19: > 17: 0x7e0, 0xfc0, 0x1f80, 0x3ff0, 0x7e00, 0x8000, > 18: 0x81ff, 0xc1ff, 0xc003, 0xc7ff, 0xdfff, 0xe03f, > 19: 0xe1ff, 0xf801, 0xfc00, 0xfc07, 0xff03, 0xfffe] So here you've deleted the duplicated `0x7e00` (good) but also the not-duplicated `0xe10f`. Is `0xe10f` not valid? 
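Whether `0xe10f` can appear in the table follows from the A64 logical-immediate rule, which the SVE `eor` (immediate) form also uses. The value, replicated to the element size, must repeat a 2-, 4-, 8- or 16-bit pattern (for `.h` elements) that is a rotated run of contiguous ones, and it must be neither all zeros nor all ones. A minimal C++ sketch of that check follows; `is_valid_h_bitmask_imm` is a hypothetical helper, not code from `aarch64-asmtest.py` or HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Rotate the low `s` bits of x right by r (0 <= r < s, s <= 16).
static std::uint32_t rotr_bits(std::uint32_t x, unsigned r, unsigned s) {
  const std::uint32_t mask = (1u << s) - 1;
  x &= mask;
  if (r == 0) return x;
  return ((x >> r) | (x << (s - r))) & mask;
}

// True if a 16-bit element value is encodable as a logical (bitmask)
// immediate: v must be an s-bit pattern p replicated across the element,
// where p is a rotation of a run of k ones with 1 <= k < s.
bool is_valid_h_bitmask_imm(std::uint16_t v) {
  for (unsigned s = 2; s <= 16; s *= 2) {
    const std::uint32_t mask = (1u << s) - 1;
    const std::uint32_t p = v & mask;
    bool replicated = true;
    for (unsigned i = s; i < 16; i += s) {
      if (((static_cast<std::uint32_t>(v) >> i) & mask) != p) {
        replicated = false;
        break;
      }
    }
    if (!replicated) continue;
    for (unsigned k = 1; k < s; ++k) {
      const std::uint32_t run = (1u << k) - 1;
      for (unsigned r = 0; r < s; ++r) {
        if (rotr_bits(run, r, s) == p) return true;
      }
    }
  }
  return false;
}
```

`0xe10f` (`1110000100001111`) has no shorter period and contains two separate runs of ones even when read cyclically, so no encoding exists. Its neighbours in the table, such as `0xe03f` and `0xff03`, are single runs that wrap around from bit 15 to bit 0.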
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23748#discussion_r1969800950 From aph at openjdk.org (Andrew Haley) Date: Tue, 25 Feb 2025 13:46:57 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 12:09:57 GMT, Bhavana Kilambi wrote: > This patch adds aarch64 backend for scalar FP16 operations namely - add, subtract, multiply, divide, fma, sqrt, min and max. Overall, this looks like a great piece of work. I only have a few changes in comments and a question, then we're good to go. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23748#issuecomment-2682030036 From aph at openjdk.org (Andrew Haley) Date: Tue, 25 Feb 2025 13:52:55 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: <1yB95sOajuS5ptFI0GQWLepii5JsZ9DOsje-TEFyFYs=.a325ad18-17ed-4e77-b1e3-0bad2cf55c67@github.com> Message-ID: On Tue, 25 Feb 2025 13:15:49 GMT, Andrew Haley wrote: >>> I have not found the place in the manual where it allows/encourages the use of x instead of w, but I > haven't read through all of the 14568 pages. >> >> Yes, you've got a point, but it's always worked. Is this a macos thing, maybe? >> >>> So I'm stuck for now. What 'as' are you using? >> >> Latest binutils, today. I checked it out half an hour ago. >> >> GNU assembler (GNU Binutils) 2.44.50.20250225 >> Copyright (C) 2025 Free Software Foundation, Inc.
>> >> Try this: >> >> >> diff --git a/test/hotspot/gtest/aarch64/aarch64-asmtest.py b/test/hotspot/gtest/aarch64/aarch64-asmtest.py >> index 9c770632e25..b1674fff04d 100644 >> --- a/test/hotspot/gtest/aarch64/aarch64-asmtest.py >> +++ b/test/hotspot/gtest/aarch64/aarch64-asmtest.py >> @@ -476,8 +476,13 @@ class AddSubExtendedOp(ThreeRegInstruction): >> + ", " + str(self.amount) + ");")) >> >> def astr(self): >> - return (super(AddSubExtendedOp, self).astr() >> - + (", " + AddSubExtendedOp.optNames[self.option] >> + prefix = self.asmRegPrefix >> + return (super(ThreeRegInstruction, self).astr() >> + + ('%s, %s, %s' >> + % (self.reg[0].astr(prefix), >> + self.reg[1].astr(prefix), >> + self.reg[1].astr("w")) >> + + ", " + AddSubExtendedOp.optNames[self.option] >> + " #" + str(self.amount))) >> >> class AddSubImmOp(TwoRegImmedInstruction): > > I just tried it with top-of trunk latest binutils: > > fedora:aarch64 $ ~/binutils-gdb-install/bin/as -march=armv9-a+sha3+sve2-bitperm aarch64ops.s > fedora:aarch64 $ ~/binutils-gdb-install/bin/as --version > GNU assembler (GNU Binutils) 2.44.50.20250225 Aha! aph at Andrews-MacBook-Pro ~ % as t.s t.s:1:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] sub x1, x10, x23, sxth #2 ^ aph at Andrews-MacBook-Pro ~ % as --version Apple clang version 16.0.0 (clang-1600.0.26.6) Target: arm64-apple-darwin24.3.0 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1969823700 From bkilambi at openjdk.org Tue Feb 25 13:55:58 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 25 Feb 2025 13:55:58 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 13:37:51 GMT, Andrew Haley wrote: >> This patch adds aarch64 backend for scalar FP16 operations namely - add, subtract, multiply, divide, fma, sqrt, min and max. 
> > test/hotspot/gtest/aarch64/aarch64-asmtest.py line 19: > >> 17: 0x7e0, 0xfc0, 0x1f80, 0x3ff0, 0x7e00, 0x8000, >> 18: 0x81ff, 0xc1ff, 0xc003, 0xc7ff, 0xdfff, 0xe03f, >> 19: 0xe1ff, 0xf801, 0xfc00, 0xfc07, 0xff03, 0xfffe] > > So here you've deleted the duplicated `0x7e00` (good) but also the not-duplicated `0xe10f`. Is `0xe10f` not valid? Hi, yes `0xe10f` does not seem to be valid. When I tried generating the `asmtest.out.h` I ran into errors with this value - aarch64ops.s:1105: Error: immediate out of range at operand 3 -- eor z6.h,z6.h,#0xe10f aarch64ops.s:1123: Error: immediate out of range at operand 3 -- eor z3.h,z3.h,#0xe10f So I looked it up here - https://gist.github.com/dinfuehr/51a01ac58c0b23e4de9aac313ed6a06a to see if this number is a legal immediate, and it looks like it isn't. Maybe it's just chance that this number wasn't generated as an immediate operand before, so these errors didn't show up until now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23748#discussion_r1969827032 From fgao at openjdk.org Tue Feb 25 14:14:58 2025 From: fgao at openjdk.org (Fei Gao) Date: Tue, 25 Feb 2025 14:14:58 GMT Subject: RFR: 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands [v3] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 11:26:33 GMT, Fei Gao wrote: >> `IndOffXX` types don't do us any good. It would be simpler and faster to match a general-purpose `IndOff` type then let `legitimize_address()` fix any out-of-range operands. That'd reduce the size of the match rules and the time to run them. >> >> This patch simplifies the definitions of `immXOffset` with an estimated range. Whether an immediate can be encoded in an `LDR`/`STR` instruction as an offset will be determined in the code-emission phase.
Meanwhile, we add the necessary `legitimize_address()` calls in the matcher phase for all `LDR`/`STR` instructions using the new `IndOff` memory operands (fix [JDK-8341437](https://bugs.openjdk.org/browse/JDK-8341437)). >> >> After this clean-up, memory operands matched with `IndOff` may require extra code emission (effectively a `lea`) before the address can be used. So we also modify the code that looks up the precise offset of the load/store instruction for the implicit null check (fix [JDK-8340646](https://bugs.openjdk.org/browse/JDK-8340646)). On the `aarch64` platform, we will use the beginning offset of the last instruction in the instruction sequence emitted for a load/store machine node, because `LDR`/`STR` is always the last one emitted, no matter what addressing mode the load/store operations finally use. >> >> Tier 1 - 3 passed on `macOS-aarch64` with or without the vm option `-XX:+UseZGC`. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into cleanup_indoff > - Update the copyright year and code comments > - Merge branch 'master' into cleanup_indoff > - 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands > > IndOffXX types don't do us any good. It would be simpler and > faster to match a general-purpose IndOff type then let > legitimize_address() fix any out-of-range operands. That'd > reduce the size of the match rules and the time to run them. > > This patch simplifies the definitions of `immXOffset` with an > estimated range. Whether an immediate can be encoded in an > LDR/STR instruction as an offset will be determined in the phase > of code-emitting. Meanwhile, we add necessary > `legitimize_address()` in the phase of matcher for all LDR/STR > instructions using the new `IndOff` memory operands > (fix JDK-8341437).
> > After this clean-up, memory operands matched with `IndOff` may > require extra code emission (effectively a lea) before the address > can be used. So we also modify the code about looking up precise > offset of load/store instruction for implicit null check > (fix JDK-8340646). On aarch64 platform, we will use the beginning > offset of the last instruction in the instruction clause emitted > for a load/store machine node. Because LDR/STR is always the last > one emitted, no matter what addressing mode the load/store > operations finally use. > > Tier 1 - 3 passed on Macos-aarch64 with or without the vm option > "-XX:+UseZGC" Tier 1 failed on `macOS-aarch64` because of a wrong offset for the implicit null check. I had not seen these failures in my earlier full local test runs on macOS. I'll figure out what caused this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22862#issuecomment-2682108532 From aph at openjdk.org Tue Feb 25 14:26:58 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 25 Feb 2025 14:26:58 GMT Subject: RFR: 8350483: AArch64: turn on signum intrinsics by default on Ampere CPUs In-Reply-To: References: <7XQsAZxrIwrsL3gPazBVzWnfQmfH3R6Xwnadg-9Jd34=.34b8e435-1d9f-4486-948e-70079238e3fd@github.com> Message-ID: On Tue, 25 Feb 2025 04:57:24 GMT, Patrick Zhang wrote: > > > Maybe we should think about removing the `UseSignumIntrinsic` flag altogether. > > > > > > Ah, the flag is also used by other ports. But it doesn't make much sense for us not to use the intrinsic. > > Thanks for your review @theRealAph. > > Yes, I agree with you that we should turn on signum intrinsics (`-XX:+UseSignumIntrinsic`), probably `-XX:+UseCopySignIntrinsic` too. I ran a larger test set on Neoverse-N1 and Ampere CPUs, comparing runs with and without each of these two flags; no obvious performance regression so far.
`-XX:+UseSignumIntrinsic` can produce obvious benefits on `fmov` cases, while `-XX:+UseCopySignIntrinsic` can also bring +15% improvements to `*MathBench.signum*` tests when the signum intrinsics are disabled. In Math.java, signum calls copySign, so the benefit of the copySign intrinsic is partly hidden. Therefore, I think the two could be turned on together. A JBS ticket is required to track this. Here's one: [JDK-8350663](https://bugs.openjdk.org/browse/JDK-8350663) Are you interested in picking it up? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23735#issuecomment-2682151356 From galder at openjdk.org Tue Feb 25 14:57:05 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 25 Feb 2025 14:57:05 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 7 Feb 2025 12:39:24 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the Java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g.
>> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 44 additional commits since the last revision: > > - Merge branch 'master' into topic.intrinsify-max-min-long > - Fix typo > - Renaming methods and variables and add docu on algorithms > - Fix copyright years > - Make sure it runs with cpus with either avx512 or asimd > - Test can only run with 256 bit registers or bigger > > * Remove platform dependant check > and use platform independent configuration instead. > - Fix license header > - Tests should also run on aarch64 asimd=true envs > - Added comment around the assertions > - Adjust min/max identity IR test expectations after changes > - ... and 34 more: https://git.openjdk.org/jdk/compare/d6aa3453...a190ae68 > > The interesting thing is intReductionSimpleMin @ 100%. We see a regression there but I didn't observe it with the perfasm run. So, this could be due to variance in the application of cmov or not? > > I don't see the error / variance in the results you posted. Often I look at those, and if it is anywhere above 10% of the average, then I'm suspicious ;) > > Re: [#20098 (comment)](https://github.com/openjdk/jdk/pull/20098#issuecomment-2671144644) - I was trying to think what could be causing this. > > Maybe it is an issue with probabilities? Do you know at what point (if at all) the `MinI` node appears/disappears in that example? @eme64 I think you're heading in the right direction: minLongA = negate(maxLongA); minLongB = negate(maxLongB); minIntA = toInts(minLongA); minIntB = toInts(minLongB); To keep the same data distribution algorithm for both min and max operations, I started with positive numbers for max and found out that I could use the same data with the same properties for min by negating them. As you can see in the above snippet, the min values for ints had not been negated.
I'll fix that and show final numbers with the same subset shown in https://github.com/openjdk/jdk/pull/20098#issuecomment-2671144644 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2682263423 From tschatzl at openjdk.org Tue Feb 25 15:04:28 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 25 Feb 2025 15:04:28 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier Message-ID: Hi all, please review this change that implements the (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight, but we would like to have this ready by JDK 25.
### Current situation
With this change, G1 will reduce the post-write barrier to much more closely resemble Parallel GC's, as described in the JEP. The reason is that G1 lags behind Parallel/Serial GC in throughput due to its larger barrier.
The main reason for the current barrier is how g1 implements concurrent refinement:
* g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
* For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads.
* Finally, there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:

    // Filtering
    if (region(@x.a) == region(y)) goto done; // same region check
    if (y == null) goto done;                 // null value check
    if (card(@x.a) == young_card) goto done;  // write to young gen check
    StoreLoad;                                // synchronize
    if (card(@x.a) == dirty_card) goto done;

    *card(@x.a) = dirty

    // Card tracking
    enqueue(card-address(@x.a)) into thread-local-dcq;
    if (thread-local-dcq is not full) goto done;

    call runtime to move thread-local-dcq into dcqs

    done:

Overall, this post-write barrier alone is in the range of 40-50 instructions in total, compared to three or four(!) for Parallel and Serial GC.
The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse-grained synchronization based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a second card table ("refinement table"). The second card table also replaces the dirty card queue.
In that scheme, fine-grained synchronization is unnecessary because mutator and refinement threads always write to different memory areas (so no concurrent write can cause an update to be lost). This removes the need for synchronization on every reference write. Also, no card enqueuing is required anymore. Only the filters and the card mark remain.
### How this works
In the beginning both the card table and the refinement table are completely unmarked (contain "clean" cards).
The mutator dirties the card table until G1 heuristics determine that enough cards have been dirtied, based on what is allotted for scanning them during the garbage collection. At that point, the card table and the refinement table are exchanged "atomically" using handshakes. The mutator keeps dirtying the card table (which is the previous, clean refinement table), while the refinement threads look for and refine dirty cards on the refinement table as before.
Refinement of cards is very similar to before: if an interesting reference is found in a dirty card, G1 records it in the appropriate remembered sets. In this implementation there is an exception for references to the current collection set (typically the young gen): the refinement threads redirty that card on the card table with a special `to-collection-set` value. This is valid because races with the mutator for that write do not matter: the entire card will eventually be rescanned anyway, regardless of whether it ends up as dirty or to-collection-set. The advantage of marking to-collection-set cards specially is that the next time the card tables are swapped, the refinement threads will not re-refine them, on the assumption that the reference to the collection set will not change. This decreases refinement work substantially.
If refinement gets interrupted by GC, the refinement table will be merged with the card table before card scanning, which works as before.
New barrier pseudo-code for an assignment `x.a = y`:

    // Filtering
    if (region(@x.a) == region(y)) goto done; // same region check
    if (y == null) goto done;                 // null value check
    if (card(@x.a) != clean_card) goto done;  // skip already non-clean cards

    *card(@x.a) = dirty

    done:

This is basically the Serial/Parallel GC barrier with additional filters to keep the number of dirty cards as small as possible.
A few more comments about the barrier:
* the barrier now loads the card table base offset from a thread local instead of inlining it.
This is necessary for this mechanism to work, as the card table that mutators dirty changes over time. It may even be faster on some architectures (smaller code size), and some architectures already do this.
* all existing pre-filters were kept. Benchmarks showed some significant regressions with respect to pause times and even throughput compared to G1 in master. Using the Parallel GC barrier (just the dirty card write) would be possible, and further investigation into stripping parts of it will be done as a follow-up.
* the final check tests for non-clean cards to avoid overwriting existing cards, in particular the "to-collection-set" cards described above. Current G1 marks the cards corresponding to young gen regions as all "young" so that the original barrier could potentially avoid the `StoreLoad`. This implementation removes this facility (which might be re-introduced later), but measurements showed that pre-dirtying the young generation regions' cards as "dirty" (G1 does not need to use an extra "young" value) did not yield any measurable performance difference.
### Refinement process
The goal of the refinement (threads) is to make sure that the number of cards to scan in the garbage collection is below a particular threshold.
The prototype changes the refinement threads into a single control thread and a set of (refinement) worker threads. Unlike the previous implementation, the control thread does not do any refinement; it only executes the heuristics to start a calculated number of worker threads and tracks refinement progress.
The refinement trigger is based on the currently known number of pending (i.e. dirty) cards on the card table and a pending card generation rate, fairly similar to the previous algorithm. After the refinement control thread determines that it is time to do refinement, it starts the following sequence:
1) **Swap the card table**.
This consists of several steps:
    1) **Swap the global card table** - the global card table pointer is swapped; newly created threads and runtime calls will eventually use the new values, at the latest after the next two steps.
    2) **Update the pointers in all JavaThread**'s TLS storage to the new card table pointer using a handshake operation
    3) **Update the pointers in the GC thread**'s TLS storage to the new card table pointer using the SuspendibleThreadSet mechanism
2) **Snapshot the heap** - determine the extent of work needed for all regions where the refinement threads need to do some work on the refinement table (the previous card table). The snapshot stores the work progress for each region so that work can be interrupted and continued at any time. This work consists either of refinement of the particular card (old generation regions) or of clearing the cards (next collection set/young generation regions).
3) **Sweep the refinement table** by activating the refinement worker threads. The threads refine dirty cards using the heap snapshot, where worker threads claim parts of regions to process.
    * Cards with references to the young generation are not added to the young generation's card-based remembered set. Instead, these cards are marked as to-collection-set in the card table and any remaining refinement of that card is skipped.
    * If refinement encounters a card that is already marked as to-collection-set, it is not refined and is re-marked as to-collection-set on the card table.
    * During refinement, the refinement table is also cleared (in bulk for collection set regions, as they do not need any refinement, and in other regions as their non-clean cards are refined).
    * Dirty cards within unparsable heap areas are forwarded to/redirtied on the card table as is.
4) **Completion work**, mostly statistics.
If the work is interrupted by a non-garbage collection synchronization point, work is suspended temporarily and resumed later using the heap snapshot.
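The new barrier and the swap-then-sweep cycle described above can be modelled in a few lines. The Python sketch below is a single-threaded simulation, not G1's implementation. It makes these simplifications:
- addresses are plain integers, and `None` stands for a null reference;
- the handshake-based swap becomes a plain swap;
- remembered sets become a list of card indices;
- `points_into_cset` is a hypothetical stand-in for scanning the objects on a card.

The card values `CLEAN`, `DIRTY` and `TO_CSET` are illustrative, not HotSpot's encodings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Illustrative card values (not HotSpot's real encodings).
CLEAN, DIRTY, TO_CSET = 0, 1, 2


@dataclass
class Heap:
    region_size: int
    card_size: int
    num_cards: int
    card_table: List[int] = field(init=False)        # dirtied by mutators
    refinement_table: List[int] = field(init=False)  # swept by refinement

    def __post_init__(self) -> None:
        self.card_table = [CLEAN] * self.num_cards
        self.refinement_table = [CLEAN] * self.num_cards


def post_write_barrier(heap: Heap, field_addr: int, new_val: Optional[int]) -> None:
    """New-style post-write barrier for `x.a = y` (field_addr = @x.a, new_val = y)."""
    # The null check comes first here only because None has no region in this model.
    if new_val is None:
        return
    if field_addr // heap.region_size == new_val // heap.region_size:
        return  # same-region write
    card = field_addr // heap.card_size
    if heap.card_table[card] != CLEAN:
        return  # skip already non-clean cards (dirty or to-collection-set)
    heap.card_table[card] = DIRTY


def swap_tables(heap: Heap) -> None:
    """Stand-in for the handshake-based exchange of the two tables."""
    heap.card_table, heap.refinement_table = heap.refinement_table, heap.card_table


def refine(heap: Heap, points_into_cset: Callable[[int], bool]) -> List[int]:
    """Sweep the refinement table; return cards that would go into remembered sets."""
    remset_cards = []
    for card, value in enumerate(heap.refinement_table):
        if value == CLEAN:
            continue
        if value == TO_CSET or points_into_cset(card):
            # Re-mark on the mutators' card table instead of a remembered set;
            # a racing mutator write to the same card is harmless.
            heap.card_table[card] = TO_CSET
        else:
            remset_cards.append(card)
        heap.refinement_table[card] = CLEAN  # refinement clears as it goes
    return remset_cards


heap = Heap(region_size=4096, card_size=512, num_cards=16)
post_write_barrier(heap, 100, 5000)   # cross-region: dirties card 0
post_write_barrier(heap, 600, 700)    # same region: filtered out
post_write_barrier(heap, 2100, 9000)  # cross-region: dirties card 4
swap_tables(heap)
print(refine(heap, points_into_cset=lambda card: card == 0))  # prints [4]
```

A further barrier call on card 0 now hits the non-clean filter. At the next swap, the `TO_CSET` mark is carried over without rescanning the card.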
After the refinement process, the refinement table is all clean again and ready to be swapped again.
### Garbage collection pause changes
Since a garbage collection (young or full gc) pause may occur at any point during the refinement process, the garbage collection needs some compensating work for the not yet swept parts of the refinement table.
Note that this situation is very rare, and the heuristics try to avoid it, so in most cases nothing needs to be done as the refinement table is all clean.
If this happens, young collections add a new phase called `Merge Refinement Table` in the garbage collection pause right before the `Merge Heap Roots` phase. This compensating phase does the following:
0) (Optional) Snapshot the heap if not done yet (if the process has been interrupted between steps 1 and 3 of the refinement process)
1) Merge the refinement table into the card table - in this step the dirty cards of interesting regions are
2) Completion work (statistics)
If a full collection interrupts concurrent refinement, the refinement table is simply cleared and all dirty cards are thrown away.
A garbage collection generates new cards (e.g. references from promoted objects into the young generation) on the refinement table. This acts similarly to the extra DCQS that the previous implementation used to record these interesting references/cards and to redirty the card table using them. G1 swaps the card tables at the end of the collection to keep the post-condition that the refinement table is all clean (and any to-be-refined cards are on the card table) at the end of garbage collection.
### Performance metrics
Following is an overview of the changes in behavior. Some numbers are provided in the CR in the first comment.
#### Native memory usage
The refinement table takes an additional 0.2% of the Java heap size in native memory compared to JDK 21 and above (in JDK 21 we removed one card-table-sized data structure, so this is a non-issue when updating from before).
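The 0.2% figure matches simple arithmetic if one assumes one byte per card and the default 512-byte card size. The card size is an assumption here, since it is configurable in recent JDKs.

$$
\frac{\text{refinement table size}}{\text{Java heap size}} = \frac{1\ \text{byte}}{512\ \text{bytes}} = \frac{1}{512} \approx 0.195\% \approx 0.2\%
$$

For example, a 4 GiB heap has $2^{32} / 2^{9} = 2^{23}$ cards, so under these assumptions the refinement table needs about 8 MiB.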
Some of that additional memory usage is automatically reclaimed by removing the dirty card queues. Additional memory is reclaimed by managing the cards containing to-collection-set references on the card table. This drops the explicit remembered sets for the young generation completely, as well as any remembered set entries that would otherwise be duplicated into the other regions' remembered sets. In some applications/benchmarks these gains completely offset the additional card table; however, most of the time this is not the case, currently particularly for throughput applications.
It is possible to allocate the refinement table lazily. Since these applications often do not need any concurrent refinement, there would then be no overhead at all, but actually a net reduction of native memory usage. This is not implemented in this prototype.
#### Latency ("Pause times")
Not affected or slightly better. Pause times decrease due to a shorter "Merge remembered sets" phase, because no work is required for the remembered sets for the young generation - they are always already on the card table! However, merging the refinement table into the card table is extremely fast and was always faster than merging remembered sets for the young gen in my measurements. Since this work is a linear scan over memory, it is embarrassingly parallel too.
The cards created during garbage collection do not need to be redirtied, so that phase has also been removed.
The card table swap is based on predictions for the mutator card dirtying rate and the refinement rate as before, and the policy is actually fairly similar to before. It is still rather aggressive, but in most cases takes fewer CPU resources than before, mostly because refining takes less CPU time. As before, many applications do not do any refinement at all. More investigation could be done to improve this in the future.
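The "Merge Refinement Table" work is embarrassingly parallel because every card is merged independently of every other card. The Python sketch below illustrates that partitioning under these assumptions:
- tables are plain lists, and the card values are illustrative;
- a non-clean refinement mark is copied only onto a clean card-table entry, so existing marks are kept;
- any per-region filtering is ignored.

The Python threads show the disjoint chunking, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

CLEAN, DIRTY, TO_CSET = 0, 1, 2  # illustrative card values


def merge_chunk(card_table: List[int], refinement_table: List[int],
                start: int, end: int) -> None:
    """Merge cards [start, end) of the refinement table into the card table."""
    for i in range(start, end):
        value = refinement_table[i]
        if value == CLEAN:
            continue
        if card_table[i] == CLEAN:
            card_table[i] = value  # keep existing non-clean marks as they are
        refinement_table[i] = CLEAN  # leave the refinement table all clean


def merge_refinement_table(card_table: List[int], refinement_table: List[int],
                           chunk_size: int = 4, workers: int = 4) -> None:
    """Chunks are disjoint, so workers need no synchronization with each other."""
    n = len(card_table)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(merge_chunk, card_table, refinement_table,
                               start, min(start + chunk_size, n))
                   for start in range(0, n, chunk_size)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
```

After the merge, every card that needs scanning is on the card table and the refinement table is all clean. New cards created during the pause can then be recorded on the refinement table, as described above.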
#### Throughput
This change always increases throughput in my measurements, although depending on the benchmark/application it may not actually show up in scores. Due to the pre-barrier and the additional filters in the barrier, G1 is still slower than Parallel on raw throughput benchmarks, but is typically somewhere halfway to Parallel GC or closer.
### Platform support
Since the post-write barrier changed, additional work for some platforms is required to allow this change to proceed. At this time all work for all platforms is done, but it needs testing:
- GraalVM (contributed by the GraalVM team)
- S390 (contributed by A. Kumar from IBM)
- PPC (contributed by M. Doerr, from SAP)
- ARM (should work, HelloWorld compiles and runs)
- RISCV (should work, HelloWorld compiles and runs)
- x86 (should work, build/HelloWorld compiles and runs)
None of the above-mentioned platforms implement the barrier method to write cards for a reference array (aarch64 and x64 are fully implemented); they call the runtime as before. I believe it is fairly easy to do now with this simplified barrier for some extra performance, but it is not necessary.
### Alternatives
The JEP text extensively discusses alternatives.
### Reviewing
The change can be roughly divided into these fairly isolated parts:
* platform-specific changes to the barrier
* refinement and refinement control thread changes; this is best reviewed starting from the `G1ConcurrentRefineThread::run_service` method
* changes to garbage collection: `merge_refinement_table()` in `g1RemSet.cpp`
* policy modifications are typically related to code around the calls to `G1Policy::record_dirtying_stats`.
Further information is available in the [JEP draft](https://bugs.openjdk.org/browse/JDK-8340827); there is also a somewhat more extensive discussion of the change on my [blog](https://tschatzl.github.io/2025/02/21/new-write-barriers.html).
Some additional comments:
* the pre-marking of young generation cards has been removed.
Benchmarks did not show any significant difference either way. This makes some sense to me because the entire young gen will quickly get marked anyway, i.e. one only saves a single additional card table write (for every card). With the old barrier, the cost of a card table mark was much higher.
* G1 sets `UseCondCardMark` to true by default. The conditional card mark now corresponds to the third filter in the write barrier, and since I decided to keep all filters for this change, it makes sense to use this mechanism directly.
If there are any questions, feel free to ask.
Testing: tier1-7 (multiple tier1-7, tier1-8 with slightly older versions)
Thanks,
Thomas ------------- Commit messages: - * only provide byte map base for JavaThreads - * mdoerr review: fix comments in ppc code - * fix crash when writing dirty cards for memory regions during card table switching - * remove mention of "enqueue" or "enqueuing" for actions related to post barrier - * remove some commented out debug code - Card table as DCQ Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342382 Stats: 6543 lines in 103 files changed: 2162 ins; 3461 del; 920 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From mdoerr at openjdk.org Tue Feb 25 15:04:29 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 25 Feb 2025 15:04:29 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier In-Reply-To: References: Message-ID: On Sun, 23 Feb 2025 18:53:33 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier.
PPC64 code looks great! Thanks for doing this! Only some comments are no longer correct. src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp line 244: > 242: > 243: __ xorr(R0, store_addr, new_val); // tmp1 := store address ^ new value > 244: __ srdi_(R0, R0, G1HeapRegion::LogOfHRGrainBytes); // tmp1 := ((store address ^ new value) >> LogOfHRGrainBytes) Comment: R0 is used instead of tmp1 src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp line 259: > 257: > 258: __ ld(tmp1, G1ThreadLocalData::card_table_base_offset(), thread); > 259: __ srdi(tmp2, store_addr, CardTable::card_shift()); // tmp1 := card address relative to card table base Comment: tmp2 is used, here src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp line 261: > 259: __ srdi(tmp2, store_addr, CardTable::card_shift()); // tmp1 := card address relative to card table base > 260: if (UseCondCardMark) { > 261: __ lbzx(R0, tmp1, tmp2); // tmp1 := card address Can you remove the comment, please? It's wrong.
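The quoted `xorr`/`srdi_` pair implements the barrier's same-region filter. Two addresses lie in the same region exactly when they agree in all bits above the region-size bits, i.e. when their XOR shifted right by log2(region size) is zero. The record form `srdi_` also sets the condition register, so a following branch can test that result directly. Likewise, `srdi(tmp2, store_addr, card_shift)` computes the card index, which is added to the card table base loaded from the thread. The Python sketch below assumes power-of-two, size-aligned regions and cards. The shift constants (20 for 1 MiB regions, 9 for 512-byte cards) are illustrative values, not read from HotSpot.

```python
LOG_REGION_BYTES = 20  # illustrative: 1 MiB regions
CARD_SHIFT = 9         # illustrative: 512-byte cards


def same_region(store_addr: int, new_val: int, log_region: int = LOG_REGION_BYTES) -> bool:
    # Mirrors: xorr R0, store_addr, new_val ; srdi_ R0, R0, log_region ; branch if zero
    return ((store_addr ^ new_val) >> log_region) == 0


def card_address(card_table_base: int, store_addr: int, card_shift: int = CARD_SHIFT) -> int:
    # Mirrors: ld tmp1, <card table base from thread> ; srdi tmp2, store_addr, card_shift
    # The card byte is then accessed at tmp1 + tmp2 (e.g. lbzx R0, tmp1, tmp2).
    return card_table_base + (store_addr >> card_shift)
```

The XOR form needs no comparison of two separately shifted values. One XOR plus one shift-and-record yields the answer in R0 and the condition register, which is why the reviewed code uses R0 rather than `tmp1` there.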
------------- PR Review: https://git.openjdk.org/jdk/pull/23739#pullrequestreview-2637143540 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1967669777 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1967670850 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1967671593 From duke at openjdk.org Tue Feb 25 15:04:29 2025 From: duke at openjdk.org (Piotr Tarsa) Date: Tue, 25 Feb 2025 15:04:29 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier In-Reply-To: References: Message-ID: On Sun, 23 Feb 2025 18:53:33 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
In this PR you wrote:
if (region(@x.a) != region(y)) goto done; // same region check
but on https://tschatzl.github.io/2025/02/21/new-write-barriers.html you wrote:
(1) if (region(x.a) == region(y)) goto done; // Ignore references within the same region/area
I guess the second one is correct. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2677075290 From stuefe at openjdk.org Tue Feb 25 15:04:29 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Feb 2025 15:04:29 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier In-Reply-To: References: Message-ID: On Sun, 23 Feb 2025 18:53:33 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... @tschatzl I did not contribute the ppc port. Did you mean @TheRealMDoerr or @reinrich ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2677512780 From tschatzl at openjdk.org Tue Feb 25 15:13:43 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 25 Feb 2025 15:13:43 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. 
> > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. 
> > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * remove unnecessarily added logging ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/0100d8e2..9ef9c5f4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=00-01 Stats: 4 lines in 4 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From vpaprotski at openjdk.org Tue Feb 25 15:20:56 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 25 Feb 2025 15:20:56 GMT Subject: RFR: 8350516: Update model numbers for ECore-based cpus In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 21:47:45 GMT, Volodymyr Paprotski wrote: > Add two more models to the list Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23731#issuecomment-2682340989 From duke at openjdk.org Tue Feb 25 15:20:56 2025 From: duke at openjdk.org (duke) Date: Tue, 25 Feb 2025 15:20:56 GMT Subject: RFR: 8350516: Update model numbers for ECore-based cpus In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 21:47:45 GMT, Volodymyr Paprotski wrote: > Add two more models to the list @vpaprotsk Your change (at version 7da4a1fe9441ceb8728ed8eae319e0d76417fda6) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23731#issuecomment-2682345418 From amitkumar at openjdk.org Tue Feb 25 15:39:21 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 25 Feb 2025 15:39:21 GMT Subject: RFR: 8349686: [s390x] C1: Improve Class.isInstance intrinsic [v10] In-Reply-To: References: Message-ID: > s390x implementation for Class.isInstance intrinsic. > > Tier1 test on release & fastdebug vm are clean with flag: `-XX:-UseSecondarySupersCache -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`. > > Benchmark results will be updated soon. 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/s390/c1_Runtime1_s390.cpp Co-authored-by: Andrew Haley ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23535/files - new: https://git.openjdk.org/jdk/pull/23535/files/ffdb1342..e7269045 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23535&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23535/head:pull/23535 PR: https://git.openjdk.org/jdk/pull/23535 From gziemski at openjdk.org Tue Feb 25 15:39:36 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 25 Feb 2025 15:39:36 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v56] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!):
>
>
> malloc summary:
>
> time:8,951,473[ns] [samples:117,717]
> memory requested:28,474,918 bytes, allocated:29,904,416 bytes,
> malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%]
>
> NMT type:                objects:      bytes:      time:  count%:  bytes%:  time%: overhead:
> -------------------------------------------------------------------------------------------------------------------------
> Java Heap:                      0           0          0     0.0%     0.0%    0.0%      0.0%
> Class:                      8,598     727,856    607,047     7.3%     2.4%    6.8%     18.2%
> Thread:                       196      68,256     64,875     0.2%     0.2%    0.7%      7.0%
> Thread Stack:                   0           0          0     0.0%     0.0%    0.0%      0.0%
> Code:                      10,094   2,036,528    916,348     8.6%     6.8%   10.2%      9.9%
> GC:                         1,813  20,372,160  1,214,642     1.5%    68.1%   13.6%      3.7%
> GCCardSet:                    299      28,736     13,174     0.3%     0.1%    0.1%     11.6%
> Compiler:                      55      13,728    171,364     0.0%     0.0%    1.9%      6.9%
> JVMCI:                          0           0          0     0.0%     0.0%    0.0%      0.0%
> Internal:                   5,066     339,184  1,418,578     4.3%     1.1%   15.8%     18.0%
> Other:                          6     244,736     21,303     0.0%     0.8%    0.2%     37.9%
> Symbol:                     9,844   1,493,280    752,665     8.4%     5.0%    8.4%     14.1%
> Native Memory Tracking:       367      30,736     17,654     0.3%     0.1%    0.2%      7...
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix Linux arm build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/4a7edefb..28f2076e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=55 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=54-55 Stats: 11 lines in 1 file changed: 0 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Tue Feb 25 15:44:11 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 25 Feb 2025 15:44:11 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v31] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Tue, 25 Feb 2025 11:06:26 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. >> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. >> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. >> - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167.
> > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > reviews applied. How would I go about verifying the performance gain? You mentioned previously that you wrote a microbenchmark for testing this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2682420600 From stuefe at openjdk.org Tue Feb 25 15:49:32 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Feb 2025 15:49:32 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v5] In-Reply-To: References: Message-ID: > Greetings, > > This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. > > Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. > > I wanted to track that information correctly and display it clearly in a way that is easy to understand. > > The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). > > The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase. 
> > The statistic gives us two new forms of output: > > 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: > > > Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: > Phase Total ra node comp type index reglive regsplit cienv other > none 1205512 155104 982984 33712 0 0 0 0 0 33712 > parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 > optimizer 916584 0 556416 0 0 0 0 0 0 360168 > escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 > connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 > macroEliminate 196448 0 196448 0 0 0 0 0 0 0 > iterGVN 327440 0 196368 131072 0 0 0 0 0 0 > incrementalInline 3992816 0 3043704 621832 0 0 0 0 261824... Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/chaitin.cpp Co-authored-by: Roberto Casta?eda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23530/files - new: https://git.openjdk.org/jdk/pull/23530/files/dd7a06ad..2c56b216 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23530/head:pull/23530 PR: https://git.openjdk.org/jdk/pull/23530 From duke at openjdk.org Tue Feb 25 16:00:57 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 25 Feb 2025 16:00:57 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: <1yB95sOajuS5ptFI0GQWLepii5JsZ9DOsje-TEFyFYs=.a325ad18-17ed-4e77-b1e3-0bad2cf55c67@github.com> Message-ID: <_CekdxBJviS_sZCVN62_yFx-cTF4qrIuAnqbIeUmFck=.3a6afffb-8fbe-4809-a4ca-1bc22b52a628@github.com> On Tue, 25 Feb 2025 13:50:35 GMT, Andrew Haley wrote: >> I just tried it with top-of trunk latest binutils: >> >> fedora:aarch64 $ 
~/binutils-gdb-install/bin/as -march=armv9-a+sha3+sve2-bitperm aarch64ops.s >> fedora:aarch64 $ ~/binutils-gdb-install/bin/as --version >> GNU assembler (GNU Binutils) 2.44.50.20250225 > > Aha! > > > aph at Andrews-MacBook-Pro ~ % as t.s > t.s:1:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4] > sub x1, x10, x23, sxth #2 > ^ > aph at Andrews-MacBook-Pro ~ % as --version > Apple clang version 16.0.0 (clang-1600.0.26.6) > Target: arm64-apple-darwin24.3.0 OK, so GNU as is more forgiving than Apple as... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1970076152 From stuefe at openjdk.org Tue Feb 25 16:18:57 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Feb 2025 16:18:57 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v4] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 05:57:28 GMT, Ashutosh Mehra wrote: >> Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> avoid Thread::current in high traffic chunk alloc path > > src/hotspot/share/compiler/compilationMemStatInternals.hpp line 92: > >> 90: >> 91: // A very simple fixed-width FIFO buffer, used for the phase timeline >> 92: template > > Would `size` be a better name than `max`? 
ok changed > src/hotspot/share/compiler/compilationMemStatInternals.hpp line 160: > >> 158: void init(T v) { start = cur = peak = v; } >> 159: void update(T v) { cur = v; if (v > peak) peak = v; } >> 160: dT end_delta() const { return (dT)cur - (dT)start; } > > Should it be `return (dT)(cur - start); }` Hmm, I'd like to avoid the inner overflow if cur < start (i.e., if the phase released more memory than it allocated). > src/hotspot/share/compiler/compilationMemStatInternals.hpp line 161: > >> 159: void update(T v) { cur = v; if (v > peak) peak = v; } >> 160: dT end_delta() const { return (dT)cur - (dT)start; } >> 161: size_t temporary_peak_size() const { return MIN2(peak - cur, peak - start); } > > shouldn't it be `MAX2(peak - cur, peak - start)`? Why not just `peak - start`? We are only interested in a rise that goes significantly above **both** the start and the end points of the measurement. E.g.: - if we have start = 0, end = 20MB, peak = 20MB, this is not a temporary peak, and we already know that the end usage is 20MB. - if we have start = 20MB, end = 0, peak = 20MB, this is not a temporary peak either, because we already know the starting footprint was 20MB. - but if we have start = 0, end = 0, peak = 20MB, this is interesting, since if we just printed start and end we would miss the fact that in between we had temporarily allocated 20MB. > src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 149: > >> 147: int col = start_indent; >> 148: check_phase_trace_id(e.info.id); >> 149: if (omit_empty_phases && e._bytes.end_delta() == 0 && e._bytes.temporary_peak_size() == 0) { > > `omit_empty_phases` is always false. Can it be just removed? I hesitated to throw this out because the timeline can get very wordy, but I got used to the more expressive timeline with the 0 entries, so okay.
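The three cases Thomas lists can be checked with a small standalone C++ sketch of the quoted counter. `PhaseCounter` is a hypothetical name. `std::min` stands in for HotSpot's `MIN2`, and plain `size_t`/`int64_t` stand in for the template parameters `T`/`dT`. The sketch shows why the minimum of the two distances isolates a peak that rose above both endpoints, and why casting before subtracting yields a negative delta instead of an unsigned wrap-around.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Sketch of a per-phase counter modeled on the snippet under review.
// T is the tracked quantity (bytes); dT is a signed type for deltas.
template <typename T, typename dT>
struct PhaseCounter {
  T start = 0, cur = 0, peak = 0;
  void init(T v) { start = cur = peak = v; }
  void update(T v) { cur = v; if (v > peak) peak = v; }
  // Cast each operand first so that cur < start gives a negative value
  // rather than wrapping around in unsigned arithmetic.
  dT end_delta() const { return (dT)cur - (dT)start; }
  // Only the part of the peak that lies above BOTH the start and the end
  // value counts as "temporary", hence the minimum of the two distances.
  size_t temporary_peak_size() const { return std::min(peak - cur, peak - start); }
};
```

For example, with `start = 0`, a rise to 20MB and a drop back to 0, both distances are 20MB, so the temporary peak is 20MB. If the phase ends at 20MB, `peak - cur` is 0 and no temporary peak is reported.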
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1970105585 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1970105238 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1970103141 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1970110006 From stuefe at openjdk.org Tue Feb 25 16:29:02 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Feb 2025 16:29:02 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v4] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 06:03:21 GMT, Ashutosh Mehra wrote: >> Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> avoid Thread::current in high traffic chunk alloc path > > src/hotspot/share/memory/arena.hpp line 48: > >> 46: const size_t _len; // Size of this Chunk >> 47: // Used for Compilation Memory Statistic >> 48: uint64_t _stamp; > > This is wasted space if compilation memory stats is not enabled. One way to avoid this is to subclass `Chunk` as a `StampedChunk` and use that if compilation memory stats is enabled. Is this complexity worth the space saving? I'd like to avoid that complexity. Arena coding is already needlessly complex. Note that a chunk is variable-sized and typically 1Kb in size or larger, and we should not have that many chunks in live arenas at any given moment (some hundred maybe). Note also that using 8 bytes seems wasteful, but the payload section has to be aligned to void* anyway. 
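A small C++ sketch illustrates Thomas's remark that the payload has to be `void*`-aligned anyway. The three structs are hypothetical stand-ins for a chunk header: a next pointer, a length, and an optional stamp. They are not HotSpot's `Chunk` and imply nothing about its real layout. On a typical 64-bit ABI, a 32-bit stamp is padded out to 8 bytes, so it would cost exactly as much as the `uint64_t` used in the PR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical chunk-header stand-ins (not HotSpot's Chunk).
struct HeaderNoStamp {
  void* next;
  size_t len;
};
struct HeaderStamp32 {
  void* next;
  size_t len;
  uint32_t stamp;  // a narrower stamp...
};
struct HeaderStamp64 {
  void* next;
  size_t len;
  uint64_t stamp;  // ...occupies the same space as a 64-bit one on LP64/LLP64
};

// Round a header size up so that the payload following it is void*-aligned.
constexpr size_t aligned_header_size(size_t header_bytes) {
  return (header_bytes + alignof(void*) - 1) & ~(alignof(void*) - 1);
}
```

On such a platform, `HeaderNoStamp` is 16 bytes, and both stamped variants are 24 bytes. The extra 8 bytes per chunk is the whole cost, and narrowing the field would not reduce it.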
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1970128193 From vpaprotski at openjdk.org Tue Feb 25 16:30:58 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 25 Feb 2025 16:30:58 GMT Subject: Integrated: 8350516: Update model numbers for ECore-based cpus In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 21:47:45 GMT, Volodymyr Paprotski wrote: > Add two more models to the list This pull request has now been integrated. Changeset: dea7a9f0 Author: Volodymyr Paprotski Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/dea7a9f0d640e5234bafe2157aecd942c71d5de5 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8350516: Update model numbers for ECore-based cpus Reviewed-by: sviswanathan, vaivanov ------------- PR: https://git.openjdk.org/jdk/pull/23731 From stuefe at openjdk.org Tue Feb 25 16:43:21 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Feb 2025 16:43:21 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v6] In-Reply-To: References: Message-ID: > Greetings, > > This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. > > Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. > > I wanted to track that information correctly and display it clearly in a way that is easy to understand. > > The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). 
> > The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase. > > The statistic gives us two new forms of output: > > 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: > > > Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: > Phase Total ra node comp type index reglive regsplit cienv other > none 1205512 155104 982984 33712 0 0 0 0 0 33712 > parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 > optimizer 916584 0 556416 0 0 0 0 0 0 360168 > escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 > connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 > macroEliminate 196448 0 196448 0 0 0 0 0 0 0 > iterGVN 327440 0 196368 131072 0 0 0 0 0 0 > incrementalInline 3992816 0 3043704 621832 0 0 0 0 261824... Thomas Stuefe has updated the pull request incrementally with five additional commits since the last revision: - feedback ashu - feedback roberto - final-statistics-switch - performance fix - remove test code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23530/files - new: https://git.openjdk.org/jdk/pull/23530/files/2c56b216..3052ddf8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23530&range=04-05 Stats: 66 lines in 11 files changed: 9 ins; 22 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/23530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23530/head:pull/23530 PR: https://git.openjdk.org/jdk/pull/23530 From stuefe at openjdk.org Tue Feb 25 16:43:21 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Feb 2025 16:43:21 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: References: 
<0wHGNSlwe7cWb7Plad2n8Swy8rayYTAf5IETuw9zl4U=.a4d6a129-aebc-4639-aaef-92ee6c4552c7@github.com> Message-ID: <5UAbfNQNxn--W_diVazFvldvScMGE59vVfpWJ4GUziA=.24361d75-77c9-418d-830e-797cac10f4b9@github.com> On Mon, 24 Feb 2025 08:56:51 GMT, Roberto Castañeda Lozano wrote: >>> @robcasloz I identified and hopefully fixed a small issue that hit the "disabled" path. Turns out we allocate arena chunks a lot more frequently than I thought, and the new unconditional call to Thread::current() in there was hurting a bit. I now avoid this unless I know the statistic is enabled. >>> >>> With this patch, on my machine the difference between unpatched and patched JVM with stats disabled is below one standard deviation for the benchmark in question. >> >> Great, thanks! Will re-run benchmarking and report results early next week. > >> > @robcasloz I identified and hopefully fixed a small issue that hit the "disabled" path. Turns out we allocate arena chunks a lot more frequently than I thought, and the new unconditional call to Thread::current() in there was hurting a bit. I now avoid this unless I know the statistic is enabled. >> > With this patch, on my machine the difference between unpatched and patched JVM with stats disabled is below one standard deviation for the benchmark in question. >> >> Great, thanks! Will re-run benchmarking and report results early next week. > > Functional test results (Oracle tier1-5) still look good for the latest commit (dd7a06ad). I can confirm that the C2 speed regression on our linux-x64 machines is almost fully mitigated. The 2-3% regression on our macosx-aarch64 machines does not seem to be addressed by the latest changes though, but as I mentioned before I think it is in the acceptable range (and only affects one benchmark). @robcasloz, @ashu-mehra thanks a lot for your reviews. I incorporated most of them into the PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2682609306 From stuefe at openjdk.org Tue Feb 25 16:43:22 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Feb 2025 16:43:22 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v4] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 16:13:59 GMT, Thomas Stuefe wrote: >> src/hotspot/share/compiler/compilationMemStatInternals.hpp line 92: >> >>> 90: >>> 91: // A very simple fixed-width FIFO buffer, used for the phase timeline >>> 92: template >> >> Would `size` be a better name than `max`? > > ok changed done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1970154273 From stuefe at openjdk.org Tue Feb 25 16:43:22 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Feb 2025 16:43:22 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v4] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 06:34:29 GMT, Ashutosh Mehra wrote: >> Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> avoid Thread::current in high traffic chunk alloc path > > src/hotspot/share/opto/phase.hpp line 125: > >> 123: f( _t_temporaryTimer1, "tempTimer1") \ >> 124: f( _t_temporaryTimer2, "tempTimer2") \ >> 125: f( _t_testTimer1, "testTimer1") \ > > Would `_t_testPhase1` and `_t_testPhase2` be a better name? 
okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1970155311 From sroy at openjdk.org Tue Feb 25 16:44:05 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Tue, 25 Feb 2025 16:44:05 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> <7qzgn1LeDY8CaNJZVRPb0FORbKbkfBP85qrU3MSH_Io=.62e89bae-a5e2-47e3-8217-d83ab7bef00f@github.com> Message-ID: On Fri, 21 Feb 2025 19:54:07 GMT, Martin Doerr wrote: >> Hi @TheRealMDoerr Maybe my answer was not clear. I am not proposing to remove them. I am unable to decipher how to reduce the 3 instructions to one, as I feel the below 2 lines are required , as per the algorithm. >> __ vec_perm(vTmp4, vHigh, vHigh, loadOrder); >> __ vec_perm(vTmp5, vLow, vLow, loadOrder); > > The purpose of the 3 `vec_perm` instructions is to extract 16 Bytes from two 16 Byte values loaded into vector registers. This can be done by 1 `vec_perm` instruction. But I think AIX should get fixed first before we figure out how to determine the vPerm value for that (probably lvsl + vxor before the loop). @TheRealMDoerr I understood the failure on AIX. It is related to this. vec_perm(vH, vTmp5, vTmp4, vPerm): here we combine the first and last 16 bytes and extract 16 bytes out of them using the pattern generated by lvsl in vPerm. We need the 2 extra vec_perm instructions specifically for Linux on Power, so that the order of elements is retained; otherwise we end up selecting the wrong 16 bytes. For Linux we need vec_perm(vH, vTmp5, vTmp4, vPerm); for AIX it would be vec_perm(vH, vTmp4, vTmp5, vPerm), without the 2 extra vec_perm instructions, as the order is retained there due to endianness. I am trying to find a pattern that can eliminate the 2 extra vec_perm instructions on Linux on Power.
One thing I tried was __ xxspltib(vTmp12->to_vsr(), 31); __ vxor(vPerm, vPerm, vTmp12); This generates the byte sequence required for little endian. Some test cases passed, but some failed. Still working on it; please share your input too. If the above explanation is not clear, let me know and I will try to explain with an example. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1970157770 From mdoerr at openjdk.org Tue Feb 25 16:52:01 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 25 Feb 2025 16:52:01 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> <7qzgn1LeDY8CaNJZVRPb0FORbKbkfBP85qrU3MSH_Io=.62e89bae-a5e2-47e3-8217-d83ab7bef00f@github.com> Message-ID: On Tue, 25 Feb 2025 16:41:30 GMT, Suchismith Roy wrote: >> The purpose of the 3 `vec_perm` instructions is to extract 16 Bytes from two 16 Byte values loaded into vector registers. This can be done by 1 `vec_perm` instruction. But I think AIX should get fixed first before we figure out how to determine the vPerm value for that (probably lvsl + vxor before the loop). > > @TheRealMDoerr > I understood the failure on AIX. It is related to this. > > vec_perm(vH, vTmp5, vTmp4, vPerm)- Here we combine first and last 16 bytes and extract 16 bytes out of them using the pattern generated by lvsl in vPerm. > > We required the 2 extra vec_perm,specifically, for Linux on Power , so that order of elements is retained, else we will end up selecting the wrong 16bytes . > > For Linux we need vec_perm(vH, vTmp5, vTmp4, vPerm); ...for AIX it would be vec_perm(vH, vTmp4, vTmp5, vPerm); without the need for the 2 vec_perm statements, as the order is retained due to Endianness. > > I am trying to find a pattern that can eliminate the need to do 2 extra vec_perm for Linux on Power.
> > One thing I tried was > __ xxspltib(vTmp12->to_vsr(), 31); > __ vxor(vPerm, vPerm, vTmp12); > This generates the sequence of bytes ,required for Little Endian. > Some test cases did pass, but some failed too. Still working on it. Let me know your inputs too. > > If the above explanation is not clear, let me know, I will try to explain with an example I'll wait for the AIX fix and make experiments on both platforms after that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1970171149 From kvn at openjdk.org Tue Feb 25 17:32:02 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Feb 2025 17:32:02 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v4] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 09:27:13 GMT, Emanuel Peter wrote: >> Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below. >> >> **Background** >> >> With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer. >> >> **Problem** >> >> So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code. 
>> >> >> MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1); >> MemorySegment nativeUnaligned = nativeAligned.asSlice(1); >> test3(nativeUnaligned); >> >> >> When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not! >> >> static void test3(MemorySegment ms) { >> for (int i = 0; i < RANGE; i++) { >> long adr = i * 4L; >> int v = ms.get(ELEMENT_LAYOUT, adr); >> ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1)); >> } >> } >> >> >> **Solution: Runtime Checks - Predicate and Multiversioning** >> >> Of course we could just forbid cases where we have a `native` base from vectorizing. But that would lead to regressions currently - in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine if the `base` is aligned, we need a runtime check. >> >> I came up with 2 options where to place the runtime checks: >> - A new "auto vectorization" Parse Predicate: >> - This only works when predicates are available. >> - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop. >> - Multiversion the loop: >> - Create 2 copies of the loop (fast and slow loops). >> - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if` which decides which loop we take >> - In the `slow_loop`, we make no assumption which means we can not vectorize, but we still compile - so even ... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 66 commits: > > - Merge branch 'master' into JDK-8323582-SW-native-alignment > - stall -> delay, plus some more comments > - adjust selector if probability > - Merge branch 'master' into JDK-8323582-SW-native-alignment > - remove multiversion mark if we break the structure > - register opaque with igvn > - copyright and rm CFG check > - IR rules for all cases > - 3 test versions > - test changed to unaligned ints > - ... and 56 more: https://git.openjdk.org/jdk/compare/d551daca...8eb52292 This looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22016#pullrequestreview-2641927937 From kvn at openjdk.org Tue Feb 25 17:32:02 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Feb 2025 17:32:02 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: <9mXRl7rScxJwxNNlV_H1gxndtzZ6g-gE8cMsc6VsTJQ=.b5a77c13-6e7e-4203-898a-3318e298d30f@github.com> Message-ID: <_pnjKfnS2e4hYWJ5_y8CudFAOmKB7FrD8cad8wCfZus=.16ac819a-2a99-4a8b-9640-3fa3bde53970@github.com> On Tue, 25 Feb 2025 07:09:24 GMT, Emanuel Peter wrote: > > PS: "slow" path implies that it is not taken frequently and should not affect the general performance of the application. > > For me "slow" just means less optimized, because some assumption does not hold. The "fast" path is faster, because it has more assumptions and can optimize more (i.e. vectorize in this case, or vectorize more instructions). Do you have a better name than "fast/slow"? I think I nit-picked here. I see your good comments in `loopTransform.cpp` and `loopnode.hpp` explaining multiversioning fast_loop/slow_loop. I think it is fine to keep "slow/fast". We can use "uncommon" to indicate an infrequent path.
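The fast/slow split discussed above can be illustrated with a source-level analogy. C2 works on its IR, cloning the loop and guarding the copies with a runtime check; the Java sketch below only mimics the shape of that decision. It uses a direct `ByteBuffer` instead of a `MemorySegment` so that it runs on JDK 9+, relying on the real `ByteBuffer.alignmentOffset` and `ByteBuffer.alignedSlice` methods. The class name, the assumed 16-byte vector width (`VECTOR_BYTES`), and the `lastVersion` bookkeeping are hypothetical, added only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical source-level analogy of loop multiversioning on an alignment check.
// Not C2 code: C2 clones the loop in its IR and guards the copies with a runtime check.
final class AlignmentMultiversion {
    // Assumed vector width in bytes (e.g. a 128-bit vector register).
    static final int VECTOR_BYTES = 16;

    // Records which loop version ran last, for demonstration only.
    static String lastVersion = "";

    // Stand-in for the multiversion check: is the base address vector-aligned?
    static boolean baseIsAligned(ByteBuffer buf) {
        return buf.alignmentOffset(0, VECTOR_BYTES) == 0;
    }

    // Adds 1 to each of the first `count` ints, like test3 in the PR description.
    static void increment(ByteBuffer buf, int count) {
        if (baseIsAligned(buf)) {
            lastVersion = "fast";
            // fast_loop: a JIT could assume aligned vector accesses here.
            for (int i = 0; i < count; i++) {
                buf.putInt(i * 4, buf.getInt(i * 4) + 1);
            }
        } else {
            lastVersion = "slow";
            // slow_loop: no alignment assumption, but it must compute the same result.
            for (int i = 0; i < count; i++) {
                buf.putInt(i * 4, buf.getInt(i * 4) + 1);
            }
        }
    }

    public static void main(String[] args) {
        int count = 64;
        // Extra slack so both the aligned slice and the off-by-one slice hold `count` ints.
        ByteBuffer raw = ByteBuffer.allocateDirect(count * 4 + 3 * VECTOR_BYTES);
        ByteBuffer aligned = raw.alignedSlice(VECTOR_BYTES).order(ByteOrder.nativeOrder());
        // Like nativeAligned.asSlice(1) in the PR description: the base is off by one byte.
        ByteBuffer unaligned = aligned.duplicate().position(1).slice().order(ByteOrder.nativeOrder());

        increment(aligned, count);
        System.out.println(lastVersion); // fast
        increment(unaligned, count);
        System.out.println(lastVersion); // slow
    }
}
```

In plain Java the two loop bodies are identical; the split only pays off when a JIT can specialize the fast version (for example with aligned vector loads and stores), while the slow version stays correct for any base address.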
------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2682745643 From shade at openjdk.org Tue Feb 25 18:18:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Feb 2025 18:18:02 GMT Subject: RFR: 8328473: StringTable and SymbolTable statistics delay time to safepoint [v2] In-Reply-To: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> References: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> Message-ID: On Mon, 24 Feb 2025 18:41:28 GMT, Coleen Phillimore wrote: >> This change adds a safepoint poll to gathering statistics for the Symbol and String tables, using the ConcurrentHashTableTasks to chunk up the walk. The stringTable and symbolTable is similar, like the GrowTask and DeleteTask code. Maybe this can be cleaned up but I don't have a good idea about that yet that doesn't involve yet another level of templated functions and code. This is already pretty highly templatized. >> Tested with tier1-4 and runThese internal test with JFR and failure injection to verify that we do try to safepoint while gathering statistics. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fxi typo. Marked as reviewed by shade (Reviewer). I realized my `jcmd` suggestion would not work, because it _itself_ runs in `VMThread`, so we miss an opportunity to stall another pending (GC) safepoint. Easiest way to reproduce this is to go to `lib/jfr/default.jfc` and set the low period: true 100ms Then run this: public class GC { static final int COUNT = 5_000_000; static String[] STRS = new String[COUNT]; public static void main(String... 
args) throws Exception { for (int c = 0; c < COUNT; c++) { STRS[c] = "String" + c; STRS[c].intern(); } while (true) { System.gc(); Thread.sleep(100); } } } $ build/linux-x86_64-server-release/images/jdk/bin/java -Xmx1g -XX:+UseParallelGC -XX:StartFlightRecording=filename=100us.jfr -Xlog:safepoint GC.java Before the patch: [28.594s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100409233 ns, Reaching safepoint: 63522678 ns, At safepoint: 35075399 ns, Total: 98598077 ns [28.830s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100408313 ns, Reaching safepoint: 99500774 ns, At safepoint: 35990128 ns, Total: 135490902 ns [29.064s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100403672 ns, Reaching safepoint: 99545475 ns, At safepoint: 34358513 ns, Total: 133903988 ns [29.259s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100392112 ns, Reaching safepoint: 60340691 ns, At safepoint: 33913938 ns, Total: 94254629 ns [29.405s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100350133 ns, Reaching safepoint: 3830 ns, At safepoint: 45870373 ns, Total: 45874203 ns [29.540s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100391972 ns, Reaching safepoint: 4240 ns, At safepoint: 34334753 ns, Total: 34338993 ns [29.739s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100378902 ns, Reaching safepoint: 64600069 ns, At safepoint: 34248220 ns, Total: 98848289 ns [29.985s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100398184 ns, Reaching safepoint: 99533655 ns, At safepoint: 46089494 ns, Total: 145623149 ns [30.232s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100396804 ns, Reaching safepoint: 99533335 ns, At safepoint: 47408568 ns, Total: 146941903 ns [30.407s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100339512 ns, Reaching safepoint: 39161103 ns, At safepoint: 35173080 ns, Total: 
74334183 ns After the patch: [29.159s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100415121 ns, Reaching safepoint: 3470 ns, At safepoint: 34602873 ns, Total: 34606343 ns [29.294s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100364263 ns, Reaching safepoint: 77200 ns, At safepoint: 34833386 ns, Total: 34910586 ns [29.429s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100424341 ns, Reaching safepoint: 77051 ns, At safepoint: 34416483 ns, Total: 34493534 ns [29.564s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100380481 ns, Reaching safepoint: 76160 ns, At safepoint: 34529093 ns, Total: 34605253 ns [29.699s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100446433 ns, Reaching safepoint: 6670 ns, At safepoint: 34552893 ns, Total: 34559563 ns [29.834s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100372811 ns, Reaching safepoint: 79071 ns, At safepoint: 34600173 ns, Total: 34679244 ns [29.968s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100372983 ns, Reaching safepoint: 3560 ns, At safepoint: 34080189 ns, Total: 34083749 ns [30.104s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100361950 ns, Reaching safepoint: 78221 ns, At safepoint: 35596762 ns, Total: 35674983 ns [30.240s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100448723 ns, Reaching safepoint: 6560 ns, At safepoint: 34830036 ns, Total: 34836596 ns I think it shows the current claiming strategy is good enough to mitigate TTSP stalls. 
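The before/after comparison above comes down to the "Reaching safepoint" field, i.e. the time to safepoint (TTSP). A small, hypothetical helper can pull that field out of `-Xlog:safepoint` output and report the count, maximum, and mean, which makes such comparisons mechanical. The regular expression assumes the exact line format shown above; other JDK versions may format these lines differently. `List.of` requires JDK 9+.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: summarizes the "Reaching safepoint" (TTSP) field of -Xlog:safepoint lines.
final class TtspSummary {
    private static final Pattern REACHING = Pattern.compile("Reaching safepoint: (\\d+) ns");

    // Returns {count, maxNanos, meanNanos}; lines without the field are ignored.
    static long[] summarize(List<String> lines) {
        long count = 0;
        long max = 0;
        long sum = 0;
        for (String line : lines) {
            Matcher m = REACHING.matcher(line);
            if (m.find()) {
                long ns = Long.parseLong(m.group(1));
                count++;
                sum += ns;
                max = Math.max(max, ns);
            }
        }
        return new long[] {count, max, count == 0 ? 0 : sum / count};
    }

    static void report(String label, List<String> lines) {
        long[] r = summarize(lines);
        System.out.printf("%s: n=%d max=%d ns mean=%d ns%n", label, r[0], r[1], r[2]);
    }

    public static void main(String[] args) {
        report("before", List.of(
            "[29.064s][info][safepoint ] Safepoint \"ParallelGCCollect\", Time since last: 100403672 ns, Reaching safepoint: 99545475 ns, At safepoint: 34358513 ns, Total: 133903988 ns",
            "[29.405s][info][safepoint ] Safepoint \"ParallelGCCollect\", Time since last: 100350133 ns, Reaching safepoint: 3830 ns, At safepoint: 45870373 ns, Total: 45874203 ns"));
        report("after", List.of(
            "[29.159s][info][safepoint ] Safepoint \"ParallelGCCollect\", Time since last: 100415121 ns, Reaching safepoint: 3470 ns, At safepoint: 34602873 ns, Total: 34606343 ns",
            "[29.294s][info][safepoint ] Safepoint \"ParallelGCCollect\", Time since last: 100364263 ns, Reaching safepoint: 77200 ns, At safepoint: 34833386 ns, Total: 34910586 ns"));
        // before: n=2 max=99545475 ns mean=49774652 ns
        // after: n=2 max=77200 ns mean=40335 ns
    }
}
```

On these four sample lines the worst-case TTSP drops from about 99.5 ms to about 0.08 ms; feeding it the full logs gives the same picture as reading them by eye.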
------------- PR Review: https://git.openjdk.org/jdk/pull/23750#pullrequestreview-2642038463 PR Comment: https://git.openjdk.org/jdk/pull/23750#issuecomment-2682902510 From coleenp at openjdk.org Tue Feb 25 18:35:07 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Feb 2025 18:35:07 GMT Subject: RFR: 8328473: StringTable and SymbolTable statistics delay time to safepoint [v2] In-Reply-To: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> References: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> Message-ID: On Mon, 24 Feb 2025 18:41:28 GMT, Coleen Phillimore wrote: >> This change adds a safepoint poll to gathering statistics for the Symbol and String tables, using the ConcurrentHashTableTasks to chunk up the walk. The stringTable and symbolTable is similar, like the GrowTask and DeleteTask code. Maybe this can be cleaned up but I don't have a good idea about that yet that doesn't involve yet another level of templated functions and code. This is already pretty highly templatized. >> Tested with tier1-4 and runThese internal test with JFR and failure injection to verify that we do try to safepoint while gathering statistics. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fxi typo. Thanks for writing the experiment. 
I can confirm that the times to safepoint after the patch no longer show the long delays seen before the patch, such as: [31.078s][info][safepoint ] Safepoint "ParallelGCCollect", Time since last: 100220803 ns, Reaching safepoint: 116514226 ns, At safepoint: 56694466 ns, Total: 173208692 ns ------------- PR Comment: https://git.openjdk.org/jdk/pull/23750#issuecomment-2682939272 From shade at openjdk.org Tue Feb 25 19:08:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Feb 2025 19:08:04 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 Message-ID: See bug for description of the bug. Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on Java mirror being accessible. Unfortunately, this patch undoes the removal of `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567).
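The "modifier flags" in question are the value `Class.getModifiers()` reports. They are determined entirely by immutable class-file metadata: the class's own access flags, or its `InnerClasses` entry when it is a member class (private and static can only come from there), with array classes taking their visibility from the element type and always reporting abstract and final. That immutability is what makes recomputing the flags from the `Klass`, as later suggested in the review, a safe alternative to reading them from a possibly dead mirror. The hypothetical demo below only observes this through public reflection; it does not call HotSpot's internal `compute_modifier_flags()`.

```java
import java.lang.reflect.Modifier;

// Hypothetical demo: Class.getModifiers() is derived from immutable class-file metadata.
final class ModifierFlagsDemo {
    // javac records private/static/final for this class in the InnerClasses attribute;
    // a class file's own access_flags cannot carry private or static.
    private static final class Nested {}

    static Class<?> nestedClass() {
        return Nested.class;
    }

    public static void main(String[] args) {
        System.out.println(Modifier.toString(nestedClass().getModifiers())); // private static final
        // Array classes: abstract and final are always set, visibility follows the element type.
        System.out.println(Modifier.toString(int[].class.getModifiers()));   // public abstract final
    }
}
```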
Additional testing: - [x] Linux x86_64 server fastdebug, original reproducer now passes - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` - [x] Linux x86_64 server fastdebug, `jdk_jfr` - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes ------------- Commit messages: - Re-introduce Klass cache - Fix Changes: https://git.openjdk.org/jdk/pull/23775/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23775&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350649 Stats: 18 lines in 3 files changed: 14 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23775.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23775/head:pull/23775 PR: https://git.openjdk.org/jdk/pull/23775 From shade at openjdk.org Tue Feb 25 19:08:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Feb 2025 19:08:04 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 13:00:52 GMT, Aleksey Shipilev wrote: > See bug for description of the bug. > > Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. > > I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on Java mirror being accessible. Unfortunately, this patch undoes the removal of `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567). 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, original reproducer now passes > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes @coleenp, [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567) is yours, want to take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23775#issuecomment-2683005360 From gziemski at openjdk.org Tue Feb 25 19:35:42 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 25 Feb 2025 19:35:42 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v57] In-Reply-To: References: Message-ID: <1b_ZtPMyaF7yZ3SJOIBGpIR9dT1Gprke1hXSSKVBaoo=.3cae569a-8c71-42ed-baf7-fd6d84a951b5@github.com> > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? 
> Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ?????????? > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix Linux arm build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23115/files - new: https://git.openjdk.org/jdk/pull/23115/files/28f2076e..366d2b57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=56 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=55-56 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From bkilambi at openjdk.org Tue Feb 25 19:45:31 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 25 Feb 2025 19:45:31 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations [v2] In-Reply-To: References: Message-ID: <8QDbenZGakijqUrwAcaVogoJBEiNpzYhN3sDrrteSDk=.d8539631-ab03-45ff-a762-0b6e14c63f89@github.com> > This patch adds aarch64 backend for scalar FP16 operations namely - add, subtract, multiply, divide, fma, sqrt, min and max. 
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23748/files - new: https://git.openjdk.org/jdk/pull/23748/files/a608a035..4d699740 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23748&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23748&range=00-01 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23748.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23748/head:pull/23748 PR: https://git.openjdk.org/jdk/pull/23748 From coleenp at openjdk.org Tue Feb 25 19:48:59 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Feb 2025 19:48:59 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 In-Reply-To: References: Message-ID: <0PMWVfR5hVc2Djx8dFGQaTChvgJoIdJsNnyq9JWyWms=.d9c9c0b7-493e-436a-9538-bce5e4042514@github.com> On Tue, 25 Feb 2025 13:00:52 GMT, Aleksey Shipilev wrote: > See bug for description of the bug. Shenandoah seems to be the only GC that runs into this problem so far. > > Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. > > I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on Java mirror being accessible. Unfortunately, this patch undoes the removal of `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567). 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, original reproducer now passes > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes No, we don't really want a duplicate of this field. We should just recompute them from the access flags since they don't change. If you're dubious about the value being the same make cached_modifier_flags be DEBUG_ONLY() and compare against compute_modifier_flags() result. It's only JFR that looks at modifier_flags after the mirror is dead, if I understand correctly. ------------- Changes requested by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23775#pullrequestreview-2642271821 PR Review: https://git.openjdk.org/jdk/pull/23775#pullrequestreview-2642277143 From bkilambi at openjdk.org Tue Feb 25 19:49:01 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 25 Feb 2025 19:49:01 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations [v2] In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 17:06:59 GMT, Andrew Haley wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/cpu/aarch64/aarch64.ad line 17275: > >> 17273: >> 17274: // This pattern would result in the following instructions (the first two are for ConvF2HF >> 17275: // and the last instruction is for ReinterpretS2HF) - > > Suggestion: > > // Without this pattern, (ReinterpretS2HF (ConvF2HF src)) would result in the following instructions (the first two for ConvF2HF > // and the last instruction for ReinterpretS2HF) - > > Reads a little better, I think? Addressed this in the new patch. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23748#discussion_r1970437734 From bkilambi at openjdk.org Tue Feb 25 19:48:59 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 25 Feb 2025 19:48:59 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations [v2] In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 17:42:05 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/aarch64.ad line 6978: >> >>> 6976: // ldr instruction has 32/64/128 bit variants but not a 16-bit variant. This >>> 6977: // loads the 16-bit value from constant pool into a 32-bit register but only >>> 6978: // the bottom half will be populated. >> >> Surely what actually happens here is that it loads a 32-bit word from the constant pool. The bottom 16 bits of this word contain the half-precision constant, the top 16 bits are zero. > > I agree. The wording didn't quite convey that. I will change it in my next PS. Thank you for looking into the patch! Addressed this in the new patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23748#discussion_r1970437283 From coleenp at openjdk.org Tue Feb 25 19:58:00 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Feb 2025 19:58:00 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 13:00:52 GMT, Aleksey Shipilev wrote: > See bug for description of the bug. Shenandoah seems to be the only GC that runs into this problem so far. > > Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. > > I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on Java mirror being accessible. 
Unfortunately, this patch undoes the removal of `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567). > > Additional testing: > - [x] Linux x86_64 server fastdebug, original reproducer now passes > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes ciEnv shouldn't be looking at modifier_flags from a dead mirror, though? If it's possible, then you can add modifier_flags to ciKlass to cache the value there. I wouldn't be unhappy with that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23775#issuecomment-2683144852 From shade at openjdk.org Tue Feb 25 19:57:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Feb 2025 19:57:59 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 In-Reply-To: <0PMWVfR5hVc2Djx8dFGQaTChvgJoIdJsNnyq9JWyWms=.d9c9c0b7-493e-436a-9538-bce5e4042514@github.com> References: <0PMWVfR5hVc2Djx8dFGQaTChvgJoIdJsNnyq9JWyWms=.d9c9c0b7-493e-436a-9538-bce5e4042514@github.com> Message-ID: <46T5TlAtUDBpFkLg5VI7aGJw9eaAc6XrDcc4jkgUPlg=.2f1e68e4-d716-4396-81a2-8d743867f017@github.com> On Tue, 25 Feb 2025 19:46:05 GMT, Coleen Phillimore wrote: > It's only JFR that looks at modifier_flags after the mirror is dead, if I understand correctly. `ciEnv` looks at `modifier_flags` as well. I can introduce `modifier_flags_slow` to use on JFR path only. Let's see... ------------- PR Comment: https://git.openjdk.org/jdk/pull/23775#issuecomment-2683139849 From shade at openjdk.org Tue Feb 25 20:06:32 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Feb 2025 20:06:32 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 [v2] In-Reply-To: References: Message-ID: > See bug for description of the bug. 
Shenandoah seems to be the only GC that runs into this problem so far. > > Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. > > I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on Java mirror being accessible. Unfortunately, this patch undoes the removal of `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567). > > Additional testing: > - [x] Linux x86_64 server fastdebug, original reproducer now passes > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Alternate fix: compute modifiers directly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23775/files - new: https://git.openjdk.org/jdk/pull/23775/files/7269b441..c302a4a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23775&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23775&range=00-01 Stats: 22 lines in 4 files changed: 3 ins; 14 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23775.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23775/head:pull/23775 PR: https://git.openjdk.org/jdk/pull/23775 From shade at openjdk.org Tue Feb 25 20:06:32 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Feb 2025 20:06:32 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 13:00:52 GMT, Aleksey Shipilev wrote: > See bug for description of the bug. 
Shenandoah seems to be the only GC that runs into this problem so far. > > Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. > > I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on Java mirror being accessible. Unfortunately, this patch undoes the removal of `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567). > > Additional testing: > - [x] Linux x86_64 server fastdebug, original reproducer now passes > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes So I think JFR can just call `compute_modifier_flags()` directly, without relying on Java mirror. I added the blurb around the method to point out it is safer to do from unloading paths. See new version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23775#issuecomment-2683162828 From shade at openjdk.org Tue Feb 25 20:11:32 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Feb 2025 20:11:32 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 [v3] In-Reply-To: References: Message-ID: <50O756emqs_uRsrvur1qGOoei1zIuutr38pNiT4xYIo=.7bae1e0d-9c54-4e06-a2ad-599502584b7e@github.com> > See bug for description of the bug. Shenandoah seems to be the only GC that runs into this problem so far. > > Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. 
> > I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on Java mirror being accessible. Unfortunately, this patch undoes the removal of `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567). > > Additional testing: > - [x] Linux x86_64 server fastdebug, original reproducer now passes > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Better comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23775/files - new: https://git.openjdk.org/jdk/pull/23775/files/c302a4a2..f49438c8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23775&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23775&range=01-02 Stats: 9 lines in 1 file changed: 5 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23775.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23775/head:pull/23775 PR: https://git.openjdk.org/jdk/pull/23775 From shade at openjdk.org Tue Feb 25 20:17:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Feb 2025 20:17:51 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 [v4] In-Reply-To: References: Message-ID: > See bug for description of the bug. Shenandoah seems to be the only GC that runs into this problem so far. > > Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. 
> > I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on Java mirror being accessible. Unfortunately, this patch undoes the removal of `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567). > > Additional testing: > - [x] Linux x86_64 server fastdebug, original reproducer now passes > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: More comment polishing, getting too late here for doing this without three commits in the row ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23775/files - new: https://git.openjdk.org/jdk/pull/23775/files/f49438c8..8a77e589 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23775&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23775&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23775.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23775/head:pull/23775 PR: https://git.openjdk.org/jdk/pull/23775 From coleenp at openjdk.org Tue Feb 25 20:17:51 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Feb 2025 20:17:51 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 [v4] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 20:14:23 GMT, Aleksey Shipilev wrote: >> See bug for description of the bug. Shenandoah seems to be the only GC that runs into this problem so far. >> >> Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. 
But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. >> >> I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on Java mirror being accessible. Unfortunately, this patch undoes the removal of `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567). >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, original reproducer now passes >> - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More comment polishing, getting too late here for doing this without three commits in the row Yes, this looks good. compute_modifier_flags() should get the same answer whenever it's called. Your comment looks good. JFR do_write_class during unloading is something that's been tricky in the past, but I hope there is nothing else that accesses the mirror when it's dead and the class should be unloaded. Was there a reproducer that can be added with this change? I assume ZGC could have the same sort of problem. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23775#pullrequestreview-2642359731 From shade at openjdk.org Tue Feb 25 20:19:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Feb 2025 20:19:54 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 [v4] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 20:13:36 GMT, Coleen Phillimore wrote: > Was there a reproducer that can be added with this change? I assume ZGC could have the same sort of problem.
Existing `jdk/jfr` tests fail with Shenandoah reliably (see bug for example invocation), so I felt no need for a new regression test. [JDK-8337978](https://bugs.openjdk.org/browse/JDK-8337978) also helps us to verify we touch the valid Java mirror. There is a larger reproducer in [JDK-8350580](https://bugs.openjdk.org/browse/JDK-8350580), but I am still not 100% sure there are no other bugs it triggers. Once I am sure, I'll add some form of that as regression test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23775#issuecomment-2683190245 From coleenp at openjdk.org Tue Feb 25 20:53:58 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Feb 2025 20:53:58 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 [v4] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 20:17:51 GMT, Aleksey Shipilev wrote: >> See bug for description of the bug. Shenandoah seems to be the only GC that runs into this problem so far. >> >> Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. >> >> I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on Java mirror being accessible. Unfortunately, this patch undoes the removal of `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567). 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, original reproducer now passes >> - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More comment polishing, getting too late here for doing this without three commits in the row If existing tests fail reliably, then there's no need to add a new test. Looks like JDK-8337978 has proven its value! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23775#issuecomment-2683259660 From gziemski at openjdk.org Tue Feb 25 20:59:01 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 25 Feb 2025 20:59:01 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v57] In-Reply-To: <1b_ZtPMyaF7yZ3SJOIBGpIR9dT1Gprke1hXSSKVBaoo=.3cae569a-8c71-42ed-baf7-fd6d84a951b5@github.com> References: <1b_ZtPMyaF7yZ3SJOIBGpIR9dT1Gprke1hXSSKVBaoo=.3cae569a-8c71-42ed-baf7-fd6d84a951b5@github.com> Message-ID: On Tue, 25 Feb 2025 19:35:42 GMT, Gerard Ziemski wrote: >> Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. 
>> >> Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` >> >> Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): >> >> >> malloc summary: >> >> time:8,951,473[ns] [samples:117,717] >> memory requested:28,474,918 bytes, allocated:29,904,416 bytes, >> malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] >> >> NMT type: objects: bytes: time: count%: bytes%: time%: overhead: >> ------------------------------------------------------------------------------------------------------------------------- >> Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? >> Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? >> Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? >> Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? >> Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? >> GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? >> GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? >> Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? >> JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? >> Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? >> Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? >> Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ?????????? >> Native Memory Tracking: 367 30,736 17... > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > fix Linux arm build Moving to a new PR. 
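The overhead lines in the sample malloc summary quoted above can be cross-checked with a little arithmetic. This is a minimal sketch under stated assumptions: "malloc overhead" is taken to be allocated minus requested bytes, both percentages are taken relative to requested bytes, and the NMT header cost is modeled as a fixed size per sampled allocation. The 18-byte figure used in the usage note is only the value that reproduces the reported total; the thread does not state a header size. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Bytes the allocator handed out beyond what callers asked for.
// Assumption: this is how the summary's "malloc overhead" is derived.
int64_t malloc_overhead(int64_t requested, int64_t allocated) {
  assert(allocated >= requested);
  return allocated - requested;
}

// Percentage of `part` relative to `whole`, rounded to two decimals
// the way the summary prints it (e.g. 5.02).
double percent_of(int64_t part, int64_t whole) {
  assert(whole > 0);
  return std::round(10000.0 * static_cast<double>(part) /
                    static_cast<double>(whole)) / 100.0;
}

// Hypothetical model: every sampled allocation carries a fixed-size NMT header.
int64_t nmt_header_overhead(int64_t samples, int64_t bytes_per_header) {
  return samples * bytes_per_header;
}
```

With the sample's figures, `malloc_overhead(28474918, 29904416)` gives 1,429,498 bytes (5.02% of requested), and `nmt_header_overhead(117717, 18)` gives the reported 2,118,906 bytes (7.44%).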
------------- PR Comment: https://git.openjdk.org/jdk/pull/23115#issuecomment-2683274882 From gziemski at openjdk.org Tue Feb 25 21:52:13 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 25 Feb 2025 21:52:13 GMT Subject: RFR: 8317453: NMT: Performance benchmarks are needed to measure speed and memory [v58] In-Reply-To: References: Message-ID: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ?????????? > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... Gerard Ziemski has updated the pull request with a new target base due to a merge or a rebase. 
------------- Changes: https://git.openjdk.org/jdk/pull/23115/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23115&range=57 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23115/head:pull/23115 PR: https://git.openjdk.org/jdk/pull/23115 From gziemski at openjdk.org Tue Feb 25 21:52:13 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 25 Feb 2025 21:52:13 GMT Subject: Withdrawn: 8317453: NMT: Performance benchmarks are needed to measure speed and memory In-Reply-To: References: Message-ID: On Tue, 14 Jan 2025 19:12:45 GMT, Gerard Ziemski wrote: > Here is another, hopefully, closer to the final iteration of NMT benchmarking mechanism. > > Please see the design document attached to the issue for details - `NMTBenchmark design document.pages.pdf` > > Here is a sample output (don't forget to scroll all the way right to see the malloc byte size mini histograms!): > > > malloc summary: > > time:8,951,473[ns] [samples:117,717] > memory requested:28,474,918 bytes, allocated:29,904,416 bytes, > malloc overhead=1,429,498 bytes [5.02%], NMT headers overhead=2,118,906 bytes [7.44%] > > NMT type: objects: bytes: time: count%: bytes%: time%: overhead: > ------------------------------------------------------------------------------------------------------------------------- > Java Heap: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Class: 8,598 727,856 607,047 7.3% 2.4% 6.8% 18.2% ?????????? > Thread: 196 68,256 64,875 0.2% 0.2% 0.7% 7.0% ?????????? > Thread Stack: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? > Code: 10,094 2,036,528 916,348 8.6% 6.8% 10.2% 9.9% ?????????? > GC: 1,813 20,372,160 1,214,642 1.5% 68.1% 13.6% 3.7% ?????????? > GCCardSet: 299 28,736 13,174 0.3% 0.1% 0.1% 11.6% ?????????? > Compiler: 55 13,728 171,364 0.0% 0.0% 1.9% 6.9% ?????????? > JVMCI: 0 0 0 0.0% 0.0% 0.0% 0.0% ?????????? 
> Internal: 5,066 339,184 1,418,578 4.3% 1.1% 15.8% 18.0% ?????????? > Other: 6 244,736 21,303 0.0% 0.8% 0.2% 37.9% ?????????? > Symbol: 9,844 1,493,280 752,665 8.4% 5.0% 8.4% 14.1% ?????????? > Native Memory Tracking: 367 30,736 17,654 0.3% 0.1% 0.2% 7... This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23115 From ccheung at openjdk.org Tue Feb 25 22:48:01 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 25 Feb 2025 22:48:01 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v8] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 01:11:25 GMT, Ioi Lam wrote: >> Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. >> >> With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. >> >> To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). 
Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): >> >>> the format of the configuration and cache files is not specified and is subject to change without notice. >> >> **Notes for reviewers:** >> >> - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. >> - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. >> - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. >> - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. >> - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. >> >> **Misc Note** >> - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccb... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 15 commits: > > - all tests in runtime/cds/appcds/aotClassLinking should be excluded for hotspot_appcds_dynamic testing > - @ashu-mehra comment - simplified _archived_cpp_vtptrs; also fixed old comments near by > - Merge branch 'master' into 8348426-binary-aot-config-file > - Merge branch 'master' into 8348426-binary-aot-config-file > - @ashu-mehra comments > - @calvinccheung comments > - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself > - Improve error message when AOTMode=create has an incompatible classpath > - Fixed test cases @vnkozlov > - Update "make test JTREG_AOT_JDK=true ..." to work with binary AOT configuration > - ... and 5 more: https://git.openjdk.org/jdk/compare/990d40e9...1ec67c11 Marked as reviewed by ccheung (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23484#pullrequestreview-2642634878 From iklam at openjdk.org Tue Feb 25 22:59:09 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 25 Feb 2025 22:59:09 GMT Subject: Integrated: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 05:58:51 GMT, Ioi Lam wrote: > Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pools entries, etc, that are easily represented in text. > > With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. 
Representing such classes with textual names would become another challenge. > > To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file). This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): > >> the format of the configuration and cache files is not specified and is subject to change without notice. > > **Notes for reviewers:** > > - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. > - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. > - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. > - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. > - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc) when JEP 483 options are specified. > > **Misc Note** > - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccba6dcce4a3) will be integrated separ... This pull request has now been integrated.
Changeset: 86024ebd Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/86024ebdb0f06517925c03e52246fbda0bad8f7c Stats: 1231 lines in 42 files changed: 1014 ins; 47 del; 170 mod 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file Reviewed-by: ccheung, asmehra, kvn, iveresov ------------- PR: https://git.openjdk.org/jdk/pull/23484 From iklam at openjdk.org Tue Feb 25 22:59:08 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 25 Feb 2025 22:59:08 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v2] In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 15:28:30 GMT, Vladimir Kozlov wrote: >> Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update "make test JTREG_AOT_JDK=true ..." to work with binary AOT configuration >> - Fixed test failures > > tools/javac/ImplicitClass/ImplicitImports.java failed in GHA: > > [0.002s][warning][cds] Unable to use AOT cache: CDS is disabled when java.base module is patched. > Hello, World! > Exception running test testImplicitSimpleIOImport: java.lang.AssertionError: Incorrect Output, expected: [Hello, World!], actual: [[0.002s][warning][cds] Unable to use AOT cache: CDS is disabled when java.base module is patched., Hello, World!] > java.lang.AssertionError: Incorrect Output, expected: [Hello, World!], actual: [[0.002s][warning][cds] Unable to use AOT cache: CDS is disabled when java.base module is patched., Hello, World!] > at ImplicitImports.testImplicitSimpleIOImport(ImplicitImports.java:171) Thanks @vnkozlov @veresov @calvinccheung @veresov for the review. I re-ran tests with tiers 1-5, plus extra AOT-enabled tests for jtreg and jck. 
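The reviewer notes quoted above explain that the binary AOT configuration file shares the CDS static archive layout but carries a different magic number, so it cannot be mistaken for an archive. What follows is a minimal sketch of that kind of header check. The magic constants, `FileKind`, `classify()` and `usable_as_cache()` are all hypothetical names chosen for illustration; they are not HotSpot's actual values or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical magic values for illustration only; NOT HotSpot's real constants.
constexpr uint32_t kStaticArchiveMagic = 0xA1B2C3D4u;
constexpr uint32_t kAotConfigMagic     = 0xA1B2C3D5u;

enum class FileKind { StaticArchive, AotConfig, Unknown };

// Inspect the leading 4-byte magic of a file header (host byte order assumed).
FileKind classify(const unsigned char* header, size_t len) {
  assert(header != nullptr || len == 0);
  if (len < sizeof(uint32_t)) return FileKind::Unknown;
  uint32_t magic;
  std::memcpy(&magic, header, sizeof(magic));
  if (magic == kStaticArchiveMagic) return FileKind::StaticArchive;
  if (magic == kAotConfigMagic) return FileKind::AotConfig;
  return FileKind::Unknown;
}

// Only a genuine archive may be used as a cache. A configuration file is
// rejected even though the rest of its layout matches the archive format.
bool usable_as_cache(const unsigned char* header, size_t len) {
  return classify(header, len) == FileKind::StaticArchive;
}
```

Because each file kind has its own magic, the very first check on the header is enough to reject a mix-up, before any of the shared layout is interpreted.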
------------- PR Comment: https://git.openjdk.org/jdk/pull/23484#issuecomment-2683474589 From dlong at openjdk.org Wed Feb 26 00:49:00 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 26 Feb 2025 00:49:00 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 [v4] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 20:17:51 GMT, Aleksey Shipilev wrote: >> See bug for description of the bug. Shenandoah seems to be the only GC that runs into this problem so far. >> >> Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. >> >> I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on Java mirror being accessible. Unfortunately, this patch undoes the removal of `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567). >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, original reproducer now passes >> - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More comment polishing, getting too late here for doing this without three commits in the row src/hotspot/share/oops/klass.hpp line 755: > 753: int modifier_flags() const; > 754: > 755: // Compute modifier flags from the original data. This is also allows Suggestion: // Compute modifier flags from the original data. 
This also allows ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23775#discussion_r1970737877 From mpowers at openjdk.org Wed Feb 26 01:03:52 2025 From: mpowers at openjdk.org (Mark Powers) Date: Wed, 26 Feb 2025 01:03:52 GMT Subject: RFR: 8349721: Add aarch64 intrinsics for ML-KEM In-Reply-To: References: Message-ID: On Mon, 17 Feb 2025 13:53:30 GMT, Ferenc Rakoczi wrote: > By using the aarch64 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled.

ML-KEM benchmark results of this PR:

MLKEM.decapsulate   512   11.80 us/op
MLKEM.decapsulate   768   18.19 us/op
MLKEM.decapsulate  1024   29.57 us/op
MLKEM.encapsulate   512    8.80 us/op
MLKEM.encapsulate   768   13.49 us/op
MLKEM.encapsulate  1024   22.53 us/op
MLKEM.keygen        512    7.49 us/op
MLKEM.keygen        768   11.22 us/op
MLKEM.keygen       1024   19.08 us/op

ML-KEM no intrinsics:

MLKEM.decapsulate   512   31.23 us/op
MLKEM.decapsulate   768   50.09 us/op
MLKEM.decapsulate  1024   75.92 us/op
MLKEM.encapsulate   512   22.72 us/op
MLKEM.encapsulate   768   37.27 us/op
MLKEM.encapsulate  1024   59.69 us/op
MLKEM.keygen        512   17.95 us/op
MLKEM.keygen        768   30.95 us/op
MLKEM.keygen       1024   49.04 us/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/23663#issuecomment-2683631601 From egahlin at openjdk.org Wed Feb 26 01:58:53 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 26 Feb 2025 01:58:53 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 [v4] In-Reply-To: References: Message-ID: <0Wx216vwUgC_wJEw0XDGYs_QJImVZ8y4sdwHlzlawjc=.379d3e42-c319-4ff9-b379-74be426095ac@github.com> On Tue, 25 Feb 2025 20:17:51 GMT, Aleksey Shipilev wrote: >> See bug for description of the bug. Shenandoah seems to be the only GC that runs into this problem so far. >> >> Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble.
But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. >> >> I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on Java mirror being accessible. Unfortunately, this patch undoes the removal of `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567). >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, original reproducer now passes >> - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More comment polishing, getting too late here for doing this without three commits in the row Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23775#pullrequestreview-2642839986 From qpzhang at openjdk.org Wed Feb 26 03:05:54 2025 From: qpzhang at openjdk.org (Patrick Zhang) Date: Wed, 26 Feb 2025 03:05:54 GMT Subject: RFR: 8350483: AArch64: turn on signum intrinsics by default on Ampere CPUs In-Reply-To: References: <7XQsAZxrIwrsL3gPazBVzWnfQmfH3R6Xwnadg-9Jd34=.34b8e435-1d9f-4486-948e-70079238e3fd@github.com> Message-ID: <9gJN--VtSNWhuCw-SXWucM2TedMH5DsfRwS0BYe-8GY=.f922b4f5-af95-4e8b-8a74-928788b0d050@github.com> On Tue, 25 Feb 2025 14:23:43 GMT, Andrew Haley wrote: > > Here's one: [JDK-8350663](https://bugs.openjdk.org/browse/JDK-8350663) Are you interested in picking it up? Sure, I will take it. BTW, could you please help sponsor this commit, thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23735#issuecomment-2683777970 From asmehra at openjdk.org Wed Feb 26 04:20:57 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 26 Feb 2025 04:20:57 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v4] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 16:13:48 GMT, Thomas Stuefe wrote: >> src/hotspot/share/compiler/compilationMemStatInternals.hpp line 160: >> >>> 158: void init(T v) { start = cur = peak = v; } >>> 159: void update(T v) { cur = v; if (v > peak) peak = v; } >>> 160: dT end_delta() const { return (dT)cur - (dT)start; } >> >> Should it be `return (dT)(cur - start); }` > > hmm, I'd like to avoid the inner overflow if cur < start (if the phase released more memory than it allocated) right, I missed that. >> src/hotspot/share/compiler/compilationMemStatInternals.hpp line 161: >> >>> 159: void update(T v) { cur = v; if (v > peak) peak = v; } >>> 160: dT end_delta() const { return (dT)cur - (dT)start; } >>> 161: size_t temporary_peak_size() const { return MIN2(peak - cur, peak - start); } >> >> shouldn't it be `MAX2(peak - cur, peak - start)`? Why not just `peak - start`? > > We are only interested in a rise that rose significantly above **both** the start and end point of the measurements. > > E.g.: > - if we have this: start = 0, end = 20MB, peak = 20MB, this is not a temporary peak and we already know that the end usage is 20MB. > - if we have this: start = 20MB, end = 0, peak = 20MB, this is not a temporary peak either, because we already know the starting footprint was 20MB. > - but if we have start = 0, end = 0, peak = 20MB, this is interesting since if we just print start and end we will miss the fact that in between those times we had temporarily allocated 20MB. Thanks for the explanation.
It would be great if some comment can be added, possibly along with some example like the one in the previous comment, to explain the meaning of `temporary_peak_size` and the corresponding calculation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1970880396 PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1970880439 From amitkumar at openjdk.org Wed Feb 26 04:28:26 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Feb 2025 04:28:26 GMT Subject: RFR: 8350716: [s390] intrinsify Thread.currentThread() Message-ID: s390x port for [JDK-8278793](https://bugs.openjdk.org/browse/JDK-8278793) ------------- Commit messages: - currentThread Intrinsic Changes: https://git.openjdk.org/jdk/pull/23791/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23791&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350716 Stats: 13 lines in 1 file changed: 12 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23791.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23791/head:pull/23791 PR: https://git.openjdk.org/jdk/pull/23791 From asmehra at openjdk.org Wed Feb 26 04:29:54 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 26 Feb 2025 04:29:54 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v6] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 16:43:21 GMT, Thomas Stuefe wrote: >> Greetings, >> >> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. >> >> Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. 
Finally, it can have high temporary peaks that vanish before the phase ends. >> >> I wanted to track that information correctly and display it clearly in a way that is easy to understand. >> >> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea). >> >> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (e.g., the inside of a loop): drop a TracePhase in there with a descriptive name, and you can see the allocations inside that phase. >> >> The statistic gives us two new forms of output: >> >> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: >> >> >> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: >> Phase Total ra node comp type index reglive regsplit cienv other >> none 1205512 155104 982984 33712 0 0 0 0 0 33712 >> parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 >> optimizer 916584 0 556416 0 0 0 0 0 0 360168 >> escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 >> connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 >> macroEliminate 196448 0 196448 0 0 0 0 0 0 0 >> iterGVN 327440 0 196368 131072 0 0 0 0 0 0 >> incrementalInline 3992816 0 3043704 62... > > Thomas Stuefe has updated the pull request incrementally with five additional commits since the last revision: > > - feedback ashu > - feedback roberto > - final-statistics-switch > - performance fix > - remove test code Marked as reviewed by asmehra (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/23530#pullrequestreview-2643027285 From asmehra at openjdk.org Wed Feb 26 04:29:55 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 26 Feb 2025 04:29:55 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: <5UAbfNQNxn--W_diVazFvldvScMGE59vVfpWJ4GUziA=.24361d75-77c9-418d-830e-797cac10f4b9@github.com> References: <0wHGNSlwe7cWb7Plad2n8Swy8rayYTAf5IETuw9zl4U=.a4d6a129-aebc-4639-aaef-92ee6c4552c7@github.com> <5UAbfNQNxn--W_diVazFvldvScMGE59vVfpWJ4GUziA=.24361d75-77c9-418d-830e-797cac10f4b9@github.com> Message-ID: On Tue, 25 Feb 2025 16:39:14 GMT, Thomas Stuefe wrote: >>> > @robcasloz I identified and hopefully fixed a small issue that hit the "disabled" path. Turns out we allocate arena chunks a lot more frequently than I thought, and the new unconditional call to Thread::current() in there was hurting a bit. I now avoid this unless I know the statistic is enabled. >>> > With this patch, on my machine the difference between unpatched and patched JVM with stats disabled is below one standard deviation for the benchmark in question. >>> >>> Great, thanks! Will re-run benchmarking and report results early next week. >> >> Functional test results (Oracle tier1-5) still look good for the latest commit (dd7a06ad). I can confirm that the C2 speed regression on our linux-x64 machines is almost fully mitigated. The 2-3% regression on our macosx-aarch64 machines does not seem to be addressed by the latest changes though, but as I mentioned before I think it is in the acceptable range (and only affects one benchmark). > > @robcasloz, @ashu-mehra thanks a lot for your reviews. I incorporated most of them into the PR. Changes look good to me. Thanks @tstuefe for addressing the comments. 
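In the spirit of Ashutosh's request for an explanatory comment on `temporary_peak_size`, here is a simplified, non-template sketch of the counter quoted from compilationMemStatInternals.hpp, checked against Thomas's three examples. `PhaseCounterSketch` is a hypothetical name, `std::min` stands in for HotSpot's `MIN2`, and `long long` stands in for the signed delta type `dT`. This illustrates the reasoning in the thread, not the actual patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Simplified stand-in for the per-phase counter: footprint at phase start,
// current footprint, and the highest footprint seen in between.
struct PhaseCounterSketch {
  size_t start = 0, cur = 0, peak = 0;

  void init(size_t v) { start = cur = peak = v; }
  void update(size_t v) { cur = v; if (v > peak) peak = v; }

  // Convert to the signed type before subtracting, so a phase that released
  // more than it allocated (cur < start) yields a negative delta instead of
  // wrapping around in unsigned arithmetic.
  long long end_delta() const {
    return static_cast<long long>(cur) - static_cast<long long>(start);
  }

  // How far the peak rose above BOTH the start and the end footprint.
  // peak >= start and peak >= cur always hold, so neither subtraction underflows.
  size_t temporary_peak_size() const {
    assert(peak >= cur && peak >= start);
    return std::min(peak - cur, peak - start);
  }
};
```

Only the third example (start = 0, end = 0, peak = 20MB) reports a temporary peak. In the other two, the peak coincides with either the start or the end footprint, which is already visible without a separate peak figure.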
------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2683863222 From dholmes at openjdk.org Wed Feb 26 07:02:12 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 26 Feb 2025 07:02:12 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 16:29:25 GMT, Fredrik Bredberg wrote: > I've combined two of `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list: the `entry_list`. > > This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the recurring question of whether it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. > > In the current multi-queue design, new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO chunks. > > The new list design is as much a multi-queue as the current one. Conceptually, it can be viewed as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. > > You always add to the `entry_list` by a compare-and-exchange on the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. > > The first contending thread that "pushes" itself onto `entry_list` will be the last thread in the list.
Each newly pushed thread in `entry_list` will be linked through its next pointer and have its prev pointer set to null; thus, pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next pointers) and is never moved to another list. > > Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list, we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). > > Performance-wise, the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation. > > However, the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fact that c2 no longer has to check b... Disclaimer for other reviewers: I have been looking at this code for some time now. Overall, the code looks good. I have quite a few comments/suggestions about comments. I suggest renaming `_vthread_cxq_head` to just `_vthread_head` as the `cxq` part is no longer meaningful. I agree that even though this seems performance-neutral, the code simplification (for people reading it for the first time) will be worth it. Thanks. src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 331: > 329: volatile_nonstatic_field(ObjectMonitor, _owner, int64_t) \ > 330: volatile_nonstatic_field(ObjectMonitor, _recursions, intptr_t) \ > 331: volatile_nonstatic_field(ObjectMonitor, _entry_list, ObjectWaiter*) \ Suggestion: volatile_nonstatic_field(ObjectMonitor, _entry_list, ObjectWaiter*) \ Extra space src/hotspot/share/runtime/objectMonitor.cpp line 166: > 164: // its next pointer, and have its prev pointer set to null.
Thus > 165: // pushing six threads A-F (in that order) onto entry_list, will > 166: // form a singly-linked list, see 1) below. Suggestion: have diagram 1 immediately follow this text so the reader doesn't have to jump down. src/hotspot/share/runtime/objectMonitor.cpp line 172: > 170: // from the entry_list head. While walking the list we also assign > 171: // the prev pointers of each thread, essentially forming a doubly > 172: // linked list, see 2) below. Suggestion: have diagram 2 immediately follow this text so the reader doesn't have to jump down. src/hotspot/share/runtime/objectMonitor.cpp line 176: > 174: // Once we have formed a doubly linked list it's easy to find the > 175: // successor, wake it up, have it remove itself, and update the > 176: // tail pointer, as seen in 2) and 3) below. Suggestion: // tail pointer, as seen in 3) below. But have diagram 3 right here. src/hotspot/share/runtime/objectMonitor.cpp line 179: > 177: // > 178: // At any time new threads can add themselves to the entry_list, see > 179: // 4) and 5). Diagrams 4 and 5 do not follow from what has just been described, but the use of "at any time" implies to me you intended to show them affecting the queue as we have already seen it. Again show the diagram you want here. src/hotspot/share/runtime/objectMonitor.cpp line 183: > 181: // If the thread that removes itself from the end of the list hasn't > 182: // got any prev pointer, we just set the tail pointer to null, see > 183: // 5) and 6). Suggestion: // If the thread to be removed is the only thread in the entry list: // entry_list -> A -> null // entry_list_tail ---^ // we remove it and just set the tail pointer to null, // entry_list -> null // entry_list_tail -> null src/hotspot/share/runtime/objectMonitor.cpp line 187: > 185: // Next time we need to find the successor and the tail is null, we > 186: // just start walking from the entry_list head again forming a new > 187: // doubly linked list, see 6) and 7) below. 
Suggestion: // Next time we need to find the successor and the tail is null, // entry_list ->I->H->G->null // entry_list_tail ->null // we just start walking from the entry_list head again forming a new // doubly linked list: // entry_list ->I<=>H<=>G->null // entry_list_tail ----------^ src/hotspot/share/runtime/objectMonitor.cpp line 189: > 187: // doubly linked list, see 6) and 7) below. > 188: // > 189: // 1) entry_list ->F->E->D->C->B->A->null Suggestion: // 1) entry_list ->F->E->D->C->B->A->null Right-justify the names please src/hotspot/share/runtime/objectMonitor.cpp line 215: > 213: // The mutex property of the monitor itself protects the entry_list > 214: // from concurrent interference. > 215: // -- Only the monitor owner may detach nodes from the entry_list. Suggestion for this block - get rid of invariants headings and just say: // The monitor itself protects all of the operations on the entry_list except for the CAS of a new arrival // to the head. Only the monitor owner can read or write the prev links (e.g. to remove itself) or update // the tail. src/hotspot/share/runtime/objectMonitor.cpp line 225: > 223: // concurrent detaching thread. This mechanism is immune from the > 224: // ABA corruption. More precisely, the CAS-based "push" onto > 225: // entry_list is ABA-oblivious. Not sure this actually says anything to help people understand the code or its operation. There basically is no A-B-A issue with the use of CAS here. src/hotspot/share/runtime/objectMonitor.cpp line 227: > 225: // entry_list is ABA-oblivious. 
> 226: // > 227: // * The entry_list form a queue of threads stalled trying to acquire Suggestion: // * The entry_list forms a queue of threads stalled trying to acquire src/hotspot/share/runtime/objectMonitor.cpp line 232: > 230: // thread notices that the tail of the entry_list is not known, we > 231: // convert the singly-linked entry_list into a doubly linked list by > 232: // assigning the prev pointers and the entry_list_tail pointer. Didn't we essentially say all this at the beginning? src/hotspot/share/runtime/objectMonitor.cpp line 260: > 258: // > 259: // * notify() or notifyAll() simply transfers threads from the WaitSet > 260: // to either the entry_list. Subsequent exit() operations will Suggestion: // to the entry_list. Subsequent exit() operations will src/hotspot/share/runtime/objectMonitor.cpp line 704: > 702: > 703: for (;;) { > 704: ObjectWaiter* front = Atomic::load(&_entry_list); In comments and code pick "head" or "front" to use to describe what _entry_list points to and use that consistently. I think "front" is much more common. src/hotspot/share/runtime/objectMonitor.cpp line 705: > 703: for (;;) { > 704: ObjectWaiter* front = Atomic::load(&_entry_list); > 705: No need for blank line. src/hotspot/share/runtime/objectMonitor.cpp line 718: > 716: // if we added current to _entry_list. Once on _entry_list, current > 717: // stays on-queue until it acquires the lock. > 718: bool ObjectMonitor::try_lock_or_add_to_entry_list(JavaThread* current, ObjectWaiter* node) { Nit: the name suggests we do the try_lock first, when we don't. If we reverse the name we should also reverse the true/false return so that true relates to the first part of the name. See what others think. src/hotspot/share/runtime/objectMonitor.cpp line 719: > 717: // stays on-queue until it acquires the lock. > 718: bool ObjectMonitor::try_lock_or_add_to_entry_list(JavaThread* current, ObjectWaiter* node) { > 719: node->_prev = nullptr; Shouldn't this already be the case? 
src/hotspot/share/runtime/objectMonitor.cpp line 724: > 722: for (;;) { > 723: ObjectWaiter* front = Atomic::load(&_entry_list); > 724: No need for blank line. src/hotspot/share/runtime/objectMonitor.cpp line 731: > 729: > 730: // Interference - the CAS failed because _entry_list changed. Just retry. > 731: // As an optional optimization we retry the lock. Suggestion: // Interference - the CAS failed because _entry_list changed. Before // retrying the CAS retry taking the lock as it may now be free. src/hotspot/share/runtime/objectMonitor.cpp line 812: > 810: guarantee(_entry_list == nullptr, > 811: "must be no entering threads: entry_list=" INTPTR_FORMAT, > 812: p2i(_entry_list)); Mustn't re-read _entry_list in the p2i as it may have changed from the value that is causing the guarantee to fail. The old guarantees were buggy in this regard - a temp is needed. src/hotspot/share/runtime/objectMonitor.cpp line 1299: > 1297: assert(_entry_list_tail == nullptr || _entry_list_tail == currentNode, "invariant"); > 1298: > 1299: ObjectWaiter* v = Atomic::load(&_entry_list); Nit: use `w` to be consistent with similar code. The original used `w` for EntryList and `v` for cxq IIRC. src/hotspot/share/runtime/objectMonitor.cpp line 2018: > 2016: // that in prepend-mode we invert the order of the waiters. Let's say that the > 2017: // waitset is "ABCD" and the entry_list is "XYZ". After a notifyAll() in prepend > 2018: // mode the waitset will be empty and the entry_list will be "DCBAXYZ". We don't support different ordering modes any more so we always "prepend" such that waiters are added to the entry_list in the reverse order of waiting. So given waitList -> A -> B -> C -> D, and _entry_list -> x -> y -> z we will get _entry_list -> D -> C -> B -> A -> X -> Y -> Z src/hotspot/share/runtime/objectMonitor.hpp line 195: > 193: volatile intx _recursions; // recursion count, 0 for first entry > 194: ObjectWaiter* volatile _entry_list; // Threads blocked on entry or reentry. 
> 195: // The list is actually composed of WaitNodes, Suggestion: // The list is actually composed of wait-nodes, Pre-existing (check for other uses) `WaitNodes` reads like a class name but it isn't. ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23421#pullrequestreview-2643098063 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970923830 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970940771 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970940914 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970941662 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970936929 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970946641 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970948581 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970934947 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970956573 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970965071 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970965291 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970966451 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970967237 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970971522 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970968581 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970975419 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970976144 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970976457 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970977990 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970979335 PR Review Comment: 
https://git.openjdk.org/jdk/pull/23421#discussion_r1970982964 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1971037645 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1970926134 From syan at openjdk.org Wed Feb 26 07:30:27 2025 From: syan at openjdk.org (SendaoYan) Date: Wed, 26 Feb 2025 07:30:27 GMT Subject: RFR: 8350723: RISC-V: debug.cpp help() is missing riscv line for pns Message-ID: Hi all, This PR adds a RISC-V entry line for the pns() call in the src/hotspot/share/utilities/debug.cpp file. This will be useful when calling the help() function in gdb. No risk. ------------- Commit messages: - 8350723: RISC-V: debug.cpp help() is missing riscv line for pns Changes: https://git.openjdk.org/jdk/pull/23793/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23793&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350723 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23793.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23793/head:pull/23793 PR: https://git.openjdk.org/jdk/pull/23793 From shade at openjdk.org Wed Feb 26 07:35:31 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Feb 2025 07:35:31 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 [v4] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 00:46:37 GMT, Dean Long wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> More comment polishing, getting too late here for doing this without three commits in the row > > src/hotspot/share/oops/klass.hpp line 755: > >> 753: int modifier_flags() const; >> 754: >> 755: // Compute modifier flags from the original data. This is also allows > > Suggestion: > > // Compute modifier flags from the original data. This also allows Done! Thanks.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23775#discussion_r1971073234 From shade at openjdk.org Wed Feb 26 07:35:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Feb 2025 07:35:30 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 [v5] In-Reply-To: References: Message-ID: > See bug for description of the bug. Shenandoah seems to be the only GC that runs into this problem so far. > > Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. > > I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on Java mirror being accessible. Unfortunately, this patch undoes the removal of `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567). 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, original reproducer now passes > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Drop "is" ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23775/files - new: https://git.openjdk.org/jdk/pull/23775/files/8a77e589..0275ed1f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23775&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23775&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23775.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23775/head:pull/23775 PR: https://git.openjdk.org/jdk/pull/23775 From fyang at openjdk.org Wed Feb 26 07:51:52 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 26 Feb 2025 07:51:52 GMT Subject: RFR: 8350723: RISC-V: debug.cpp help() is missing riscv line for pns In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 07:24:48 GMT, SendaoYan wrote: > Hi all, > > This PR add RISC-V entry line for pns() call in src/hotspot/share/utilities/debug.cpp file. This will be useful when call help() function in gdb. No risk. Looks good and trivial. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23793#pullrequestreview-2643450082 From eosterlund at openjdk.org Wed Feb 26 08:21:53 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 26 Feb 2025 08:21:53 GMT Subject: RFR: 8328473: StringTable and SymbolTable statistics delay time to safepoint [v2] In-Reply-To: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> References: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> Message-ID: On Mon, 24 Feb 2025 18:41:28 GMT, Coleen Phillimore wrote: >> This change adds a safepoint poll to statistics gathering for the Symbol and String tables, using the ConcurrentHashTableTasks to chunk up the walk. The stringTable and symbolTable code is similar, like the GrowTask and DeleteTask code. Maybe this can be cleaned up, but I don't have a good idea about that yet that doesn't involve yet another level of templated functions and code. This is already pretty highly templatized. >> Tested with tier1-4 and runThese internal test with JFR and failure injection to verify that we do try to safepoint while gathering statistics. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fxi typo. This looks nice! Thanks for fixing this. ------------- Marked as reviewed by eosterlund (Reviewer).
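To illustrate why chunking the walk helps time-to-safepoint, here is a minimal, generic C++ sketch. It does not use HotSpot's `ConcurrentHashTable` task API; the table is reduced to a vector of bucket chain lengths, and `should_yield`/`pause` are hypothetical callbacks standing in for a safepoint poll and for blocking until the safepoint is over. The point is only that a pending safepoint waits for at most one chunk of work rather than for the whole table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical summary of a bucketed hash table.
struct TableStats {
  std::size_t buckets = 0;
  std::size_t entries = 0;
  std::size_t max_chain = 0;
};

// Walk the buckets in fixed-size chunks. Between chunks, ask whether a
// safepoint is pending and, if so, pause before continuing the walk.
TableStats gather_stats(const std::vector<std::size_t>& chain_lengths,
                        std::size_t chunk_size,
                        const std::function<bool()>& should_yield,
                        const std::function<void()>& pause) {
  assert(chunk_size > 0);
  TableStats s;
  const std::size_t n = chain_lengths.size();
  for (std::size_t start = 0; start < n; start += chunk_size) {
    const std::size_t end = std::min(start + chunk_size, n);
    for (std::size_t i = start; i < end; i++) {
      s.buckets++;
      s.entries += chain_lengths[i];
      s.max_chain = std::max(s.max_chain, chain_lengths[i]);
    }
    // Poll between chunks only; a chunk is the unit of non-interruptible work.
    if (end < n && should_yield()) {
      pause();
    }
  }
  return s;
}
```

Choosing the chunk size is the usual trade-off: smaller chunks bound the safepoint delay more tightly but poll more often.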
PR Review: https://git.openjdk.org/jdk/pull/23750#pullrequestreview-2643537855 From haosun at openjdk.org Wed Feb 26 08:30:55 2025 From: haosun at openjdk.org (Hao Sun) Date: Wed, 26 Feb 2025 08:30:55 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations [v2] In-Reply-To: <8QDbenZGakijqUrwAcaVogoJBEiNpzYhN3sDrrteSDk=.d8539631-ab03-45ff-a762-0b6e14c63f89@github.com> References: <8QDbenZGakijqUrwAcaVogoJBEiNpzYhN3sDrrteSDk=.d8539631-ab03-45ff-a762-0b6e14c63f89@github.com> Message-ID: On Tue, 25 Feb 2025 19:45:31 GMT, Bhavana Kilambi wrote: >> This patch adds an aarch64 backend for scalar FP16 operations, namely add, subtract, multiply, divide, fma, sqrt, min and max. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2097: > 2095: > 2096: // Half-precision floating-point instructions > 2097: INSN(fabdh, 0b011, 0b11, 0b000101, 0b0); I suppose `fabdh` and `fnmulh` are added to keep aligned with the float and double ones, i.e. `fabd(s|d)` and `fnmul(s|d)`. I noticed that there are matching rules for `fabd(s|d)`, i.e. `absd(F|D)_reg`. I wonder if we need to add the corresponding rule for fp16 here?
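For context on the `absd` question: in IEEE 754 binary16, absolute value and negation only touch the sign bit, so they can be computed on the raw 16-bit encoding without any floating-point instruction. The helper names below are hypothetical, and this is a C++ sketch of the bit-level semantics only, not the `Float16` Java code or the AArch64 backend. An `absd`-style rule, by contrast, matches an abs of a subtraction and therefore needs an abs node in the IR to match against.

```cpp
#include <cassert>
#include <cstdint>

// IEEE 754 binary16 layout: 1 sign bit, 5 exponent bits, 10 fraction bits.
constexpr std::uint16_t kFp16SignMask = 0x8000;

// |x|: clear the sign bit. Works the same for zeros, infinities and NaNs.
constexpr std::uint16_t fp16_abs_bits(std::uint16_t bits) {
  return static_cast<std::uint16_t>(bits & ~kFp16SignMask);
}

// -x: flip the sign bit. NaN payloads are left untouched.
constexpr std::uint16_t fp16_neg_bits(std::uint16_t bits) {
  return static_cast<std::uint16_t>(bits ^ kFp16SignMask);
}
```

For example, 1.0 is encoded as 0x3C00 and -1.0 as 0xBC00, so negating or taking the absolute value is a single integer AND or XOR.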
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23748#discussion_r1971142347 From bkilambi at openjdk.org Wed Feb 26 08:52:53 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 26 Feb 2025 08:52:53 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations [v2] In-Reply-To: References: <8QDbenZGakijqUrwAcaVogoJBEiNpzYhN3sDrrteSDk=.d8539631-ab03-45ff-a762-0b6e14c63f89@github.com> Message-ID: On Wed, 26 Feb 2025 08:26:57 GMT, Hao Sun wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2097: > >> 2095: >> 2096: // Half-precision floating-point instructions >> 2097: INSN(fabdh, 0b011, 0b11, 0b000101, 0b0); > > I suppose `fadbh` and `fnmulh` are added to keep aligned with the float and double ones, i.e. `fabd(s|d)` and `fnmul(s|d)`. > > > I noticed that there are matching rules for `fabd(s|d)`, i.e. `absd(F|D)_reg`. I wonder if we need add the corresponding rule for fp16 here? Hi @shqking , thanks for your review comments. Yes I added `fabdh` and `fnmulh` to keep aligned with float and double types. For adding support for FP16 `absd` we need `AbsHF` to be supported (along with SubHF) but `AbsHF` node is not implemented currently. `abs` operation is directly executed from the java code here - https://github.com/openjdk/jdk/blob/037e47112bdf2fa2324f7c58198f6d433f17d9fd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java#L1464 and is not intrinsified or pattern matched like other FP16 operations. 
Same with the `negate` operation for FP16 - https://github.com/openjdk/jdk/blob/037e47112bdf2fa2324f7c58198f6d433f17d9fd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java#L1449 On the Valhalla repo, while these operations were being developed, I tried adding support for `AbsHF/NegHF`, which emitted `fabs` and `fneg` instructions, but the direct Java code (bit-manipulation operations) was much faster (sorry, I don't remember the exact number), so we decided to go with the Java implementation instead. I still added `fabd` here because `op21` is 0 only in the `fabd` H variant, and I felt that it'd be better to handle it here as it belongs to this group of instructions. Please let me know your thoughts. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23748#discussion_r1971175829 From qpzhang at openjdk.org Wed Feb 26 09:09:58 2025 From: qpzhang at openjdk.org (Patrick Zhang) Date: Wed, 26 Feb 2025 09:09:58 GMT Subject: Integrated: 8350483: AArch64: turn on signum intrinsics by default on Ampere CPUs In-Reply-To: References: Message-ID: <1B4TeZGOISz3pryE3j7-cYq06Uz6gnLJ_VAD1gnU5AY=.5f6da3c5-7e18-4a73-8d73-0692bf297d04@github.com> On Sat, 22 Feb 2025 15:27:41 GMT, Patrick Zhang wrote: > Set -XX:+UseSignumIntrinsic by default for Ampere CPUs. This fixes a performance problem found in the JMH cases `vm.compiler.Signum|java.lang.*MathBench.sig[nN]um*`, where fmov is used to transfer data between GPRs and FPRs at significant time cost. > > Verified on Ampere-1A: the scores (thrpt, ops/s) of `java.lang.*MathBench.sig[nN]um*` improved by 40~50%, while the `vm.compiler.Signum._1_signumFloatTest` and `vm.compiler.Signum._3_signumDoubleTest` results improved dramatically. The change also passed GHA sanity checks and jtreg tier1 on Ampere-1A as functional smoke tests. This pull request has now been integrated.
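Whatever instruction sequence the intrinsic emits, it has to reproduce `Math.signum` semantics exactly. As a reference point, here is a portable C++ sketch of those semantics for `double` (the function name is hypothetical); it says nothing about how the AArch64 intrinsic avoids the GPR/FPR `fmov` transfers mentioned in the PR.

```cpp
#include <cassert>
#include <cmath>

// Java Math.signum semantics: +0.0, -0.0 and NaN are returned unchanged;
// any other value maps to +1.0 or -1.0 with the sign of the argument.
double java_signum(double x) {
  if (x == 0.0 || std::isnan(x)) {
    return x;  // preserves the sign of zero and the NaN itself
  }
  return std::copysign(1.0, x);
}
```

The zero and NaN cases are the ones a hand-written branch-free sequence most easily gets wrong, so they are worth covering in any test of an intrinsic.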
Changeset: f529bf71 Author: Patrick Zhang Committer: Andrew Haley URL: https://git.openjdk.org/jdk/commit/f529bf712d8946584999dfc98abea60c22c97167 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8350483: AArch64: turn on signum intrinsics by default on Ampere CPUs Reviewed-by: aph ------------- PR: https://git.openjdk.org/jdk/pull/23735 From roland at openjdk.org Wed Feb 26 09:16:03 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 26 Feb 2025 09:16:03 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v4] In-Reply-To: References: Message-ID: <6R7kv7XGOWIBrjPQCemB6u2vd_tFl_xMQGQaVWoxkK0=.d26f6780-82f8-4ab9-a4bc-ff7831ed9a1a@github.com> On Tue, 25 Feb 2025 09:27:13 GMT, Emanuel Peter wrote: >> Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below. >> >> **Background** >> >> With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer. >> >> **Problem** >> >> So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code. >> >> >> MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1); >> MemorySegment nativeUnaligned = nativeAligned.asSlice(1); >> test3(nativeUnaligned); >> >> >> When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not! 
>> >> static void test3(MemorySegment ms) { >> for (int i = 0; i < RANGE; i++) { >> long adr = i * 4L; >> int v = ms.get(ELEMENT_LAYOUT, adr); >> ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1)); >> } >> } >> >> >> **Solution: Runtime Checks - Predicate and Multiversioning** >> >> Of course we could just forbid cases where we have a `native` base from vectorizing. But that would lead to regressions currently - in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine if the `base` is aligned, we need a runtime check. >> >> I came up with 2 options where to place the runtime checks: >> - A new "auto vectorization" Parse Predicate: >> - This only works when predicates are available. >> - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop. >> - Multiversion the loop: >> - Create 2 copies of the loop (fast and slow loops). >> - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if` which decides which loop we take >> - In the `slow_loop`, we make no assumption which means we can not vectorize, but we still compile - so even ... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 66 commits: > > - Merge branch 'master' into JDK-8323582-SW-native-alignment > - stall -> delay, plus some more comments > - adjust selector if probability > - Merge branch 'master' into JDK-8323582-SW-native-alignment > - remove multiversion mark if we break the structure > - register opaque with igvn > - copyright and rm CFG check > - IR rules for all cases > - 3 test versions > - test changed to unaligned ints > - ... 
and 56 more: https://git.openjdk.org/jdk/compare/d551daca...8eb52292 Would it be possible and make sense to remove useless slow path loops the way it's done for predicates or zero trip guards? In `PhaseIdealLoop::build_loop_late_post_work()`, collect all `OpaqueMultiversioningNode` in a list. Then iterate over all loops the way it's done in `PhaseIdealLoop::eliminate_useless_zero_trip_guard()`, find loops marked as multi version, check we can get from the loop to the `OpaqueMultiversioningNode` and mark that one as useful. Eliminate all `OpaqueMultiversioningNode` not marked as useful. That way if some transformation such as peeling makes the loop non multi version or if the expected shape breaks for some reason, the slow loop is eliminated on next loop opts pass. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2684365921 From rkennke at openjdk.org Wed Feb 26 09:42:59 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 26 Feb 2025 09:42:59 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v8] In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 07:08:35 GMT, Thomas Stuefe wrote: >> If we wrongly decode an nKlass of `0`, and the nKlass encoding base is not NULL (typical for most cases that run with CDS enabled), the resulting pointer points to the start of the Klass encoding range. That area is readable. If CDS is enabled, it will be at the start of the CDS metadata archive. If CDS is off, it is at the start of the class space. >> >> Now, both CDS and class space allocate a safety buffer at the start to prevent Klass structures from being located there. However, that memory is still readable, so we can read garbage data from that area. In the case of CDS, that area is just 16 bytes, after that come real data. Since Klass is large, most accesses will read beyond the 16-byte zone. 
>> >> We should protect the first page in the narrow Klass encoding range to make analysis of errors like this easier. Especially in release builds where decode_not_null does not assert. We already use a similar technique in the heap, and most OSes protect the zero page for the same reason. >> >> This patch does that. Now, decoding an `0` nKlass and then using the result `Klass` - calling virtual functions or accessing members - crashes right away. >> >> Additionally, the patch provides a helpful output in the register/stack section, e.g: >> >> >> RDI=0x0000000800000000 points into nKlass protection zone >> >> >> >> Testing: >> - GHAs. >> - I tested the patch manually on x64 Linux for both CDS on, CDS off and zero-based encoding, CDS off and non-zero-based encoding. >> - I tested manually on Windows x64 >> - I also prepared an automatic gtest, but that needs some preparatory work on the gtest suite first to work (see https://bugs.openjdk.org/browse/JDK-8348029) >> >> -- Update 2024-01-22 -- >> I added a jtreg test that is more thorough than a gtest (also scans the produced hs-err file) > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > remove test coding Looks good to me! ------------- Marked as reviewed by rkennke (Reviewer). 
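The protection-zone idea can be demonstrated with plain POSIX calls. This is a minimal sketch assuming a Linux or other POSIX system with `MAP_ANONYMOUS`; it is not HotSpot's implementation (which reserves and protects memory through its own `os` layer), and `EncodingRange`, `reserve_with_protection_zone`, `decode_narrow_klass` and `in_protection_zone` are hypothetical names. Decoding nKlass 0 yields the range base, which now lies in an inaccessible page, so any load through it faults instead of silently reading garbage. The last helper mirrors the kind of "points into nKlass protection zone" annotation shown in the PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical model of a narrow-Klass encoding range: base + (nk << shift).
struct EncodingRange {
  char* base = nullptr;
  std::size_t size = 0;
  std::size_t protected_bytes = 0;
  int shift = 0;
};

// Reserve `size` bytes and make the first page inaccessible, so that decoding
// nKlass 0 (which yields `base`) and touching the result faults immediately.
bool reserve_with_protection_zone(EncodingRange& r, std::size_t size, int shift) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return false;
  if (mprotect(p, page, PROT_NONE) != 0) {
    munmap(p, size);
    return false;
  }
  r.base = static_cast<char*>(p);
  r.size = size;
  r.protected_bytes = page;
  r.shift = shift;
  return true;
}

char* decode_narrow_klass(const EncodingRange& r, std::uint32_t nk) {
  return r.base + (static_cast<std::uintptr_t>(nk) << r.shift);
}

// What an error reporter could use to annotate a register value.
bool in_protection_zone(const EncodingRange& r, const void* addr) {
  const char* a = static_cast<const char*>(addr);
  return a >= r.base && a < r.base + r.protected_bytes;
}
```

The example deliberately never dereferences the decoded zero nKlass, since doing so would (by design) crash the process.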
PR Review: https://git.openjdk.org/jdk/pull/23190#pullrequestreview-2643803532 From stuefe at openjdk.org Wed Feb 26 09:55:11 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Feb 2025 09:55:11 GMT Subject: Integrated: 8330174: Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: References: Message-ID: <2Jq6sSg4_w7DvyQf5xKQS1okFmepD7_9u9vhQ7bIN8U=.57fcd64f-bd5b-465e-9ffe-047bea2ab013@github.com> On Sat, 18 Jan 2025 11:20:00 GMT, Thomas Stuefe wrote: > If we wrongly decode an nKlass of `0`, and the nKlass encoding base is not NULL (typical for most cases that run with CDS enabled), the resulting pointer points to the start of the Klass encoding range. That area is readable. If CDS is enabled, it will be at the start of the CDS metadata archive. If CDS is off, it is at the start of the class space. > > Now, both CDS and class space allocate a safety buffer at the start to prevent Klass structures from being located there. However, that memory is still readable, so we can read garbage data from that area. In the case of CDS, that area is just 16 bytes, after that come real data. Since Klass is large, most accesses will read beyond the 16-byte zone. > > We should protect the first page in the narrow Klass encoding range to make analysis of errors like this easier. Especially in release builds where decode_not_null does not assert. We already use a similar technique in the heap, and most OSes protect the zero page for the same reason. > > This patch does that. Now, decoding an `0` nKlass and then using the result `Klass` - calling virtual functions or accessing members - crashes right away. > > Additionally, the patch provides a helpful output in the register/stack section, e.g: > > > RDI=0x0000000800000000 points into nKlass protection zone > > > > Testing: > - GHAs. > - I tested the patch manually on x64 Linux for both CDS on, CDS off and zero-based encoding, CDS off and non-zero-based encoding. 
> - I tested manually on Windows x64 > - I also prepared an automatic gtest, but that needs some preparatory work on the gtest suite first to work (see https://bugs.openjdk.org/browse/JDK-8348029) > > -- Update 2024-01-22 -- > I added a jtreg test that is more thorough than a gtest (also scans the produced hs-err file) This pull request has now been integrated. Changeset: a70eba8e Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/a70eba8e4212c2c7125475f69b3952197e7a8ce3 Stats: 426 lines in 16 files changed: 331 ins; 29 del; 66 mod 8330174: Protection zone for easier detection of accidental zero-nKlass use Co-authored-by: Ioi Lam Reviewed-by: iklam, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/23190 From stuefe at openjdk.org Wed Feb 26 09:55:10 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Feb 2025 09:55:10 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v8] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 09:39:51 GMT, Roman Kennke wrote: > Looks good to me! Thank you @rkennke and @iklam ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23190#issuecomment-2684459656 From epeter at openjdk.org Wed Feb 26 10:02:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Feb 2025 10:02:09 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v4] In-Reply-To: <6R7kv7XGOWIBrjPQCemB6u2vd_tFl_xMQGQaVWoxkK0=.d26f6780-82f8-4ab9-a4bc-ff7831ed9a1a@github.com> References: <6R7kv7XGOWIBrjPQCemB6u2vd_tFl_xMQGQaVWoxkK0=.d26f6780-82f8-4ab9-a4bc-ff7831ed9a1a@github.com> Message-ID: On Wed, 26 Feb 2025 09:12:46 GMT, Roland Westrelin wrote: > Would it be possible and make sense to remove useless slow path loops the way it's done for predicates or zero trip guards? In `PhaseIdealLoop::build_loop_late_post_work()`, collect all `OpaqueMultiversioningNode` in a list. 
Then iterate over all loops the way it's done in `PhaseIdealLoop::eliminate_useless_zero_trip_guard()`, find loops marked as multi version, check we can get from the loop to the `OpaqueMultiversioningNode` and mark that one as useful. Eliminate all `OpaqueMultiversioningNode` not marked as useful. That way if some transformation such as peeling makes the loop non multi version or if the expected shape breaks for some reason, the slow loop is eliminated on next loop opts pass. I suppose we could try that. Is it ok to do that in a separate RFE, so we are keeping this here to a more manageable size? And would we not have similar issues with traversing from the loops to their `OpaqueMultiversioningNode`? What if some are not reachable in the meantime? Then we would just lose the `multiversion_if` early, and could not use it any more. So maybe we'd have to do that after the verification: [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637): C2: verify that main_loop finds pre_loop and that multiversion loops find the multiversion_if I wonder if we do not have similar issues with `PhaseIdealLoop::eliminate_useless_zero_trip_guard()` currently. Maybe it's rare enough we don't notice. @rwestrel What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2684482233 From roland at openjdk.org Wed Feb 26 10:18:12 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 26 Feb 2025 10:18:12 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v4] In-Reply-To: References: <6R7kv7XGOWIBrjPQCemB6u2vd_tFl_xMQGQaVWoxkK0=.d26f6780-82f8-4ab9-a4bc-ff7831ed9a1a@github.com> Message-ID: On Wed, 26 Feb 2025 09:59:36 GMT, Emanuel Peter wrote: > I suppose we could try that. Is it ok to do that in a separate RFE, so we are keeping this here to a more manageable size? Ok > And would we not have similar issues with traversing from the loops to their `OpaqueMultiversioningNode`? 
What if some are not reachable in the meantime? Then we would just lose the `multiversion_if` early, and could not use it any more. So maybe we'd have to do that after the verification: [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637): C2: verify that main_loop finds pre_loop and that multiversion loops find the multiversion_if > > I wonder if we do not have similar issues with `PhaseIdealLoop::eliminate_useless_zero_trip_guard()` currently. Maybe it's rare enough we don't notice. I don't think that's a problem. When that code runs the graph is in a stable shape. There's no dead condition that needs to go through igvn to be cleaned up. We've just run igvn and haven't made any change to the graph yet. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2684523673 From epeter at openjdk.org Wed Feb 26 10:30:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Feb 2025 10:30:15 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v4] In-Reply-To: References: <6R7kv7XGOWIBrjPQCemB6u2vd_tFl_xMQGQaVWoxkK0=.d26f6780-82f8-4ab9-a4bc-ff7831ed9a1a@github.com> Message-ID: On Wed, 26 Feb 2025 10:15:48 GMT, Roland Westrelin wrote: > > And would we not have similar issues with traversing from the loops to their `OpaqueMultiversioningNode`? What if some are not reachable in the meantime? Then we would just lose the `multiversion_if` early, and could not use it any more. So maybe we'd have to do that after the verification: [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637): C2: verify that main_loop finds pre_loop and that multiversion loops find the multiversion_if > > I wonder if we do not have similar issues with `PhaseIdealLoop::eliminate_useless_zero_trip_guard()` currently. Maybe it's rare enough we don't notice. > > I don't think that's a problem. When that code runs the graph is in a stable shape. 
There's no dead condition that needs to go through igvn to be cleaned up. We've just run igvn and haven't made any change to the graph yet. Ah ok, I'll have to look into it myself then. But if we know that it happens at the beginning of a loop-opts phase just after igvn, and no predicates were hacked yet, then that should work fine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2684550571 From aph at openjdk.org Wed Feb 26 10:27:02 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 26 Feb 2025 10:27:02 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations [v2] In-Reply-To: References: <8QDbenZGakijqUrwAcaVogoJBEiNpzYhN3sDrrteSDk=.d8539631-ab03-45ff-a762-0b6e14c63f89@github.com> Message-ID: On Wed, 26 Feb 2025 08:49:58 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2097: >> >>> 2095: >>> 2096: // Half-precision floating-point instructions >>> 2097: INSN(fabdh, 0b011, 0b11, 0b000101, 0b0); >> >> I suppose `fabdh` and `fnmulh` are added to keep aligned with the float and double ones, i.e. `fabd(s|d)` and `fnmul(s|d)`. >> >> >> I noticed that there are matching rules for `fabd(s|d)`, i.e. `absd(F|D)_reg`. I wonder if we need to add the corresponding rule for fp16 here? > > Hi @shqking , thanks for your review comments. Yes I added `fabdh` and `fnmulh` to keep aligned with float and double types. > For adding support for FP16 `absd` we need `AbsHF` to be supported (along with SubHF), but the `AbsHF` node is not implemented currently. The `abs` operation is implemented directly in Java code here - https://github.com/openjdk/jdk/blob/037e47112bdf2fa2324f7c58198f6d433f17d9fd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java#L1464 and is not intrinsified or pattern matched like other FP16 operations.
The same applies to the `negate` operation for FP16 - https://github.com/openjdk/jdk/blob/037e47112bdf2fa2324f7c58198f6d433f17d9fd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java#L1449 > On the Valhalla repo, while these operations were being developed, I tried adding support for `AbsHF/NegHF`, which emitted `fabs` and `fneg` instructions, but the direct Java code (bit-manipulation operations) was much faster (sorry, I don't remember the exact numbers), so we decided to go with the Java implementation instead. > I still added `fabd` here because `op21` is 0 only in the `fabd` H variant, and I felt it'd be better to handle it here as it belongs to this group of instructions. Please let me know your thoughts. According to the RM, fabd is in _Advanced SIMD scalar three same FP16_, but the rest are in _Floating-point data-processing (2 source)_. The decoding scheme looks rather different. `fabd`, then, doesn't really fit here, but belongs in a section with the rest of the three same FP16 instructions. The encoding scheme for _Advanced SIMD scalar three same FP16_ is pretty simple, so I suggest you create a new group for them, and put `fabd` in there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23748#discussion_r1971330062 From epeter at openjdk.org Wed Feb 26 10:36:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 26 Feb 2025 10:36:06 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v4] In-Reply-To: References: <6R7kv7XGOWIBrjPQCemB6u2vd_tFl_xMQGQaVWoxkK0=.d26f6780-82f8-4ab9-a4bc-ff7831ed9a1a@github.com> Message-ID: On Wed, 26 Feb 2025 10:15:48 GMT, Roland Westrelin wrote: >>> Would it be possible and make sense to remove useless slow path loops the way it's done for predicates or zero trip guards? In `PhaseIdealLoop::build_loop_late_post_work()`, collect all `OpaqueMultiversioningNode` in a list.
Then iterate over all loops the way it's done in `PhaseIdealLoop::eliminate_useless_zero_trip_guard()`, find loops marked as multi version, check we can get from the loop to the `OpaqueMultiversioningNode` and mark that one as useful. Eliminate all `OpaqueMultiversioningNode` not marked as useful. That way if some transformation such as peeling makes the loop non multi version or if the expected shape breaks for some reason, the slow loop is eliminated on next loop opts pass. >> >> I suppose we could try that. Is it ok to do that in a separate RFE, so we are keeping this here to a more manageable size? >> >> I don't see it as super critical personally, as the slow_path is `delayed`, so no loop-opts are performed on it. The overhead is minimal if we keep it until after loop-opts, I think. But I'm not against trying. It would take a bit of effort to construct test cases where we have the loop fold away after multiversion_if is added, but that is probably possible. >> >> And would we not have similar issues with traversing from the loops to their `OpaqueMultiversioningNode`? What if some are not reachable in the meantime? Then we would just lose the `multiversion_if` early, and could not use it any more. So maybe we'd have to do that after the verification: >> [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637): C2: verify that main_loop finds pre_loop and that multiversion loops find the multiversion_if >> >> I wonder if we do not have similar issues with `PhaseIdealLoop::eliminate_useless_zero_trip_guard()` currently. Maybe it's rare enough we don't notice. >> >> @rwestrel What do you think? > >> I suppose we could try that. Is it ok to do that in a separate RFE, so we are keeping this here to a more manageable size? > > Ok > >> And would we not have similar issues with traversing from the loops to their `OpaqueMultiversioningNode`? What if some are not reachable in the meantime? 
Then we would just lose the `multiversion_if` early, and could not use it any more. So maybe we'd have to do that after the verification: [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637): C2: verify that main_loop finds pre_loop and that multiversion loops find the multiversion_if >> >> I wonder if we do not have similar issues with `PhaseIdealLoop::eliminate_useless_zero_trip_guard()` currently. Maybe it's rare enough we don't notice. > > I don't think that's a problem. When that code runs, the graph is in a stable shape. There's no dead condition that needs to go through igvn to be cleaned up. We've just run igvn and haven't made any change to the graph yet. @rwestrel I filed this follow-up RFE: [JDK-8350756](https://bugs.openjdk.org/browse/JDK-8350756): C2 SuperWord Multiversioning: remove useless slow loop when the fast loop disappears. We'll have to be careful to only fold the `slow_loop` away if it is not used, i.e. if we did not in the meantime use the `multiversion_if`, and maybe the `fast_loop` structure is only disintegrating because of some speculative assumption, maybe because of more unrolling that only happens with vectorization. It would be good to have a test case for that. I'm writing that here so I will remember it later ;) @rwestrel Do you have any other ideas / suggestions? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2684567780 From jsjolen at openjdk.org Wed Feb 26 10:54:08 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 26 Feb 2025 10:54:08 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v31] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: <1BsBLyqT4rLpUQTF_ganIYfb8ZyfT5DAf0eJcx8XJes=.e700c4ed-9a34-462f-a143-b352056781d3@github.com> On Tue, 25 Feb 2025 15:40:54 GMT, Gerard Ziemski wrote: > How would I go about verifying the performance gain?
You mentioned previously that you wrote a microbenchmark for testing this? Hi, the performance gain Afshin mentions is from comparing SLL and Treap, so it's pretty clear from the get-go that Treap will be faster. Those micros have since been deleted from the source; you can find them most easily by going through Afshin's last commits and finding where they were deleted. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2684608778 From galder at openjdk.org Wed Feb 26 11:36:11 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 26 Feb 2025 11:36:11 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 7 Feb 2025 12:39:24 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the Java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g.
>> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 44 additional commits since the last revision: > > - Merge branch 'master' into topic.intrinsify-max-min-long > - Fix typo > - Renaming methods and variables and add docu on algorithms > - Fix copyright years > - Make sure it runs with cpus with either avx512 or asimd > - Test can only run with 256 bit registers or bigger > > * Remove platform dependant check > and use platform independent configuration instead. > - Fix license header > - Tests should also run on aarch64 asimd=true envs > - Added comment around the assertions > - Adjust min/max identity IR test expectations after changes > - ... and 34 more: https://git.openjdk.org/jdk/compare/abdd4f5e...a190ae68 > > Re: [#20098 (comment)](https://github.com/openjdk/jdk/pull/20098#issuecomment-2671144644) - I was trying to think what could be causing this. > > Maybe it is an issue with probabilities? Do you know at what point (if at all) the `MinI` node appears/disappears in that example? The probabilities are fine. I think the issue with `Math.min(II)` is specific to when it gets compiled, combined with the fact that the intrinsic has been disabled and vectorization does not kick in (it is explicitly disabled). Note that other parts of the JDK also invoke `Math.min(II)`. In the slow cases, compilation appears to happen before the benchmark kicks in, so the decision on how to compile it is based on profiling data gathered before the benchmark runs.
In the slow versions you see this `PrintMethodData`:

    static java.lang.Math::min(II)I
      interpreter_invocation_count: 18171
      invocation_counter: 18171
      backedge_counter: 0
      decompile_count: 0
      mdo size: 328 bytes

       0 iload_0
       1 iload_1
       2 if_icmpgt 9
      0    bci: 2    BranchData    taken(7732) displacement(56)
                                   not taken(10180)
       5 iload_0
       6 goto 10
      32   bci: 6    JumpData      taken(10180) displacement(24)
       9 iload_1
      10 ireturn

    org.openjdk.bench.java.lang.MinMaxVector::intReductionSimpleMin(Lorg/openjdk/bench/java/lang/MinMaxVector$LoopState;)I
      interpreter_invocation_count: 189
      invocation_counter: 189
      backedge_counter: 313344
      decompile_count: 0
      mdo size: 384 bytes

       0 iconst_0
       1 istore_2
       2 iconst_0
       3 istore_3
       4 iload_3
       5 aload_1
       6 fast_igetfield 35
       9 if_icmpge 33
      0    bci: 9    BranchData    taken(58) displacement(72)
                                   not taken(192512)
      12 aload_1
      13 fast_agetfield 41
      16 iload_3
      17 iaload
      18 istore #4
      20 iload_2
      21 fast_iload #4
      23 invokestatic 32
      32   bci: 23   CounterData   count(192512)
      26 istore_2
      27 iinc #3 1
      30 goto 4
      48   bci: 30   JumpData      taken(192512) displacement(-48)
      33 iload_2
      34 ireturn

The benchmark method calls `Math.min` `192_512` times, yet the method data for `Math.min` shows only `18_171` invocations, of which `7_732` (42%) took the branch. So it gets compiled with a `cmov`, and the benchmark is slow because at run time the branch always goes the same way (100% to one side), which a well-predicted branch would handle better than a `cmov`.
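Two details of the dump above can be made concrete in Java. First, the `intReductionSimpleMin` bytecode (`iconst_0`/`istore_2`, a counted loop, `iaload`, then `invokestatic` of `Math.min`) corresponds roughly to a plain min-reduction. The reconstruction below is inferred from the bytecode only: the `LoopState` field names `size` and `minIntA` are assumptions, not the benchmark's real source. Second, the "taken" percentage quoted above is the `BranchData` taken count divided by `interpreter_invocation_count`.

```java
// Hypothetical reconstruction from the bytecode; not the actual JMH benchmark source.
final class MinReductionSketch {
    // Field names are assumptions; the real MinMaxVector.LoopState may differ.
    static final class LoopState {
        int size;       // read by fast_igetfield in the loop condition
        int[] minIntA;  // read by fast_agetfield + iaload
    }

    static int intReductionSimpleMin(LoopState state) {
        int result = 0;                       // iconst_0 / istore_2
        for (int i = 0; i < state.size; i++) {
            int v = state.minIntA[i];         // istore #4
            result = Math.min(result, v);     // invokestatic, counted by CounterData
        }
        return result;
    }

    // Fraction of profiled Math.min calls whose if_icmpgt branch was taken.
    // A taken branch means a > b, so Math.min returned its second argument.
    static double takenFraction(long taken, long invocations) {
        return (double) taken / invocations;
    }
}
```

With the slow-case counters, `takenFraction(7732, 18171)` is about 0.4255, the 42% quoted above. The same helper applied to the fast-case counters shown next (`1_418_001` taken out of `1_575_322`) gives about 0.90.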
In the fast version, `PrintMethodData` looks like this:

    static java.lang.Math::min(II)I
      interpreter_invocation_count: 1575322
      invocation_counter: 1575322
      backedge_counter: 0
      decompile_count: 0
      mdo size: 368 bytes

       0 iload_0
       1 iload_1
       2 if_icmpgt 9
      0    bci: 2    BranchData    taken(1418001) displacement(56)
                                   not taken(157062)
       5 iload_0
       6 goto 10
      32   bci: 6    JumpData      taken(157062) displacement(24)
       9 iload_1
      10 ireturn

    org.openjdk.bench.java.lang.MinMaxVector::intReductionSimpleMin(Lorg/openjdk/bench/java/lang/MinMaxVector$LoopState;)I
      interpreter_invocation_count: 858
      invocation_counter: 858
      backedge_counter: 1756214
      decompile_count: 0
      mdo size: 424 bytes

       0 iconst_0
       1 istore_2
       2 iconst_0
       3 istore_3
       4 iload_3
       5 aload_1
       6 fast_igetfield 35
       9 if_icmpge 33
      0    bci: 9    BranchData    taken(733) displacement(72)
                                   not taken(1637363)
      12 aload_1
      13 fast_agetfield 41
      16 iload_3
      17 iaload
      18 istore #4
      20 iload_2
      21 fast_iload #4
      23 invokestatic 32
      32   bci: 23   CounterData   count(1637363)
      26 istore_2
      27 iinc #3 1
      30 goto 4
      48   bci: 30   JumpData      taken(1637363) displacement(-48)
      33 iload_2
      34 ireturn

The benchmark method calls `Math.min` `1_637_363` times, and the method data shows `1_575_322` invocations, of which `1_418_001` (90%) took the branch. So no `cmov` is introduced, and the benchmark is fast because the branch always goes the same way (100% to one side) and is well predicted.

A factor here might be my Xeon machine: I run the benchmark on a 4-core VM inside it, so given the limited resources, compilation can take longer. I've noticed that this scenario is easier to replicate there than on my M1 laptop, which has 10 cores. >> So, if those int scalar regressions were not a problem when the int min/max intrinsic was added, I would expect the same to apply to long. > > Do you know when they were added? If that was a long time ago, we might not have noticed back then, but we might notice now. I don't know when they were added. > That said: if we know that it is only in the high-probability cases, then we can address those separately.
I would not consider it a blocking issue, as long as we file the follow-up RFE for the int min/max scalar case with high branch probability. > > What would be really helpful: a list of all regressions / issues, and how we intend to deal with them. If we later find a regression that someone cares about, then we can come back to that list, and justify the decision we made here. I'll put together a list of regressions and post it here. I won't create RFEs for now. I'd rather wait until we have the list in front of us and we can decide which RFEs to create. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2684701935 From coleenp at openjdk.org Wed Feb 26 11:49:59 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 26 Feb 2025 11:49:59 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 [v5] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 07:35:30 GMT, Aleksey Shipilev wrote: >> See bug for description of the bug. Shenandoah seems to be the only GC that runs into this problem so far. >> >> Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. But now we take modifiers out of the Java mirror, and this happens during unloading, which accesses/resurrects a potentially dead mirror. >> >> I think the solution is to keep storing a cached modifiers field in `Klass` instead of relying on the Java mirror being accessible. Unfortunately, this patch undoes the removal of the `u2` field from `Klass` done in [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567).
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, original reproducer now passes >> - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Drop "is" Marked as reviewed by coleenp (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23775#pullrequestreview-2644191080 From coleenp at openjdk.org Wed Feb 26 11:52:09 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 26 Feb 2025 11:52:09 GMT Subject: RFR: 8328473: StringTable and SymbolTable statistics delay time to safepoint [v2] In-Reply-To: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> References: <9398Xb9iafu__4qT9uirLVZVxWpUTL_bHdjfsRZRzWI=.49e94590-0052-4c5e-9f5f-68350f2ba648@github.com> Message-ID: On Mon, 24 Feb 2025 18:41:28 GMT, Coleen Phillimore wrote: >> This change adds a safepoint poll when gathering statistics for the Symbol and String tables, using the ConcurrentHashTableTasks to chunk up the walk. The stringTable and symbolTable code is similar, like the GrowTask and DeleteTask code. Maybe this can be cleaned up, but I don't yet have a good idea for that which doesn't involve yet another level of templated functions and code. This is already pretty highly templatized. >> Tested with tier1-4 and runThese internal test with JFR and failure injection to verify that we do try to safepoint while gathering statistics. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fxi typo. Thank you for reviewing this, Erik and Aleksey.
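The idea behind this change can be sketched in plain Java: walk a large table in bounded chunks and check for a pending safepoint between chunks, instead of holding off the safepoint for the whole walk. This is only a model under stated assumptions. `safepointPending` and `yieldToSafepoint` are hypothetical callbacks standing in for the VM's safepoint machinery, the table is reduced to an array of bucket sizes, and the chunk size is arbitrary. The actual patch uses HotSpot's `ConcurrentHashTableTasks`, which this does not reproduce.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of a chunked statistics walk; not HotSpot's ConcurrentHashTable code.
final class ChunkedStatsSketch {
    // Totals the entries across all buckets, visiting `chunkSize` buckets at a time
    // and offering to stop for a safepoint between chunks.
    static long totalEntries(int[] bucketSizes, int chunkSize,
                             BooleanSupplier safepointPending,
                             Runnable yieldToSafepoint) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long total = 0;
        for (int start = 0; start < bucketSizes.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, bucketSizes.length);
            for (int i = start; i < end; i++) {
                total += bucketSizes[i];
            }
            // Polling only at chunk boundaries bounds the time-to-safepoint by the
            // cost of one chunk rather than the cost of walking the whole table.
            if (safepointPending.getAsBoolean()) {
                yieldToSafepoint.run();
            }
        }
        return total;
    }
}
```

The design trade-off is the chunk size. Smaller chunks reduce safepoint latency but poll more often. Because the walk can pause between chunks, a concurrently modified table may change while the walk is paused, so statistics gathered this way are best treated as approximate.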
------------- PR Comment: https://git.openjdk.org/jdk/pull/23750#issuecomment-2684734334 From coleenp at openjdk.org Wed Feb 26 11:52:10 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 26 Feb 2025 11:52:10 GMT Subject: Integrated: 8328473: StringTable and SymbolTable statistics delay time to safepoint In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 14:27:01 GMT, Coleen Phillimore wrote: > This change adds a safepoint poll when gathering statistics for the Symbol and String tables, using the ConcurrentHashTableTasks to chunk up the walk. The stringTable and symbolTable code is similar, like the GrowTask and DeleteTask code. Maybe this can be cleaned up, but I don't yet have a good idea for that which doesn't involve yet another level of templated functions and code. This is already pretty highly templatized. > Tested with tier1-4 and runThese internal test with JFR and failure injection to verify that we do try to safepoint while gathering statistics. This pull request has now been integrated. Changeset: 1e18fffe Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/1e18fffee456382c4eeb017b3fad0dc99ccaad35 Stats: 163 lines in 5 files changed: 94 ins; 41 del; 28 mod 8328473: StringTable and SymbolTable statistics delay time to safepoint Reviewed-by: shade, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/23750 From dholmes at openjdk.org Wed Feb 26 12:06:00 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 26 Feb 2025 12:06:00 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v8] In-Reply-To: References: Message-ID: <4DyLQoIuV_FOGZeLHwN4mTyO-ablmfJB-j60B-CPcdI=.190dca9b-04ae-448c-8a32-5ca355b95cb8@github.com> On Wed, 26 Feb 2025 09:51:06 GMT, Thomas Stuefe wrote: >> Looks good to me! > >> Looks good to me! > > Thank you @rkennke and @iklam ! @tstuefe your new test is failing on all platforms in our CI.
    ----------System.err:(23/1185)----------
    stdout: [[0.001s][info][metaspace] - commit_granule_bytes: 65536.
    [0.001s][info][metaspace] - commit_granule_words: 8192.
    [0.001s][info][metaspace] - virtual_space_node_default_size: 8388608.
    [0.001s][info][metaspace] - enlarge_chunks_in_place: 1.
    [0.054s][error][cds ] An error has occurred while processing the shared archive file.
    [0.054s][error][cds ] Unable to map shared spaces
    Error occurred during initialization of VM
    Unable to use shared archive.
    ];
     stderr: []
     exitValue = 1

    java.lang.RuntimeException: did not find Narrow klass base in log output

Sorry, not in a position to file a bug at the moment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23190#issuecomment-2684765761 From sroy at openjdk.org Wed Feb 26 12:21:39 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Wed, 26 Feb 2025 12:21:39 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v27] In-Reply-To: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: > JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) > > Currently, acceleration code for GHASH is missing on PPC64. > > The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication to obtain the final result.
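GHASH multiplies 128-bit values as polynomials over GF(2), so the "Karatsuba multiplication" mentioned above is the carry-less variant. It needs three 64×64 carry-less products instead of the four a schoolbook multiply needs, because in GF(2) addition and subtraction are both XOR. The Java sketch below shows only that decomposition. A slow bitwise `clmul64` stands in for the hardware carry-less multiply (for example `vpmsumd` on POWER8 and later), and the sketch deliberately omits GCM's bit reflection and the reduction modulo the GHASH polynomial. It is an illustration of the technique, not the PR's implementation.

```java
// Illustrative carry-less Karatsuba (no GHASH reduction); not the PPC64 intrinsic.
final class KaratsubaClmulSketch {
    // Carry-less 64x64 -> 128-bit product, returned as {hi, lo}.
    static long[] clmul64(long a, long b) {
        long hi = 0, lo = 0;
        for (int i = 0; i < 64; i++) {
            if (((b >>> i) & 1) != 0) {
                lo ^= a << i;
                if (i != 0) {
                    hi ^= a >>> (64 - i);  // guard: Java masks a shift by 64 to 0
                }
            }
        }
        return new long[] {hi, lo};
    }

    // 128x128 -> 256-bit carry-less product using 3 multiplies.
    // Result words are {w3, w2, w1, w0}, most significant first.
    static long[] karatsuba(long aHi, long aLo, long bHi, long bLo) {
        long[] h = clmul64(aHi, bHi);               // aHi * bHi
        long[] l = clmul64(aLo, bLo);               // aLo * bLo
        long[] m = clmul64(aHi ^ aLo, bHi ^ bLo);   // (aHi + aLo) * (bHi + bLo)
        // Middle term aHi*bLo + aLo*bHi = m - h - l, and "-" is XOR in GF(2).
        long midHi = m[0] ^ h[0] ^ l[0];
        long midLo = m[1] ^ h[1] ^ l[1];
        // Product = h * x^128 + mid * x^64 + l
        return new long[] {h[0], h[1] ^ midHi, l[0] ^ midLo, l[1]};
    }

    // Schoolbook version with 4 multiplies, used to cross-check.
    static long[] schoolbook(long aHi, long aLo, long bHi, long bLo) {
        long[] hh = clmul64(aHi, bHi), hl = clmul64(aHi, bLo);
        long[] lh = clmul64(aLo, bHi), ll = clmul64(aLo, bLo);
        long midHi = hl[0] ^ lh[0], midLo = hl[1] ^ lh[1];
        return new long[] {hh[0], hh[1] ^ midHi, ll[0] ^ midLo, ll[1]};
    }
}
```

For instance, `clmul64(3, 3)` is `{0, 5}`, because (x + 1)² = x² + 1 over GF(2). Any 256-bit result from `karatsuba` should match `schoolbook` on the same inputs.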
Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: - change pattern for Linux, fix for AIX - change pattern for Linux, fix for AIX ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20235/files - new: https://git.openjdk.org/jdk/pull/20235/files/467af71c..474b891b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=25-26 Stats: 14 lines in 2 files changed: 8 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235 PR: https://git.openjdk.org/jdk/pull/20235 From stuefe at openjdk.org Wed Feb 26 12:25:11 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Feb 2025 12:25:11 GMT Subject: RFR: 8330174: Protection zone for easier detection of accidental zero-nKlass use [v8] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 09:51:06 GMT, Thomas Stuefe wrote: >> Looks good to me! > >> Looks good to me! > > Thank you @rkennke and @iklam ! > @tstuefe your new test is failing on all platforms in our CI. > > ``` > ----------System.err:(23/1185)---------- > stdout: [[0.001s][info][metaspace] - commit_granule_bytes: 65536. > [0.001s][info][metaspace] - commit_granule_words: 8192. > [0.001s][info][metaspace] - virtual_space_node_default_size: 8388608. > [0.001s][info][metaspace] - enlarge_chunks_in_place: 1. > [0.054s][error][cds ] An error has occurred while processing the shared archive file. > [0.054s][error][cds ] Unable to map shared spaces > Error occurred during initialization of VM > Unable to use shared archive. > ]; > stderr: [] > exitValue = 1 > > java.lang.RuntimeException: did not find Narrow klass base in log output > ``` > > Sorry not in a position to file a bug at the moment. 
Okay, I prepared a backout; waiting for GHAs to finish: https://github.com/openjdk/jdk/pull/23799 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23190#issuecomment-2684807739 From sroy at openjdk.org Wed Feb 26 12:26:02 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Wed, 26 Feb 2025 12:26:02 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v26] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> <7qzgn1LeDY8CaNJZVRPb0FORbKbkfBP85qrU3MSH_Io=.62e89bae-a5e2-47e3-8217-d83ab7bef00f@github.com> Message-ID: On Tue, 25 Feb 2025 16:49:03 GMT, Martin Doerr wrote: >> @TheRealMDoerr >> I understood the failure on AIX. It is related to this. >> >> vec_perm(vH, vTmp5, vTmp4, vPerm) - here we combine the first and last 16 bytes and extract 16 bytes out of them using the pattern generated by lvsl in vPerm. >> >> We required the 2 extra vec_perm instructions specifically for Linux on Power, so that the order of elements is retained; otherwise we would end up selecting the wrong 16 bytes. >> >> For Linux we need vec_perm(vH, vTmp5, vTmp4, vPerm); for AIX it would be vec_perm(vH, vTmp4, vTmp5, vPerm), without the need for the 2 extra vec_perm statements, as the order is retained due to endianness. >> >> I am trying to find a pattern that can eliminate the need for the 2 extra vec_perm instructions on Linux on Power. >> >> One thing I tried was >> __ xxspltib(vTmp12->to_vsr(), 31); >> __ vxor(vPerm, vPerm, vTmp12); >> This generates the sequence of bytes required for Little Endian. >> Some test cases did pass, but some failed too. Still working on it. Let me know your inputs too. >> >> If the above explanation is not clear, let me know, and I will try to explain with an example. > > I'll wait for the AIX fix and make experiments on both platforms after that. @TheRealMDoerr I was able to fix this and find the pattern that eliminates the need for the 2 extra vec_perm instructions.
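The `xxspltib`/`vxor` experiment quoted above XORs every byte of the permute control vector with 31. The model below assumes the documented `vperm` semantics: each result byte i is byte `ctl[i] & 31` of the 32-byte concatenation of the two sources, in big-endian byte numbering. Under that assumption, XORing the control indices with 31 maps index k to 31 - k, which is equivalent to swapping the two sources and byte-reversing each of them. That is the kind of endianness adjustment under discussion. The Java model only checks this identity. The helper names are hypothetical, and this is not the pattern that was finally adopted in the PR.

```java
// Hypothetical byte-level model of vperm; not generated code from the PR.
final class VpermSketch {
    // result[i] = (a || b)[ctl[i] & 31], using big-endian byte numbering.
    static byte[] vperm(byte[] a, byte[] b, byte[] ctl) {
        byte[] r = new byte[16];
        for (int i = 0; i < 16; i++) {
            int idx = ctl[i] & 31;               // only the low 5 bits select a byte
            r[i] = idx < 16 ? a[idx] : b[idx - 16];
        }
        return r;
    }

    // Byte-reverse a vector (element 0 <-> element 15, and so on).
    static byte[] reverse(byte[] v) {
        byte[] r = new byte[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = v[v.length - 1 - i];
        }
        return r;
    }

    // XOR each control byte with 31, like vxor with a splat of 31.
    static byte[] xor31(byte[] ctl) {
        byte[] r = new byte[ctl.length];
        for (int i = 0; i < ctl.length; i++) {
            r[i] = (byte) (ctl[i] ^ 31);
        }
        return r;
    }
}
```

In this model, `vperm(a, b, xor31(ctl))` equals `vperm(reverse(b), reverse(a), ctl)` for any inputs, because `(k ^ 31) & 31 == 31 - (k & 31)`.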
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1971493452 From jsjolen at openjdk.org Wed Feb 26 12:34:22 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 26 Feb 2025 12:34:22 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v31] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Tue, 25 Feb 2025 11:06:26 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. >> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. >> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. >> - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > reviews applied. test/hotspot/gtest/nmt/test_vmatree.cpp line 221: > 219: EXPECT_EQ(-50, diff.tag[NMTUtil::tag_to_index(mtTest)].commit); > 220: } > 221: What's going on with this test? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1971504677 From fgao at openjdk.org Wed Feb 26 12:36:12 2025 From: fgao at openjdk.org (Fei Gao) Date: Wed, 26 Feb 2025 12:36:12 GMT Subject: RFR: 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands [v3] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 11:26:33 GMT, Fei Gao wrote: >> `IndOffXX` types don't do us any good. It would be simpler and faster to match a general-purpose `IndOff` type then let `legitimize_address()` fix any out-of-range operands. That'd reduce the size of the match rules and the time to run them. >> >> This patch simplifies the definitions of `immXOffset` with an estimated range.
Whether an immediate can be encoded in a `LDR`/`STR` instructions as an offset will be determined in the phase of code-emitting. Meanwhile, we add necessary `legitimize_address()` in the phase of matcher for all `LDR`/`STR` instructions using the new `IndOff` memory operands (fix [JDK-8341437](https://bugs.openjdk.org/browse/JDK-8341437)). >> >> After this clean-up, memory operands matched with `IndOff` may require extra code emission (effectively a `lea`) before the address can be used. So we also modify the code about looking up precise offset of load/store instruction for implicit null check (fix [JDK-8340646](https://bugs.openjdk.org/browse/JDK-8340646)). On `aarch64` platform, we will use the beginning offset of the last instruction in the instruction clause emitted for a load/store machine node. Because `LDR`/`STR` is always the last one emitted, no matter what addressing mode the load/store operations finally use. >> >> Tier 1 - 3 passed on `Macos-aarch64` with or without the vm option `-XX:+UseZGC`. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into cleanup_indoff > - Update the copyright year and code comments > - Merge branch 'master' into cleanup_indoff > - 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands > > IndOffXX types don't do us any good. It would be simpler and > faster to match a general-purpose IndOff type then let > legitimize_address() fix any out-of-range operands. That'd > reduce the size of the match rules and the time to run them. > > This patch simplifies the definitions of `immXOffset` with an > estimated range. Whether an immediate can be encoded in a > LDR/STR instructions as an offset will be determined in the phase > of code-emitting. 
Meanwhile, we add necessary > `legitimize_address()` in the phase of matcher for all LDR/STR > instructions using the new `IndOff` memory operands > (fix JDK-8341437). > > After this clean-up, memory operands matched with `IndOff` may > require extra code emission (effectively a lea) before the address > can be used. So we also modify the code about looking up precise > offset of load/store instruction for implicit null check > (fix JDK-8340646). On aarch64 platform, we will use the beginning > offset of the last instruction in the instruction clause emitted > for a load/store machine node. Because LDR/STR is always the last > one emitted, no matter what addressing mode the load/store > operations finally use. > > Tier 1 - 3 passed on Macos-aarch64 with or without the vm option > "-XX:+UseZGC" [These cases](https://github.com/fg1417/jdk/actions/runs/13520045169/attempts/1#summary-37778338908) failed. Because, when setting the offset for `implicit null check`, it's assumed that all instruction clauses emitted by memory related machnodes are ended up with `ldr/str`. But the machine node [loadNKlassCompactHeaders](https://github.com/openjdk/jdk/commit/ff12ff534abb2e08d1bb44a83ef4f84b8476f94c#diff-018aa61d1a7aafcf70a535fcd40a318a4bd6511fd40ac39ce4be90cc52216749R6694) generated by C2 doesn't follow this assumption, which will emit `lsl` as the last instruction. I plan to handle this special node in the platform-specific interface [`int Matcher::offset_for_null_check()`](https://github.com/openjdk/jdk/pull/22862/files#diff-018aa61d1a7aafcf70a535fcd40a318a4bd6511fd40ac39ce4be90cc52216749R2619). I checked all memory related machnodes in `aarch64.ad`. 
For some machnodes, like [popCountI_mem](https://github.com/openjdk/jdk/blob/75420e9314c54adc5b45f9b274a87af54dd6b5a8/src/hotspot/cpu/aarch64/aarch64.ad#L7720) and [compressBitsI_memcon](https://github.com/openjdk/jdk/blob/75420e9314c54adc5b45f9b274a87af54dd6b5a8/src/hotspot/cpu/aarch64/aarch64.ad#L16971), I haven't found any conclusive evidence that they will never be picked for an `implicit null check`. Maybe we also need to fix them. What do you think? @theRealAph Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22862#issuecomment-2684830772 From rcastanedalo at openjdk.org Wed Feb 26 12:36:10 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 26 Feb 2025 12:36:10 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v6] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 16:43:21 GMT, Thomas Stuefe wrote: >> Greetings, >> >> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues. >> >> Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends. >> >> I wanted to track that information correctly and display it clearly in a way that is easy to understand. >> >> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea).
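The per-phase bookkeeping described above (nested phases, phases that release more than they allocate, temporary peaks that vanish before the phase ends) can be illustrated with a small RAII model. This is a hypothetical sketch, not HotSpot's `TracePhase` or the statistic's actual implementation; `MemTracker`, `PhaseScope` and `PhaseRecord` are invented names. Each scope remembers the footprint at entry and the highest footprint seen while it was active, and a finishing inner phase propagates its peak to the enclosing phase, which was also live at that moment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model (not HotSpot code) of per-phase arena accounting.
struct PhaseRecord {
  std::int64_t net = 0;          // footprint at phase end minus footprint at phase start
  std::int64_t peak_growth = 0;  // highest footprint inside the phase, above its start
};

class MemTracker {
 public:
  void allocate(std::int64_t n) {
    current_ += n;
    if (!stack_.empty()) stack_.back().peak = std::max(stack_.back().peak, current_);
  }
  // A release may push the footprint below the phase's start (negative net).
  void release(std::int64_t n) { current_ -= n; }

  void enter(const std::string& name) { stack_.push_back({name, current_, current_}); }
  void leave() {
    Frame f = stack_.back();
    stack_.pop_back();
    PhaseRecord& r = records[f.name];
    r.net += current_ - f.start;
    r.peak_growth = std::max(r.peak_growth, f.peak - f.start);
    // The enclosing phase was live during this peak as well.
    if (!stack_.empty()) stack_.back().peak = std::max(stack_.back().peak, f.peak);
  }

  std::map<std::string, PhaseRecord> records;

 private:
  struct Frame { std::string name; std::int64_t start; std::int64_t peak; };
  std::int64_t current_ = 0;
  std::vector<Frame> stack_;
};

// RAII scope in the spirit of a TracePhase stack object.
class PhaseScope {
 public:
  PhaseScope(MemTracker& t, const std::string& name) : t_(t) { t_.enter(name); }
  ~PhaseScope() { t_.leave(); }
  PhaseScope(const PhaseScope&) = delete;
  PhaseScope& operator=(const PhaseScope&) = delete;

 private:
  MemTracker& t_;
};
```

For example, if a "parse" phase allocates 100 bytes, a nested phase briefly allocates and frees 50, and "parse" then frees 30, the nested phase reports a net of 0 with a peak growth of 50, while "parse" reports a net of 70 with a peak growth of 150: the temporary spike is still attributed to both phases.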
>> >> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (eg, the inside of a loop): drop a TracePhase in there with a speaking name, and you can see the allocations inside that phase. >> >> The statistic gives us two new forms of output: >> >> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase: >> >> >> Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816: >> Phase Total ra node comp type index reglive regsplit cienv other >> none 1205512 155104 982984 33712 0 0 0 0 0 33712 >> parse 11685376 720016 6578728 1899064 0 0 0 0 1832888 654680 >> optimizer 916584 0 556416 0 0 0 0 0 0 360168 >> escapeAnalysis 1983400 0 1276392 707008 0 0 0 0 0 0 >> connectionGraph 720016 0 0 621832 0 0 0 0 98184 0 >> macroEliminate 196448 0 196448 0 0 0 0 0 0 0 >> iterGVN 327440 0 196368 131072 0 0 0 0 0 0 >> incrementalInline 3992816 0 3043704 62... > > Thomas Stuefe has updated the pull request incrementally with five additional commits since the last revision: > > - feedback ashu > - feedback roberto > - final-statistics-switch > - performance fix > - remove test code src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 306: > 304: if (_comp_type == compiler_c2) { > 305: // Update C2 node count > 306: // Careful, Compile::current() may be NULL in a short time window when Compile itself The recently added `sources/TestNoNULL.java` test fails due to this occurrence of `NULL`. 
Suggestion: // Careful, Compile::current() may be null in a short time window when Compile itself ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1971506973 From rcastanedalo at openjdk.org Wed Feb 26 13:03:55 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 26 Feb 2025 13:03:55 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: References: <0wHGNSlwe7cWb7Plad2n8Swy8rayYTAf5IETuw9zl4U=.a4d6a129-aebc-4639-aaef-92ee6c4552c7@github.com> Message-ID: On Mon, 24 Feb 2025 08:56:51 GMT, Roberto Casta?eda Lozano wrote: >>> @robcasloz I identified and hopefully fixed a small issue that hit the "disabled" path. Turns out we allocate arena chunks a lot more frequently than I thought, and the new unconditional call to Thread::current() in there was hurting a bit. I now avoid this unless I know the statistic is enabled. >>> >>> With this patch, on my machine the difference between unpatched and patched JVM with stats disabled is below one standard deviation for the benchmark in question. >> >> Great, thanks! Will re-run benchmarking and report results early next week. > >> > @robcasloz I identified and hopefully fixed a small issue that hit the "disabled" path. Turns out we allocate arena chunks a lot more frequently than I thought, and the new unconditional call to Thread::current() in there was hurting a bit. I now avoid this unless I know the statistic is enabled. >> > With this patch, on my machine the difference between unpatched and patched JVM with stats disabled is below one standard deviation for the benchmark in question. >> >> Great, thanks! Will re-run benchmarking and report results early next week. > > Functional test results (Oracle tier1-5) still look good for the latest commit (dd7a06ad). I can confirm that the C2 speed regression on our linux-x64 machines is almost fully mitigated. 
The 2-3% regression on our macosx-aarch64 machines does not seem to be addressed by the latest changes though, but as I mentioned before I think it is in the acceptable range (and only affects one benchmark). > @robcasloz, @ashu-mehra thanks a lot for your reviews. I incorporated most of them into the PR. Thanks, Thomas! I see that the changes suggested in https://github.com/openjdk/jdk/commit/d501bd8a674229904358fb168a9c347004efeea3 are not incorporated, is it because you find them out of the scope of this PR? I would argue that at least tagging `Compile::_Compile_types` with `tag_type` is relevant and in line with the other changes included in this PR, e.g. [this one](https://github.com/openjdk/jdk/pull/23530/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R443). ------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2684893025 From stuefe at openjdk.org Wed Feb 26 13:50:13 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Feb 2025 13:50:13 GMT Subject: RFR: 8350770: [BACKOUT] Protection zone for easier detection of accidental zero-nKlass use Message-ID: Please review this backout of JDK-8330174; it broke several tests at SAP and Oracle. See https://bugs.openjdk.org/browse/JDK-8350768. 
------------- Commit messages: - Revert "8330174: Protection zone for easier detection of accidental zero-nKlass use" Changes: https://git.openjdk.org/jdk/pull/23799/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23799&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350770 Stats: 426 lines in 16 files changed: 29 ins; 331 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/23799.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23799/head:pull/23799 PR: https://git.openjdk.org/jdk/pull/23799 From mdoerr at openjdk.org Wed Feb 26 13:50:13 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Feb 2025 13:50:13 GMT Subject: RFR: 8350770: [BACKOUT] Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 12:21:43 GMT, Thomas Stuefe wrote: > Please review this backout of JDK-8330174; it broke several tests at SAP and Oracle. See https://bugs.openjdk.org/browse/JDK-8350768. The backout is correct. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23799#pullrequestreview-2644483340 From aph at openjdk.org Wed Feb 26 14:02:20 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 26 Feb 2025 14:02:20 GMT Subject: RFR: 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands [v3] In-Reply-To: References: Message-ID: <4mHMQuRHhyf-GqB48CMoGnnYWjZ_LRxgxr3GkthZXis=.b6ff608e-a1c7-4d5d-bced-2f8c16c46f15@github.com> On Tue, 25 Feb 2025 11:26:33 GMT, Fei Gao wrote: >> `IndOffXX` types don't do us any good. It would be simpler and faster to match a general-purpose `IndOff` type then let `legitimize_address()` fix any out-of-range operands. That'd reduce the size of the match rules and the time to run them. >> >> This patch simplifies the definitions of `immXOffset` with an estimated range. 
Whether an immediate can be encoded in a `LDR`/`STR` instructions as an offset will be determined in the phase of code-emitting. Meanwhile, we add necessary `legitimize_address()` in the phase of matcher for all `LDR`/`STR` instructions using the new `IndOff` memory operands (fix [JDK-8341437](https://bugs.openjdk.org/browse/JDK-8341437)). >> >> After this clean-up, memory operands matched with `IndOff` may require extra code emission (effectively a `lea`) before the address can be used. So we also modify the code about looking up precise offset of load/store instruction for implicit null check (fix [JDK-8340646](https://bugs.openjdk.org/browse/JDK-8340646)). On `aarch64` platform, we will use the beginning offset of the last instruction in the instruction clause emitted for a load/store machine node. Because `LDR`/`STR` is always the last one emitted, no matter what addressing mode the load/store operations finally use. >> >> Tier 1 - 3 passed on `Macos-aarch64` with or without the vm option `-XX:+UseZGC`. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into cleanup_indoff > - Update the copyright year and code comments > - Merge branch 'master' into cleanup_indoff > - 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands > > IndOffXX types don't do us any good. It would be simpler and > faster to match a general-purpose IndOff type then let > legitimize_address() fix any out-of-range operands. That'd > reduce the size of the match rules and the time to run them. > > This patch simplifies the definitions of `immXOffset` with an > estimated range. Whether an immediate can be encoded in a > LDR/STR instructions as an offset will be determined in the phase > of code-emitting. 
Meanwhile, we add necessary > `legitimize_address()` in the phase of matcher for all LDR/STR > instructions using the new `IndOff` memory operands > (fix JDK-8341437). > > After this clean-up, memory operands matched with `IndOff` may > require extra code emission (effectively a lea) before the address > can be used. So we also modify the code about looking up precise > offset of load/store instruction for implicit null check > (fix JDK-8340646). On aarch64 platform, we will use the beginning > offset of the last instruction in the instruction clause emitted > for a load/store machine node. Because LDR/STR is always the last > one emitted, no matter what addressing mode the load/store > operations finally use. > > Tier 1 - 3 passed on Macos-aarch64 with or without the vm option > "-XX:+UseZGC" > [These cases](https://github.com/fg1417/jdk/actions/runs/13520045169/attempts/1#summary-37778338908) failed. Because, when setting the offset for `implicit null check`, it's assumed that all instruction clauses emitted by memory-related machnodes end with `ldr/str`. > I haven't found any conclusive evidence that they will never be picked for an `implicit null check`. Maybe we also need to fix them. I just did an experiment, and I see that when `popCountI_mem` is used, it is guarded by an explicit null check, rather than using an implicit one. It seems that C2 carefully checks operands for implicit null check opportunities. However, it seems that this patch is turning out to have a much larger blast radius than I expected. I was hoping that it'd just work, without affecting anything else. I don't think that this idea really justifies making such changes to C2, sorry.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22862#issuecomment-2685110086 From azafari at openjdk.org Wed Feb 26 14:05:24 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 26 Feb 2025 14:05:24 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v31] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Wed, 26 Feb 2025 12:31:39 GMT, Johan Sj?len wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> reviews applied. > > test/hotspot/gtest/nmt/test_vmatree.cpp line 221: > >> 219: EXPECT_EQ(-50, diff.tag[NMTUtil::tag_to_index(mtTest)].commit); >> 220: } >> 221: > > What's going on with this test? It checks the `use_tag_inplace` in committing a region, by passing `true` to the last parameter of the `VMATree::commit_mapping`. It is expected that the existing tag from the previous region to be copied to this new region. The mem tag of the `rd2` (`mtNone`) should not be used for the new committed region. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1971643555 From azafari at openjdk.org Wed Feb 26 14:05:25 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 26 Feb 2025 14:05:25 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v31] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Wed, 26 Feb 2025 14:01:07 GMT, Afshin Zafari wrote: >> test/hotspot/gtest/nmt/test_vmatree.cpp line 221: >> >>> 219: EXPECT_EQ(-50, diff.tag[NMTUtil::tag_to_index(mtTest)].commit); >>> 220: } >>> 221: >> >> What's going on with this test? > > It checks the `use_tag_inplace` in committing a region, by passing `true` to the last parameter of the `VMATree::commit_mapping`. 
It is expected that the existing tag from the previous region to be copied to this new region. The mem tag of the `rd2` (`mtNone`) should not be used for the new committed region. Also, it is skipped until the corresponding PR get integrated, otherwise it fails. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1971644873 From aph at openjdk.org Wed Feb 26 14:08:03 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 26 Feb 2025 14:08:03 GMT Subject: RFR: 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands [v3] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 11:26:33 GMT, Fei Gao wrote: >> `IndOffXX` types don't do us any good. It would be simpler and faster to match a general-purpose `IndOff` type then let `legitimize_address()` fix any out-of-range operands. That'd reduce the size of the match rules and the time to run them. >> >> This patch simplifies the definitions of `immXOffset` with an estimated range. Whether an immediate can be encoded in a `LDR`/`STR` instructions as an offset will be determined in the phase of code-emitting. Meanwhile, we add necessary `legitimize_address()` in the phase of matcher for all `LDR`/`STR` instructions using the new `IndOff` memory operands (fix [JDK-8341437](https://bugs.openjdk.org/browse/JDK-8341437)). >> >> After this clean-up, memory operands matched with `IndOff` may require extra code emission (effectively a `lea`) before the address can be used. So we also modify the code about looking up precise offset of load/store instruction for implicit null check (fix [JDK-8340646](https://bugs.openjdk.org/browse/JDK-8340646)). On `aarch64` platform, we will use the beginning offset of the last instruction in the instruction clause emitted for a load/store machine node. Because `LDR`/`STR` is always the last one emitted, no matter what addressing mode the load/store operations finally use. 
>> >> Tier 1 - 3 passed on `Macos-aarch64` with or without the vm option `-XX:+UseZGC`. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into cleanup_indoff > - Update the copyright year and code comments > - Merge branch 'master' into cleanup_indoff > - 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands > > IndOffXX types don't do us any good. It would be simpler and > faster to match a general-purpose IndOff type then let > legitimize_address() fix any out-of-range operands. That'd > reduce the size of the match rules and the time to run them. > > This patch simplifies the definitions of `immXOffset` with an > estimated range. Whether an immediate can be encoded in a > LDR/STR instructions as an offset will be determined in the phase > of code-emitting. Meanwhile, we add necessary > `legitimize_address()` in the phase of matcher for all LDR/STR > instructions using the new `IndOff` memory operands > (fix JDK-8341437). > > After this clean-up, memory operands matched with `IndOff` may > require extra code emission (effectively a lea) before the address > can be used. So we also modify the code about looking up precise > offset of load/store instruction for implicit null check > (fix JDK-8340646). On aarch64 platform, we will use the beginning > offset of the last instruction in the instruction clause emitted > for a load/store machine node. Because LDR/STR is always the last > one emitted, no matter what addressing mode the load/store > operations finally use. > > Tier 1 - 3 passed on Macos-aarch64 with or without the vm option > "-XX:+UseZGC" The reason being: this patch cleans up and simplifies memory ops on AArch64. That's nice, but it doesn't fix any bug. If we could do it in a self-contained way, and it's beginning to look that we can't, then that would be fine. 
But I don't think it's worth the possibility of regressions. I've seen too many "cleanups" that have broken things. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22862#issuecomment-2685126429 From duke at openjdk.org Wed Feb 26 14:18:14 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Wed, 26 Feb 2025 14:18:14 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v7] In-Reply-To: References: Message-ID: > By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. Ferenc Rakoczi has updated the pull request incrementally with two additional commits since the last revision: - Added more comments, mainly as suggested by Andrew Dinn - Changed aarch64-asmtest.py as suggested by Bhavana-Kilambi ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23300/files - new: https://git.openjdk.org/jdk/pull/23300/files/54373d5a..aa0570db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23300&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23300&range=05-06 Stats: 478 lines in 3 files changed: 40 ins; 6 del; 432 mod Patch: https://git.openjdk.org/jdk/pull/23300.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23300/head:pull/23300 PR: https://git.openjdk.org/jdk/pull/23300 From jsjolen at openjdk.org Wed Feb 26 14:23:15 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 26 Feb 2025 14:23:15 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v31] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Wed, 26 Feb 2025 14:01:58 GMT, Afshin Zafari wrote: >> It checks the `use_tag_inplace` in committing a region, by passing `true` to the last parameter of the `VMATree::commit_mapping`. It is expected that the existing tag from the previous region to be copied to this new region. 
The mem tag of the `rd2` (`mtNone`) should not be used for the new committed region. > > Also, it is skipped until the corresponding PR get integrated, otherwise it fails. Then move the test to that PR and enable it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1971676943 From haosun at openjdk.org Wed Feb 26 14:41:56 2025 From: haosun at openjdk.org (Hao Sun) Date: Wed, 26 Feb 2025 14:41:56 GMT Subject: RFR: 8345125: Aarch64: Add aarch64 backend for Float16 scalar operations [v2] In-Reply-To: References: <8QDbenZGakijqUrwAcaVogoJBEiNpzYhN3sDrrteSDk=.d8539631-ab03-45ff-a762-0b6e14c63f89@github.com> Message-ID: On Wed, 26 Feb 2025 08:49:58 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2097: >> >>> 2095: >>> 2096: // Half-precision floating-point instructions >>> 2097: INSN(fabdh, 0b011, 0b11, 0b000101, 0b0); >> >> I suppose `fadbh` and `fnmulh` are added to keep aligned with the float and double ones, i.e. `fabd(s|d)` and `fnmul(s|d)`. >> >> >> I noticed that there are matching rules for `fabd(s|d)`, i.e. `absd(F|D)_reg`. I wonder if we need add the corresponding rule for fp16 here? > > Hi @shqking , thanks for your review comments. Yes I added `fabdh` and `fnmulh` to keep aligned with float and double types. > For adding support for FP16 `absd` we need `AbsHF` to be supported (along with SubHF) but `AbsHF` node is not implemented currently. `abs` operation is directly executed from the java code here - https://github.com/openjdk/jdk/blob/037e47112bdf2fa2324f7c58198f6d433f17d9fd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java#L1464 and is not intrinsified or pattern matched like other FP16 operations. 
Same with `negate` operation for FP16 - https://github.com/openjdk/jdk/blob/037e47112bdf2fa2324f7c58198f6d433f17d9fd/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java#L1449 > On the Valhalla repo, while these operations were being developed, I tried adding support for `AbsHF/NegHF` which emitted `fabs` and `fneg` instructions, but the performance with the direct Java code (bit manipulation operations) was much faster (sorry, I don't remember the exact number), so we decided to go with the Java implementation instead. > I still added `fabd` here because `op21` is 0 only in the `fabd` H variant, and I felt that it'd be better to handle it here as it belongs to this group of instructions. Please let me know your thoughts. @Bhavana-Kilambi Thanks for your explanation for the missing `AbsHF`. It's okay with me to have `fabdh` and `fnmulh` in this patch. Overall it looks good to me except aph's comment above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23748#discussion_r1971712164 From adinn at openjdk.org Wed Feb 26 14:58:07 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 26 Feb 2025 14:58:07 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v7] In-Reply-To: References: Message-ID: <8h5rWJFe3PKLNO6QiDZiAj98ePBoCilk0b9w420hZLE=.a17a4ecd-757b-405c-8f5a-5470bde5bf18@github.com> On Wed, 26 Feb 2025 14:18:14 GMT, Ferenc Rakoczi wrote: >> By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally with two additional commits since the last revision: > > - Added more comments, mainly as suggested by Andrew Dinn > - Changed aarch64-asmtest.py as suggested by Bhavana-Kilambi Ok, still good ------------- Marked as reviewed by adinn (Reviewer).
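The FP16 thread above notes that `abs` and `negate` are implemented in Java with bit manipulation rather than by intrinsifying `AbsHF`/`NegHF`. A minimal sketch of why that is cheap follows. It assumes only the IEEE 754 binary16 layout (sign in bit 15) and operates on raw 16-bit patterns; it does not reproduce `Float16.java`, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// IEEE 754 binary16: bit 15 = sign, bits 14..10 = exponent, bits 9..0 = fraction.
constexpr std::uint16_t kSignMask = 0x8000;

// |x|: clear the sign bit. Also correct for zeros, infinities and NaNs
// (the NaN payload is kept).
constexpr std::uint16_t fp16_abs(std::uint16_t bits) {
  return static_cast<std::uint16_t>(bits & ~kSignMask);
}

// -x: flip the sign bit.
constexpr std::uint16_t fp16_negate(std::uint16_t bits) {
  return static_cast<std::uint16_t>(bits ^ kSignMask);
}
```

Each operation is a single integer AND or XOR on the raw bits, which is one plausible reason the plain Java version was competitive with a dedicated `fabs`/`fneg` path.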
PR Review: https://git.openjdk.org/jdk/pull/23300#pullrequestreview-2644812035 From fgao at openjdk.org Wed Feb 26 15:09:04 2025 From: fgao at openjdk.org (Fei Gao) Date: Wed, 26 Feb 2025 15:09:04 GMT Subject: RFR: 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands [v3] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 14:05:06 GMT, Andrew Haley wrote: > The reason being: this patch cleans up and simplifies memory ops on AArch64. That's nice, but it doesn't fix any bug. If we could do it in a self-contained way, and it's beginning to look that we can't, then that would be fine. But I don't think it's worth the possibility of regressions. I've seen too many "cleanups" that have broken things. @theRealAph Yes, I agree. I'll stop working on this pull request and close it. Thanks for your quick response. BTW, now for [indOffIN/indOffLN](https://github.com/openjdk/jdk/blob/75420e9314c54adc5b45f9b274a87af54dd6b5a8/src/hotspot/cpu/aarch64/aarch64.ad#L5466), `legitimize_address()` fixes some out-of-range offsets. It looks like there is still a possibility of breaking the `implicit null check`, isn't there? Do you think it's worthwhile to go back to the original idea: [adding more memory operands for compressed pointers](https://github.com/openjdk/jdk/pull/16991/commits/1895cf3112d341b0ae8beb6dd9d332f6a2c5d5fc)? Thanks.
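The concern raised in the 8341611 thread can be made concrete with a simplified model. It shows why an out-of-range offset changes which instruction faults on a null base, and why "the last instruction of the node" is not always the faulting one. This is a sketch under stated assumptions, not HotSpot's encoder or `legitimize_address()`. The encodability rule models the two AArch64 immediate forms (scaled unsigned 12-bit for LDR/STR, signed 9-bit unscaled for LDUR/STUR). `lea` stands in for whatever address-materializing sequence is actually emitted. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified rule: an LDR/STR of `size` bytes can encode the offset directly if
// it is non-negative, size-aligned and <= 4095 * size (scaled unsigned imm12),
// or if it lies in [-256, 255] (LDUR/STUR, unscaled signed imm9).
bool offset_encodable(std::int64_t offset, int size) {
  if (offset >= 0 && offset % size == 0 && offset / size <= 4095) return true;
  return offset >= -256 && offset <= 255;
}

struct Insn {
  std::string text;
  bool accesses_memory;  // only such an instruction can fault on a null base
};

// Hypothetical emission: an out-of-range offset is first folded into a scratch
// register ("effectively a lea"), so the load is no longer the first instruction.
std::vector<Insn> emit_load(std::int64_t offset, int size) {
  std::vector<Insn> seq;
  if (offset_encodable(offset, size)) {
    seq.push_back({"ldr dst, [base, #offset]", true});
  } else {
    seq.push_back({"lea tmp, [base, #offset]", false});  // pseudo-instruction
    seq.push_back({"ldr dst, [tmp]", true});
  }
  return seq;
}

// Byte offset (4 bytes per AArch64 instruction) of the instruction that would
// fault: the last memory-accessing one, which need not be the last one emitted.
int faulting_pc_offset(const std::vector<Insn>& seq) {
  for (int i = static_cast<int>(seq.size()) - 1; i >= 0; i--) {
    if (seq[i].accesses_memory) return 4 * i;
  }
  return -1;  // no memory access: not a candidate for an implicit null check
}
```

A node that ends with a non-memory instruction, such as a load followed by a shift, would fault at offset 0, so a rule that records the start of the last emitted instruction would point at the shift instead.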
------------- PR Comment: https://git.openjdk.org/jdk/pull/22862#issuecomment-2685346903 From stuefe at openjdk.org Wed Feb 26 15:30:57 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Feb 2025 15:30:57 GMT Subject: RFR: 8344009: Improve compiler memory statistics In-Reply-To: References: <0wHGNSlwe7cWb7Plad2n8Swy8rayYTAf5IETuw9zl4U=.a4d6a129-aebc-4639-aaef-92ee6c4552c7@github.com> Message-ID: <_2PWpcAQigB_MY6KkK1avnTz5vdr5qmIICXP5M_jWN0=.48821f49-c7c4-4d45-8758-edc652837119@github.com> On Wed, 26 Feb 2025 13:00:51 GMT, Roberto Casta?eda Lozano wrote: >>> > @robcasloz I identified and hopefully fixed a small issue that hit the "disabled" path. Turns out we allocate arena chunks a lot more frequently than I thought, and the new unconditional call to Thread::current() in there was hurting a bit. I now avoid this unless I know the statistic is enabled. >>> > With this patch, on my machine the difference between unpatched and patched JVM with stats disabled is below one standard deviation for the benchmark in question. >>> >>> Great, thanks! Will re-run benchmarking and report results early next week. >> >> Functional test results (Oracle tier1-5) still look good for the latest commit (dd7a06ad). I can confirm that the C2 speed regression on our linux-x64 machines is almost fully mitigated. The 2-3% regression on our macosx-aarch64 machines does not seem to be addressed by the latest changes though, but as I mentioned before I think it is in the acceptable range (and only affects one benchmark). > >> @robcasloz, @ashu-mehra thanks a lot for your reviews. I incorporated most of them into the PR. > > Thanks, Thomas! I see that the changes suggested in https://github.com/openjdk/jdk/commit/d501bd8a674229904358fb168a9c347004efeea3 are not incorporated, is it because you find them out of the scope of this PR? I would argue that at least tagging `Compile::_Compile_types` with `tag_type` is relevant and in line with the other changes included in this PR, e.g. 
[this one](https://github.com/openjdk/jdk/pull/23530/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R443). > > @robcasloz, @ashu-mehra thanks a lot for your reviews. I incorporated most of them into the PR. > > Thanks, Thomas! I see that the changes suggested in [d501bd8](https://github.com/openjdk/jdk/commit/d501bd8a674229904358fb168a9c347004efeea3) are not incorporated, is it because you find them out of the scope of this PR? I would argue that at least tagging `Compile::_Compile_types` with `tag_type` is relevant and in line with the other changes included in this PR, e.g. [this one](https://github.com/openjdk/jdk/pull/23530/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R443). Sorry, this was an oversight. Will do this, no reason not to. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23530#issuecomment-2685412110 From rkennke at openjdk.org Wed Feb 26 15:51:59 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 26 Feb 2025 15:51:59 GMT Subject: RFR: 8350770: [BACKOUT] Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 12:21:43 GMT, Thomas Stuefe wrote: > Please review this backout of JDK-8330174; it broke several tests at SAP and Oracle. See https://bugs.openjdk.org/browse/JDK-8350768. Looks good! ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23799#pullrequestreview-2644997227 From stuefe at openjdk.org Wed Feb 26 16:00:10 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Feb 2025 16:00:10 GMT Subject: RFR: 8350770: [BACKOUT] Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 13:25:28 GMT, Martin Doerr wrote: >> Please review this backout of JDK-8330174; it broke several tests at SAP and Oracle. See https://bugs.openjdk.org/browse/JDK-8350768. > > The backout is correct. 
Thanks @TheRealMDoerr and @rkennke ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23799#issuecomment-2685489703 From stuefe at openjdk.org Wed Feb 26 16:00:10 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Feb 2025 16:00:10 GMT Subject: Integrated: 8350770: [BACKOUT] Protection zone for easier detection of accidental zero-nKlass use In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 12:21:43 GMT, Thomas Stuefe wrote: > Please review this backout of JDK-8330174; it broke several tests at SAP and Oracle. See https://bugs.openjdk.org/browse/JDK-8350768. This pull request has now been integrated. Changeset: 3e46480d Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/3e46480dcfabf79b74cc371eaa84dce2e252f3da Stats: 426 lines in 16 files changed: 29 ins; 331 del; 66 mod 8350770: [BACKOUT] Protection zone for easier detection of accidental zero-nKlass use Reviewed-by: mdoerr, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/23799 From aph at openjdk.org Wed Feb 26 16:16:53 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 26 Feb 2025 16:16:53 GMT Subject: RFR: 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands [v3] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 15:06:08 GMT, Fei Gao wrote: > BTW, now for [indOffIN/indOffLN](https://github.com/openjdk/jdk/blob/75420e9314c54adc5b45f9b274a87af54dd6b5a8/src/hotspot/cpu/aarch64/aarch64.ad#L5466), `legitimize_address()` fixes some out-of-range offsets. It looks like there is still possibility of breaking `implicit null check`, isn't there? Do you think if it's worthwhile to move to this original idea: [adding more memory operands for compressed pointers](https://github.com/openjdk/jdk/pull/16991/commits/1895cf3112d341b0ae8beb6dd9d332f6a2c5d5fc)? Thanks. I can't think of a better alternative.
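This exchange is about "out-of-range offsets". The following is my understanding of the AArch64 encoding constraint, not a statement of HotSpot's implementation. The immediate-offset forms of `LDR`/`STR` accept either an unsigned 12-bit offset scaled by the access size, or, through the `LDUR`/`STUR` forms, a signed, unscaled 9-bit offset. The Java sketch below models that check. The class and method names are hypothetical. When an offset fails the check, the address has to be materialized first (effectively a `lea`). That is why the position of the faulting load/store inside the emitted sequence matters for implicit null checks.

```java
// Hypothetical model of AArch64 LDR/STR immediate-offset encodability; not HotSpot code.
public class Aarch64OffsetSketch {

    /** True if offset fits directly in a load/store of (1 << sizeLog2) bytes. */
    static boolean offsetEncodable(long offset, int sizeLog2) {
        long size = 1L << sizeLog2;
        // Form 1: unsigned 12-bit immediate scaled by the access size
        // (non-negative multiple of size, at most 4095 * size).
        boolean scaledOk = offset >= 0
                && (offset & (size - 1)) == 0
                && (offset >> sizeLog2) <= 4095;
        // Form 2 (LDUR/STUR): signed 9-bit immediate, not scaled.
        boolean unscaledOk = offset >= -256 && offset <= 255;
        return scaledOk || unscaledOk;
    }

    public static void main(String[] args) {
        System.out.println(offsetEncodable(32760, 3)); // true: 4095 * 8, largest scaled 8-byte offset
        System.out.println(offsetEncodable(32768, 3)); // false: needs a separate address computation
        System.out.println(offsetEncodable(-8, 3));    // true: negative, fits the signed 9-bit form
        System.out.println(offsetEncodable(4, 3));     // true: not a multiple of 8, but fits unscaled
    }
}
```

The two forms are checked independently because an assembler may pick whichever one fits. Only when neither fits does the address need extra instructions.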
------------- PR Comment: https://git.openjdk.org/jdk/pull/22862#issuecomment-2685536701 From mdoerr at openjdk.org Wed Feb 26 16:31:09 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Feb 2025 16:31:09 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v27] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Wed, 26 Feb 2025 12:21:39 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently acceleration code for GHASH is missing for PPC64. >> >> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - change pattern for Linux, fix for AIX > - change pattern for Linux, fix for AIX This looks much better! Thanks! I'll rerun tests. Maybe you can find names for some of the temp registers? Especially vTmp4 and vTmp11 are expected to contain specific values before entering `computeGCMProduct`. This would improve readability. ------------- PR Review: https://git.openjdk.org/jdk/pull/20235#pullrequestreview-2645116179 From shade at openjdk.org Wed Feb 26 16:44:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Feb 2025 16:44:03 GMT Subject: RFR: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 [v5] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 07:35:30 GMT, Aleksey Shipilev wrote: >> See bug for description of the bug. Shenandoah seems to be the only GC that runs into this problem so far. >> >> Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble.
But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, original reproducer now passes >> - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` >> - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Drop "is" Thanks for reviews! Local testing passes fine, so I am integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23775#issuecomment-2685603297 From shade at openjdk.org Wed Feb 26 16:44:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Feb 2025 16:44:03 GMT Subject: Integrated: 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 13:00:52 GMT, Aleksey Shipilev wrote: > See bug for description of the bug. Shenandoah seems to be the only GC that runs into this problem so far. > > Before [JDK-8346567](https://bugs.openjdk.org/browse/JDK-8346567), we pulled class modifiers from the native `Klass*`, and so we bypassed this trouble. But now we take modifiers out of Java mirror, and this happens during unloading, which accesses/resurrects potentially dead mirror. > > Additional testing: > - [x] Linux x86_64 server fastdebug, original reproducer now passes > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` > - [x] Linux x86_64 server fastdebug, `jdk_jfr` with `-XX:+UseShenandoahGC` now passes This pull request has now been integrated. 
Changeset: ec6624b5 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/ec6624b54eaf5c0f94bd760d2e9fa8b55717c350 Stats: 7 lines in 2 files changed: 5 ins; 0 del; 2 mod 8350649: Class unloading accesses/resurrects dead Java mirror after JDK-8346567 Reviewed-by: coleenp, egahlin ------------- PR: https://git.openjdk.org/jdk/pull/23775 From fgao at openjdk.org Wed Feb 26 16:50:06 2025 From: fgao at openjdk.org (Fei Gao) Date: Wed, 26 Feb 2025 16:50:06 GMT Subject: RFR: 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands [v3] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 16:14:10 GMT, Andrew Haley wrote: >>> The reason being: this patch cleans up and simplifies memory ops on AArch64. That's nice, but it doesn't fix any bug. If we could do it in a self-contained way, and it's beginning to look that we can't, then that would be fine. But I don't think it's worth the possibility of regressions. I've seen too many "cleanups" that have broken things. >> >> @theRealAph yes, I agree. I'll stop processing this pull request and close it. Thanks for your quick response. >> >> BTW, now for [indOffIN/indOffLN](https://github.com/openjdk/jdk/blob/75420e9314c54adc5b45f9b274a87af54dd6b5a8/src/hotspot/cpu/aarch64/aarch64.ad#L5466), `legitimize_address()` fixes some out-of-range offsets. It looks like there is still possibility of breaking `implicit null check`, isn't there? Do you think if it's worthwhile to move to this original idea: [adding more memory operands for compressed pointers](https://github.com/openjdk/jdk/pull/16991/commits/1895cf3112d341b0ae8beb6dd9d332f6a2c5d5fc)? Thanks. > >> BTW, now for [indOffIN/indOffLN](https://github.com/openjdk/jdk/blob/75420e9314c54adc5b45f9b274a87af54dd6b5a8/src/hotspot/cpu/aarch64/aarch64.ad#L5466), `legitimize_address()` fixes some out-of-range offsets. It looks like there is still possibility of breaking `implicit null check`, isn't there? 
Do you think if it's worthwhile to move to this original idea: [adding more memory operands for compressed pointers](https://github.com/openjdk/jdk/pull/16991/commits/1895cf3112d341b0ae8beb6dd9d332f6a2c5d5fc)? Thanks. > > I can't think of a better alternative. Thanks @theRealAph . Work will be continued in a new pull request. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22862#issuecomment-2685620128 From fgao at openjdk.org Wed Feb 26 16:50:06 2025 From: fgao at openjdk.org (Fei Gao) Date: Wed, 26 Feb 2025 16:50:06 GMT Subject: Withdrawn: 8341611: [REDO] AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands In-Reply-To: References: Message-ID: On Mon, 23 Dec 2024 10:45:00 GMT, Fei Gao wrote: > `IndOffXX` types don't do us any good. It would be simpler and faster to match a general-purpose `IndOff` type then let `legitimize_address()` fix any out-of-range operands. That'd reduce the size of the match rules and the time to run them. > > This patch simplifies the definitions of `immXOffset` with an estimated range. Whether an immediate can be encoded in a `LDR`/`STR` instructions as an offset will be determined in the phase of code-emitting. Meanwhile, we add necessary `legitimize_address()` in the phase of matcher for all `LDR`/`STR` instructions using the new `IndOff` memory operands (fix [JDK-8341437](https://bugs.openjdk.org/browse/JDK-8341437)). > > After this clean-up, memory operands matched with `IndOff` may require extra code emission (effectively a `lea`) before the address can be used. So we also modify the code about looking up precise offset of load/store instruction for implicit null check (fix [JDK-8340646](https://bugs.openjdk.org/browse/JDK-8340646)). On `aarch64` platform, we will use the beginning offset of the last instruction in the instruction clause emitted for a load/store machine node. 
Because `LDR`/`STR` is always the last one emitted, no matter what addressing mode the load/store operations finally use. > > Tier 1 - 3 passed on `Macos-aarch64` with or without the vm option `-XX:+UseZGC`. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/22862 From mdoerr at openjdk.org Wed Feb 26 17:02:00 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Feb 2025 17:02:00 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v27] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Wed, 26 Feb 2025 12:21:39 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently acceleration code for GHASH is missing for PPC64. >> >> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - change pattern for Linux, fix for AIX > - change pattern for Linux, fix for AIX src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 702: > 700: __ lvx(vHigh, temp1, data); > 701: #ifdef VM_LITTLE_ENDIAN > 702: __ xxspltib(vTmp12->to_vsr(), 31); Is this instruction available on Power8? Shouldn't we use `vspltisb`?
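Some background for the `xxspltib`/`vspltisb` question, given as my understanding rather than as verified against the ISA text. `vspltisb` takes a 5-bit signed immediate (-16..15), so it cannot splat 31 directly. `xxspltib` accepts an 8-bit immediate, but as far as I know it was introduced with Power ISA 3.0 (POWER9). Suppose the splatted value only ever feeds a `vperm` control vector, either directly or after an XOR with other index bytes; that usage is an assumption. Then the upper three bits of each control byte are ignored, so splatting -1 (0xFF) selects the same bytes as splatting 31 (0x1F). The Java model below uses hypothetical names. It checks that equivalence for a control vector built by XOR-ing an identity index (0..15) with each splat value.

```java
import java.util.Arrays;

// Hypothetical software model of Power vperm byte selection (ISA big-endian byte numbering).
public class VpermControlSketch {

    // Result byte i = byte (ctl[i] & 0x1F) of the 32-byte concatenation a || b.
    // Only the low 5 bits of each control byte are used.
    static byte[] vperm(byte[] a, byte[] b, byte[] ctl) {
        byte[] src = new byte[32];
        System.arraycopy(a, 0, src, 0, 16);
        System.arraycopy(b, 0, src, 16, 16);
        byte[] out = new byte[16];
        for (int i = 0; i < 16; i++) {
            out[i] = src[ctl[i] & 0x1F];
        }
        return out;
    }

    // Control vector: identity index i XOR-ed with a splatted byte value.
    static byte[] xorWithSplat(int splat) {
        byte[] ctl = new byte[16];
        for (int i = 0; i < 16; i++) {
            ctl[i] = (byte) (i ^ splat);
        }
        return ctl;
    }

    public static void main(String[] args) {
        byte[] a = new byte[16];
        byte[] b = new byte[16];
        for (int i = 0; i < 16; i++) {
            a[i] = (byte) i;
            b[i] = (byte) (100 + i);
        }
        byte[] with31 = vperm(a, b, xorWithSplat(31));     // value an 8-bit immediate splat gives
        byte[] withMinus1 = vperm(a, b, xorWithSplat(-1)); // value vspltisb -1 gives (0xFF per byte)
        System.out.println(Arrays.equals(with31, withMinus1)); // prints "true"
        System.out.println(Arrays.toString(with31));           // the bytes of b in reverse order
    }
}
```

For any byte x, `(x ^ 0xFF) & 0x1F` equals `(x ^ 0x1F) & 0x1F`, which is why the two control vectors agree.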
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1971985131 From sroy at openjdk.org Wed Feb 26 17:15:08 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Wed, 26 Feb 2025 17:15:08 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v27] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Wed, 26 Feb 2025 16:59:03 GMT, Martin Doerr wrote: >> Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: >> >> - change pattern for Linux, fix for AIX >> - change pattern for Linux, fix for AIX > > src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 702: > >> 700: __ lvx(vHigh, temp1, data); >> 701: #ifdef VM_LITTLE_ENDIAN >> 702: __ xxspltib(vTmp12->to_vsr(), 31); > > Is this instruction available on Power8? Shouldn't we use `vspltisb`? Need to check that. With vspltisb, we cannot broadcast a value greater than 15. Hence I used this instruction instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1972007115 From mdoerr at openjdk.org Wed Feb 26 17:52:08 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Feb 2025 17:52:08 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v27] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Wed, 26 Feb 2025 17:12:38 GMT, Suchismith Roy wrote: >> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 702: >> >>> 700: __ lvx(vHigh, temp1, data); >>> 701: #ifdef VM_LITTLE_ENDIAN >>> 702: __ xxspltib(vTmp12->to_vsr(), 31); >> >> Is this instruction available on Power8? Shouldn't we use `vspltisb`? > > Need to check that. With vspltisb, we cannot broadcast a value greater than 15. Hence I used this instruction instead. I think we could use `vspltisb(vTmp12, -1)`. This would flip all bits, which is also fine.
`vec_perm` only looks at 5 bits per byte. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1972073234 From galder at openjdk.org Wed Feb 26 18:33:03 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 26 Feb 2025 18:33:03 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Wed, 26 Feb 2025 11:32:57 GMT, Galder Zamarreño wrote: > > That said: if we know that it is only in the high-probability cases, then we can address those separately. I would not consider it a blocking issue, as long as we file the follow-up RFE for int/max scalar case with high branch probability. > > What would be really helpful: a list of all regressions / issues, and how we intend to deal with them. If we later find a regression that someone cares about, then we can come back to that list, and justify the decision we made here. > > I'll make up a list of regressions and post it here. I won't create RFEs for now. I'd rather wait until we have the list in front of us and we can decide which RFEs to create. Before noting the regressions, it's worth noting that this PR also improves performance in certain scenarios. I will summarise those tomorrow. Here's a summary of the regressions: ### Regression 1 Given a loop with a long min/max reduction pattern with one side of the branch taken nearly 100% of the time, when Superword finds the pattern not profitable, then HotSpot will use scalar instructions (cmov) and performance will regress. Possible solutions: a) make Superword recognise these scenarios as profitable. ### Regression 2 Given a loop with a long min/max reduction pattern with one side of the branch taken nearly 100% of the time, when the platform does not support vector instructions to achieve this (e.g.
AVX-512 quad word vpmax/vpmin), then HotSpot will use scalar instructions (cmov) and performance will regress. Possible solutions: a) find a way to use other vector instructions (vpcmp+vpblend+vmov?) b) fall back to more suitable scalar instructions, e.g. cmp+mov, when the branch is very one-sided. ### Regression 3 Given a loop with a long min/max non-reduction pattern (e.g. `longLoopMax`) with one side of the branch taken nearly 100% of the time, when the platform does not vectorize it (either for lack of CPU instruction support, or because Superword finds it not profitable), then HotSpot will use scalar instructions (cmov) and performance will regress. Possible solutions: a) find a way to use other vector instructions (e.g. `longLoopMax` vectorizes with AVX2 and might also do so with earlier instruction sets) b) fall back to more suitable scalar instructions, e.g. cmp+mov, when the branch is very one-sided. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2685865807 From rkennke at amazon.de Wed Feb 26 18:52:33 2025 From: rkennke at amazon.de (Kennke, Roman) Date: Wed, 26 Feb 2025 18:52:33 +0000 Subject: Discussion: How to get to single object header layout Message-ID: <881959A3-6D5D-4FE6-A029-B6F1F2BB1BDD@amazon.de> (2nd attempt at sending this. If you receive the other attempt, please ignore it.) This is a follow-up to discussions that we had at the OpenJDK Committers Workshop earlier this month. I have been asked to come up with a detailed schedule of how I envision to make compact object headers aka Project Lilliput the default and one and only header layout in HotSpot, and post it here for wider discussion. The goal is to get to a consensus and prepare the various HotSpot contributors for what may come and when. In particular, I am aware that other, potentially conflicting changes are in the pipeline too (looking at Valhalla, possibly other projects, too?), which we should coordinate, not only in terms of code, but also in terms of reviewer/testing resources.
We agreed at the OCW that we should make the current 8-byte-headers the default object header layout first, and only then build 4-byte-headers on top of that. We also agreed that we want as few flags permutations as possible (if it were me, I would vote for having no new flags at all, and have new implementations replace old implementations, but I can see the operational usefulness of having a fallback available if something unexpected goes wrong.) Many of the proposed changes are "only" various progressions of flags moving from new/experimental to non-experimental to default to deprecated and obsoleted. The bulk of code changes would be in JDK 27, when Lilliput 2 hits the repos (which isn't as scary as Lilliput 1, which replaced the whole locking subsystem), and the current 12-byte headers are removed. Please let me know what you think, how we can make this work, and especially whatever concerns you may have. JDK 25: - 8350272: Deprecate UseCompressedClassPointers for removal - 8350457: Support Compact Object Headers as product option JDK 26: - 8346011: [Lilliput] Compact Full-GC Forwarding (required for UCCP removal, pre-req for L2) - Obsolete +/-UseCompressedClassPointers - Make +UseCompactObjectHeaders on-by-default - Deprecate -UseCompactObjectHeaders JDK 27: - Obsolete -UseCompactObjectHeaders - 8320761: [Lilliput] Implement compact identity hashcode (alternative code path to L1, under new experimental flag, e.g.
+/-UseTinyObjectHeaders) JDK 28: - Make +/-UseTinyObjectHeaders non-experimental JDK 29: - Make +UseTinyObjectHeaders on-by-default - Deprecate -UseTinyObjectHeaders JDK 30: - Obsolete -UseTinyObjectHeaders From roland at openjdk.org Wed Feb 26 19:34:04 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 26 Feb 2025 19:34:04 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v4] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 09:27:13 GMT, Emanuel Peter wrote: >> Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below. >> >> **Background** >> >> With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer. >> >> **Problem** >> >> So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code. >> >> >> MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1); >> MemorySegment nativeUnaligned = nativeAligned.asSlice(1); >> test3(nativeUnaligned); >> >> >> When compiling the test method, we assume that the `nativeUnaligned.address()` is aligned - but it is not! 
>> >> static void test3(MemorySegment ms) { >> for (int i = 0; i < RANGE; i++) { >> long adr = i * 4L; >> int v = ms.get(ELEMENT_LAYOUT, adr); >> ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1)); >> } >> } >> >> >> **Solution: Runtime Checks - Predicate and Multiversioning** >> >> Of course we could just forbid cases where we have a `native` base from vectorizing. But that would lead to regressions currently - in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine if the `base` is aligned, we need a runtime check. >> >> I came up with 2 options where to place the runtime checks: >> - A new "auto vectorization" Parse Predicate: >> - This only works when predicates are available. >> - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop. >> - Multiversion the loop: >> - Create 2 copies of the loop (fast and slow loops). >> - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if` which decides which loop we take >> - In the `slow_loop`, we make no assumption which means we can not vectorize, but we still compile - so even ... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 66 commits: > > - Merge branch 'master' into JDK-8323582-SW-native-alignment > - stall -> delay, plus some more comments > - adjust selector if probability > - Merge branch 'master' into JDK-8323582-SW-native-alignment > - remove multiversion mark if we break the structure > - register opaque with igvn > - copyright and rm CFG check > - IR rules for all cases > - 3 test versions > - test changed to unaligned ints > - ... and 56 more: https://git.openjdk.org/jdk/compare/d551daca...8eb52292 Looks good to me. 
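The following is a source-level analogy of the multiversioning idea in this PR description, under stated assumptions. C2 performs the transformation on its IR, not in Java source. The names `ASSUMED_ALIGNMENT`, `fastLoop`, `slowLoop`, and `run` are hypothetical. The runtime test plays the role of the condition checked by the `multiversion_if`: whether the native base address is aligned. Only the fast version relies on that alignment, while the slow version must be correct for any base.

```java
// Source-level analogy of loop multiversioning on an alignment check; C2 does this in its IR.
public class MultiversionSketch {
    // Hypothetical alignment the vectorized version would rely on (must be a power of two).
    static final int ASSUMED_ALIGNMENT = 8;

    static boolean isAligned(long baseAddress, int alignment) {
        return (baseAddress & (alignment - 1)) == 0;
    }

    // Stand-in for the vectorized loop: 4 elements per iteration, then a scalar tail.
    static void fastLoop(int[] data) {
        int i = 0;
        for (; i + 3 < data.length; i += 4) {
            data[i] += 1;
            data[i + 1] += 1;
            data[i + 2] += 1;
            data[i + 3] += 1;
        }
        for (; i < data.length; i++) {
            data[i] += 1;
        }
    }

    // Stand-in for the unvectorized loop: makes no alignment assumption.
    static void slowLoop(int[] data) {
        for (int i = 0; i < data.length; i++) {
            data[i] += 1;
        }
    }

    // Returns true if the fast version was selected.
    static boolean run(long baseAddress, int[] data) {
        if (isAligned(baseAddress, ASSUMED_ALIGNMENT)) {
            fastLoop(data);
            return true;
        }
        slowLoop(data);
        return false;
    }

    public static void main(String[] args) {
        int[] x = new int[10];
        System.out.println(run(0x1000L, x)); // true: aligned base takes the fast version
        System.out.println(run(0x1001L, x)); // false: base offset by 1 (as with asSlice(1)) takes the slow one
        System.out.println(x[9]);            // 2: each call incremented every element once
    }
}
```

Both versions compute the same result. The check only decides whether the speculative alignment assumption may be used.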
------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22016#pullrequestreview-2645658428 From sviswanathan at openjdk.org Wed Feb 26 19:57:03 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 26 Feb 2025 19:57:03 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 21:49:42 GMT, Volodymyr Paprotski wrote: > Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) > > Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain. > > Before (no AVX512) > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ± 15.807 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ± 2.484 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s > > After (with AVX2) > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ± 39.923 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ±
8.992 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ± 6.238 ops/s > > > Before (with AVX512): > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ± 27.260 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ± 26.707 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 102... src/java.base/share/classes/sun/security/util/math/intpoly/MontgomeryIntegerPolynomialP256.java line 423: > 421: r[2] = ((c7 & mask) | (c2 & ~mask)); > 422: r[3] = ((c8 & mask) | (c3 & ~mask)); > 423: r[4] = ((c9 & mask) | (c4 & ~mask)); It would be good to add a comment here indicating that if the result (c9 - c5) had overflowed by one modulus, result - modulus (c4 - c0) would be positive; otherwise it would be negative, i.e. the upper bits of c4 would be all zeroes on overflow and all ones otherwise. Thus on overflow, the return value "r" should be set to result - modulus (c4 - c0); otherwise it should be set to result (c9 - c5). test/jdk/com/sun/security/util/math/intpoly/MontgomeryPolynomialFuzzTest.java line 2: > 1: /* > 2: * Copyright (c) 2025, Intel Corporation. All rights reserved. This should be Copyright (c) 2024, 2025, Intel Corporation. All rights reserved. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1972301843 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1972267785 From martin.doerr at sap.com Wed Feb 26 20:37:02 2025 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 26 Feb 2025 20:37:02 +0000 Subject: Re: Discussion: How to get to single object header layout In-Reply-To: <881959A3-6D5D-4FE6-A029-B6F1F2BB1BDD@amazon.de> References: <881959A3-6D5D-4FE6-A029-B6F1F2BB1BDD@amazon.de> Message-ID: The plan sounds great to me. I especially like the plan for JDK 25.
We will see if we need adjustments in the later releases. I appreciate being able to use Compact Object Headers in production with JDK 25 LTS. We can backport fixes to make it stable and reliable. The more people use it, the better, in terms of both testing and footprint savings. Best regards, Martin From: hotspot-dev on behalf of Kennke, Roman Date: Wednesday, 26 February 2025 at 19:53 To: hotspot-dev at openjdk.org Subject: Discussion: How to get to single object header layout (2nd attempt at sending this. If you receive the other attempt, please ignore it.) This is a follow-up to discussions that we had at the OpenJDK Committers Workshop earlier this month. I have been asked to come up with a detailed schedule of how I envision to make compact object headers aka Project Lilliput the default and one and only header layout in HotSpot, and post it here for wider discussion. The goal is to get to a consensus and prepare the various HotSpot contributors for what may come and when. In particular, I am aware that other, potentially conflicting changes are in the pipeline too (looking at Valhalla, possibly other projects, too?), which we should coordinate, not only in terms of code, but also in terms of reviewer/testing resources. We agreed at the OCW that we should make the current 8-byte-headers the default object header layout first, and only then build 4-byte-headers on top of that. We also agreed that we want as few flags permutations as possible (if it were me, I would vote for having no new flags at all, and have new implementations replace old implementations, but I can see the operational usefulness of having a fallback available if something unexpected goes wrong.) Many of the proposed changes are "only" various progressions of flags moving from new/experimental to non-experimental to default to deprecated and obsoleted.
The bulk of code changes would be in JDK 27, when Lilliput 2 hits the repos (which isn't as scary as Lilliput 1, which replaced the whole locking subsystem), and the current 12-byte headers are removed. Please let me know what you think, how we can make this work, and especially whatever concerns you may have. JDK 25: - 8350272: Deprecate UseCompressedClassPointers for removal - 8350457: Support Compact Object Headers as product option JDK 26: - 8346011: [Lilliput] Compact Full-GC Forwarding (required for UCCP removal, pre-req for L2) - Obsolete +/-UseCompressedClassPointers - Make +UseCompactObjectHeaders on-by-default - Deprecate -UseCompactObjectHeaders JDK 27: - Obsolete -UseCompactObjectHeaders - 8320761: [Lilliput] Implement compact identity hashcode (alternative code path to L1, under new experimental flag, e.g. +/-UseTinyObjectHeaders) - 8347710: [Lilliput] Implement 4 byte headers (alternative code path to L1, under new experimental flag, e.g. +/-UseTinyObjectHeaders) JDK 28: - Make +/-UseTinyObjectHeaders non-experimental JDK 29: - Make +UseTinyObjectHeaders on-by-default - Deprecate -UseTinyObjectHeaders JDK 30: - Obsolete -UseTinyObjectHeaders From mark.reinhold at oracle.com Wed Feb 26 22:56:55 2025 From: mark.reinhold at oracle.com (Mark Reinhold) Date: Wed, 26 Feb 2025 22:56:55 +0000 Subject: New candidate JEP: 503: Remove the 32-bit x86 Port Message-ID: <20250226225654.9C72778B09F@eggemoggin.niobe.net> https://openjdk.org/jeps/503 Summary: Remove the source code and build support for the 32-bit x86 port. This port was deprecated for removal in JDK 24 (JEP 501) with the express intent to remove it in a future release.
- Mark From syan at openjdk.org Thu Feb 27 06:20:01 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 27 Feb 2025 06:20:01 GMT Subject: RFR: 8350723: RISC-V: debug.cpp help() is missing riscv line for pns In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 07:49:38 GMT, Fei Yang wrote: >> Hi all, >> >> This PR adds a RISC-V entry line for the pns() call in src/hotspot/share/utilities/debug.cpp file. This will be useful when calling the help() function in gdb. No risk. > > Looks good and trivial. Thanks @RealFYang ------------- PR Comment: https://git.openjdk.org/jdk/pull/23793#issuecomment-2687013737 From syan at openjdk.org Thu Feb 27 06:20:02 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 27 Feb 2025 06:20:02 GMT Subject: Integrated: 8350723: RISC-V: debug.cpp help() is missing riscv line for pns In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 07:24:48 GMT, SendaoYan wrote: > Hi all, > > This PR adds a RISC-V entry line for the pns() call in src/hotspot/share/utilities/debug.cpp file. This will be useful when calling the help() function in gdb. No risk. This pull request has now been integrated.
Changeset: bb48b731
Author: SendaoYan
URL: https://git.openjdk.org/jdk/commit/bb48b7319c020f9bb135c0bdf3e8809d0314c837
Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod

8350723: RISC-V: debug.cpp help() is missing riscv line for pns

Reviewed-by: fyang

-------------
PR: https://git.openjdk.org/jdk/pull/23793

From epeter at openjdk.org Thu Feb 27 06:57:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Feb 2025 06:57:04 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: <63F-0aHgMthexL0b2DFmkW8_QrJeo8OOlCaIyZApfpY=.4744070d-9d56-4031-8684-be14cf66d1e5@github.com>

On Wed, 26 Feb 2025 18:29:58 GMT, Galder Zamarreño wrote:

>>> > Re: [#20098 (comment)](https://github.com/openjdk/jdk/pull/20098#issuecomment-2671144644) - I was trying to think what could be causing this.
>>>
>>> Maybe it is an issue with probabilities? Do you know at what point (if at all) the `MinI` node appears/disappears in that example?
>>
>> The probabilities are fine.
>>
>> I think the issue with `Math.min(II)` seems to be specific to when its compilation happens, and the combined fact that the intrinsic has been disabled and vectorization does not kick in (explicitly disabled). Note that other parts of the JDK invoke `Math.min(II)`.
>>
>> In the slow cases it appears the compilation happens before the benchmark kicks in, and so it takes the profiling data before the benchmark to decide how to compile this in.
>>
>> In the slow versions you see this `PrintMethodData`:
>>
>>     static java.lang.Math::min(II)I
>>       interpreter_invocation_count: 18171
>>       invocation_counter: 18171
>>       backedge_counter: 0
>>       decompile_count: 0
>>       mdo size: 328 bytes
>>
>>     0 iload_0
>>     1 iload_1
>>     2 if_icmpgt 9
>>       0   bci: 2    BranchData  taken(7732) displacement(56)
>>                                 not taken(10180)
>>     5 iload_0
>>     6 goto 10
>>       32  bci: 6    JumpData    taken(10180) displacement(24)
>>     9 iload_1
>>     10 ireturn
>>
>>     org.openjdk.bench.java.lang.MinMaxVector::intReductionSimpleMin(Lorg/openjdk/bench/java/lang/MinMaxVector$LoopState;)I
>>       interpreter_invocation_count: 189
>>       invocation_counter: 189
>>       backedge_counter: 313344
>>       decompile_count: 0
>>       mdo size: 384 bytes
>>
>>     0 iconst_0
>>     1 istore_2
>>     2 iconst_0
>>     3 istore_3
>>     4 iload_3
>>     5 aload_1
>>     6 fast_igetfield 35
>>     9 if_icmpge 33
>>       0   bci: 9    BranchData  taken(58) displacement(72)
>>                                 not taken(192512)
>>     12 aload_1
>>     13 fast_agetfield 41
>>     16 iload_3
>>     17 iaload
>>     18 istore #4
>>     20 iload_2
>>     21 fast_iload #4
>>     23 invokestatic 32
>>       32  bci: 23   CounterData count(192512)
>>     26 istore_2
>>     27 iinc #3 1
>>     30 goto 4
>>       48  bci: 30   JumpData    taken(192512) displacement(-48)
>>     33 iload_2
>>     34 ireturn
>>
>> The benchmark method calls Math...
>
>> > That said: if we know that it is only in the high-probability cases, then we can address those separately. I would not consider it a blocking issue, as long as we file the follow-up RFE for int/max scalar case with high branch probability.
>> > What would be really helpful: a list of all regressions / issues, and how we intend to deal with them. If we later find a regression that someone cares about, then we can come back to that list, and justify the decision we made here.
>>
>> I'll make up a list of regressions and post it here. I won't create RFEs for now. I'd rather wait until we have the list in front of us and we can decide which RFEs to create.
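The branch profiles above are the crux of the cmov-versus-branch question in this thread. As a rough illustration outside the JIT (plain C++, not HotSpot code; the function names are hypothetical), here are two equivalent scalar min reductions. The first keeps an explicit branch, which a predictor handles almost for free when one side is taken nearly 100% of the time; the second is written as a select, which compilers commonly (but not necessarily) lower to `cmp` + `cmov`, turning every iteration into a true data dependency on the previous result. Either form may be if-converted or not depending on the compiler, so this shows the two shapes, not guaranteed codegen.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper names; plain C++, not HotSpot code.

// Branchy shape: the update sits behind a conditional jump. Once the minimum
// has been seen, the branch is almost never taken and predicts well, so the
// CPU can speculate past the comparison.
std::int64_t min_reduce_branchy(const std::vector<std::int64_t>& a) {
    std::int64_t result = INT64_MAX;
    for (std::int64_t v : a) {
        if (v < result) {
            result = v;
        }
    }
    return result;
}

// Select shape: often compiled to cmp + cmov on x86-64 (compiler-dependent).
// With a cmov, each iteration must wait for the previous result, no matter
// how skewed the comparison outcome is.
std::int64_t min_reduce_select(const std::vector<std::int64_t>& a) {
    std::int64_t result = INT64_MAX;
    for (std::int64_t v : a) {
        result = (v < result) ? v : result;
    }
    return result;
}

// Both shapes compute the same value; only codegen and performance differ.
std::int64_t min_reduce(const std::vector<std::int64_t>& a) {
    std::int64_t r = min_reduce_branchy(a);
    assert(r == min_reduce_select(a));
    return r;
}
```

On an empty input both versions return `INT64_MAX`, the identity element for `min`.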
>
> Before listing the regressions, it's worth noting that the PR also improves performance in certain scenarios. I will summarise those tomorrow.
>
> Here's a summary of the regressions:
>
> ### Regression 1
> Given a loop with a long min/max reduction pattern with one side of the branch taken near 100% of the time, when SuperWord finds the pattern not profitable, HotSpot will use scalar instructions (cmov) and performance will regress.
>
> Possible solutions:
> a) make SuperWord recognise these scenarios as profitable.
>
> ### Regression 2
> Given a loop with a long min/max reduction pattern with one side of the branch taken near 100% of the time, when the platform does not support vector instructions to achieve this (e.g. AVX-512 quad word vpmax/vpmin), HotSpot will use scalar instructions (cmov) and performance will regress.
>
> Possible solutions:
> a) find a way to use other vector instructions (vpcmp+vpblend+vmov?)
> b) fall back on more suitable scalar instructions, e.g. cmp+mov, when the branch is very one-sided
>
> ### Regression 3
> Given a loop with a long min/max non-reduction pattern (e.g. `longLoopMax`) with one side of the branch taken near 100% of the time, when the platform does not vectorize it (either lack of CPU instruction support, or SuperWord finding it not profitable), HotSpot will use scalar instructions (cmov) and performance will regress.
>
> Possible solutions:
> a) find a way to use other vector instructions (e.g. `longLoopMax` vectorizes with AVX2 and might also do so with earlier instruction sets)
> b) fall back on more suitable scalar instructions, e.g. cmp+mov, when the branch is very one-sided.

@galderz Thanks for the summary of regressions! Yes, there are plenty of speedups, I assume primarily because of `Long.min/max` vectorization, but possibly also because the operation can now "float" out of a loop, for example.

All your Regressions 1-3 are cases with "extreme" probability (close to 100% / 0%); you listed no others.
That matches my intuition that branching code is usually better than cmove in extreme-probability cases.

As for possible solutions: in all of the Regression 1-3 cases, it seems the issue is the scalar cmove. So in all cases a possible solution is to use branching code (i.e. `cmp+mov`). To me, these are the follow-up RFEs:
- Detect "extreme"-probability scalar cmoves, and replace them with branching code. This should take care of all regressions here. This one has high priority, as it fixes the regression caused by this patch. But it would also help improve performance for the `Integer.min/max` cases, which have the same issue.
- Additional performance improvement: make SuperWord recognize more cases as profitable (see Regression 1). Optional.
- Additional performance improvement: extend backend capabilities for vectorization (see Regressions 2 + 3). Optional.

Does that make sense, or am I missing something?

-------------
PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2687067125

From epeter at openjdk.org Thu Feb 27 07:02:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Feb 2025 07:02:10 GMT Subject: RFR: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory [v4] In-Reply-To: References: <6R7kv7XGOWIBrjPQCemB6u2vd_tFl_xMQGQaVWoxkK0=.d26f6780-82f8-4ab9-a4bc-ff7831ed9a1a@github.com> Message-ID:

On Wed, 26 Feb 2025 10:15:48 GMT, Roland Westrelin wrote:

>>> Would it be possible and make sense to remove useless slow path loops the way it's done for predicates or zero trip guards? In `PhaseIdealLoop::build_loop_late_post_work()`, collect all `OpaqueMultiversioningNode` in a list. Then iterate over all loops the way it's done in `PhaseIdealLoop::eliminate_useless_zero_trip_guard()`, find loops marked as multi version, check we can get from the loop to the `OpaqueMultiversioningNode` and mark that one as useful. Eliminate all `OpaqueMultiversioningNode` not marked as useful.
>>> That way, if some transformation such as peeling makes the loop non-multiversion, or if the expected shape breaks for some reason, the slow loop is eliminated on the next loop-opts pass.
>>
>> I suppose we could try that. Is it ok to do that in a separate RFE, so we are keeping this here to a more manageable size?
>>
>> I don't see it as super critical personally, as the slow_path is `delayed`, so no loop-opts are performed on it. The overhead is minimal if we keep it until after loop-opts, I think. But I'm not against trying. It would take a bit of effort to construct test cases where we have the loop fold away after multiversion_if is added, but that is probably possible.
>>
>> And would we not have similar issues with traversing from the loops to their `OpaqueMultiversioningNode`? What if some are not reachable in the meantime? Then we would just lose the `multiversion_if` early, and could not use it any more.
>> So maybe we'd have to do that after the verification: [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637): C2: verify that main_loop finds pre_loop and that multiversion loops find the multiversion_if
>>
>> I wonder if we do not have similar issues with `PhaseIdealLoop::eliminate_useless_zero_trip_guard()` currently. Maybe it's rare enough we don't notice.
>>
>> @rwestrel What do you think?
>
>> I suppose we could try that. Is it ok to do that in a separate RFE, so we are keeping this here to a more manageable size?
>
> Ok
>
>> And would we not have similar issues with traversing from the loops to their `OpaqueMultiversioningNode`? What if some are not reachable in the meantime? Then we would just lose the `multiversion_if` early, and could not use it any more. So maybe we'd have to do that after the verification: [JDK-8350637](https://bugs.openjdk.org/browse/JDK-8350637): C2: verify that main_loop finds pre_loop and that multiversion loops find the multiversion_if
>>
>> I wonder if we do not have similar issues with `PhaseIdealLoop::eliminate_useless_zero_trip_guard()` currently. Maybe it's rare enough we don't notice.
>
> I don't think that's a problem. When that code runs, the graph is in a stable shape. There's no dead condition that needs to go through igvn to be cleaned up. We've just run igvn and haven't made any change to the graph yet.

@rwestrel @vnkozlov Thank you for the reviews, all the good questions, and the ideas for follow-up RFEs.

-------------
PR Comment: https://git.openjdk.org/jdk/pull/22016#issuecomment-2687071561

From epeter at openjdk.org Thu Feb 27 07:02:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 27 Feb 2025 07:02:11 GMT Subject: Integrated: 8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory In-Reply-To: References: Message-ID:

On Mon, 11 Nov 2024 14:40:09 GMT, Emanuel Peter wrote:

> Note: the approach with Predicates and Multiversioning prepares us well for Runtime Checks for Aliasing Analysis, see more below.
>
> **Background**
>
> With `-XX:+AlignVector`, all vector loads/stores must be aligned. We try to statically determine if we can always align the vectors. One condition is that the address `base` is already aligned. For arrays, we know that this always holds, because they are `ObjectAlignmentInBytes` aligned. But with native memory, the `base` is just some arbitrarily aligned pointer.
>
> **Problem**
>
> So far, we have just naively assumed that the `base` is always `ObjectAlignmentInBytes` aligned. But that does not hold for `native` memory segments: the `base` can also be unaligned. I had constructed such an example, and with `-XX:+AlignVector -XX:+VerifyAlignVector` this example hits the verification code.
>
>     MemorySegment nativeAligned = Arena.ofAuto().allocate(RANGE * 4 + 1);
>     MemorySegment nativeUnaligned = nativeAligned.asSlice(1);
>     test3(nativeUnaligned);
>
> When compiling the test method, we assume that `nativeUnaligned.address()` is aligned - but it is not!
>
>     static void test3(MemorySegment ms) {
>         for (int i = 0; i < RANGE; i++) {
>             long adr = i * 4L;
>             int v = ms.get(ELEMENT_LAYOUT, adr);
>             ms.set(ELEMENT_LAYOUT, adr, (int)(v + 1));
>         }
>     }
>
> **Solution: Runtime Checks - Predicate and Multiversioning**
>
> Of course we could just forbid cases where we have a `native` base from vectorizing. But that would currently lead to regressions - in most cases we do get aligned `base`s, and we currently vectorize those. We cannot statically determine whether the `base` is aligned, so we need a runtime check.
>
> I came up with 2 options for where to place the runtime checks:
> - A new "auto vectorization" Parse Predicate:
>   - This only works when predicates are available.
>   - If we fail the predicate, then we recompile without the predicate. That means we cannot add a check to the predicate any more, and we would have to do multiversioning at that point if we still want to have a vectorized loop.
> - Multiversion the loop:
>   - Create 2 copies of the loop (fast and slow loops).
>   - The `fast_loop` can make speculative alignment assumptions, and add the corresponding check to the `multiversion_if`, which decides which loop we take.
>   - In the `slow_loop`, we make no assumption, which means we cannot vectorize, but we still compile - so even unaligned `base`s would end up with reasonably fast code.
>   - We "stall" the `...

This pull request has now been integrated.
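The multiversioning option described above can be illustrated outside the JIT. The following is a hedged C++ sketch, not C2's implementation: one runtime check on the base address (against a hypothetical 8-byte `kAssumedAlign`) selects between a "fast" copy that is allowed to assume alignment and a "slow" copy that assumes nothing. Plain C++ has no portable way to hand that alignment fact to the optimizer, so both copies share one `memcpy`-based body here; in the PR, the `fast_loop` is the copy that may make speculative alignment assumptions and be vectorized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical alignment that the "fast" copy may assume.
constexpr std::uintptr_t kAssumedAlign = 8;

// Shared loop body: add 1 to each 4-byte int starting at base.
// memcpy keeps every access well-defined for any byte address.
static void increment_ints(unsigned char* base, std::size_t count) {
    for (std::size_t i = 0; i < count; i++) {
        std::int32_t v;
        std::memcpy(&v, base + i * sizeof v, sizeof v);
        v += 1;
        std::memcpy(base + i * sizeof v, &v, sizeof v);
    }
}

enum class Path { Fast, Slow };

// Multiversioning sketch: the runtime check plays the role of the
// multiversion_if; it returns which copy ran so callers can observe it.
Path increment_multiversioned(unsigned char* base, std::size_t count) {
    if (reinterpret_cast<std::uintptr_t>(base) % kAssumedAlign == 0) {
        increment_ints(base, count);  // fast copy: alignment assumption holds
        return Path::Fast;
    }
    increment_ints(base, count);      // slow copy: no alignment assumption
    return Path::Slow;
}
```

Passing `buf + 1` for an 8-byte-aligned `buf` mirrors `nativeAligned.asSlice(1)` in the Java example: the check fails and the slow copy runs, but the result is still correct.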
Changeset: 885338b5
Author: Emanuel Peter
URL: https://git.openjdk.org/jdk/commit/885338b5f38ed05d8b91efc0178b371f2f89310e
Stats: 1089 lines in 27 files changed: 966 ins; 28 del; 95 mod

8323582: C2 SuperWord AlignVector: misaligned vector memory access with unaligned native memory

Reviewed-by: roland, kvn

-------------
PR: https://git.openjdk.org/jdk/pull/22016

From dfenacci at openjdk.org Thu Feb 27 08:34:45 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 27 Feb 2025 08:34:45 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v2] In-Reply-To: References: Message-ID:

> # Issue
> The test `src/hotspot/share/opto/c2compiler.cpp` fails intermittently due to a crash that happens when trying to allocate code cache space for C1 and C2 in `RuntimeStub::new_runtime_stub` and `SingletonBlob::operator new`.
>
> # Causes
> There are a few call paths during the initialization of C1 and C2 that can lead to the code cache allocations in `RuntimeStub::new_runtime_stub` (through `RuntimeStub::operator new`) and `SingletonBlob::operator new` triggering a fatal error if there is no more space. The paths in question are:
> 1. `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub`
> 2. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub`
> 3. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_uncommon_trap_blob` -> `UncommonTrapBlob::create` -> `new UncommonTrapBlob`
> 4. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_exception_blob` -> `ExceptionBlob::create` -> `new ExceptionBlob`
>
> # Solution
> Instead of crashing fatally, we can use the `alloc_fail_is_fatal` flag of `RuntimeStub::new_runtime_stub` to avoid crashing in cases 1 and 2, and add a similar flag to `SingletonBlob::operator new` for cases 3 and 4. In the latter case we need to adjust all calls accordingly.
>
> Note: In [JDK-8326615](https://bugs.openjdk.org/browse/JDK-8326615) it was argued that increasing the minimum code cache size would solve the issue, but that wasn't entirely accurate: doing so possibly decreases the chances of a failed allocation in these 4 places but doesn't totally avoid it.
>
> # Testing
> The original failing regression test in `test/hotspot/jtreg/compiler/startup/StartupOutput.java` has been modified to run multiple times with randomized values (within the original failing range) to increase the chances of hitting the fatal assertion.
>
> Tests: Tier 1-4 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode)

Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/share/c1/c1_Runtime1.cpp

  Co-authored-by: Dean Long <17332032+dean-long at users.noreply.github.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23630/files
  - new: https://git.openjdk.org/jdk/pull/23630/files/e930df47..85eb1022

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23630&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23630&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/23630.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23630/head:pull/23630

PR: https://git.openjdk.org/jdk/pull/23630

From dfenacci at openjdk.org Thu Feb 27 08:34:45 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 27 Feb 2025 08:34:45 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) In-Reply-To: References: Message-ID:

On Thu, 20 Feb 2025 20:53:30 GMT, Dean Long wrote:

> I don't understand why JDK-8326615 didn't work. If the minimum codecache size was too small, can't we just increase it?

It seems that the small minimum codecache size wasn't the core of the problem. In JDK-8326615 I've actually tried increasing the minimum code cache size (even calculating the minimum C1 and C2 sizes needed), but I kept hitting the same problem, seemingly because what triggers the crash is the code cache exceeding its limits **exactly** during one of the allocations in the 4 call paths listed in the description (any allocation at another point simply turns off C1 or C2, or both). Increasing the minimum was possibly making it a bit less probable (i.e. when running something small that would use very little code cache, e.g. `java -version`), but as soon as we ran something a bit bigger (requiring more code cache), we could still end up calling the problematic allocations right when the code cache was full. BTW, even though the crash was very dependent on C1/C2 thread scheduling etc., I was able to reproduce it with different minimum code cache sizes relatively frequently.

Even so, it might be a good idea to additionally increase the minimum code cache anyway. @dean-long do you think it would make sense to file an RFE for that?

-------------
PR Comment: https://git.openjdk.org/jdk/pull/23630#issuecomment-2687239402

From azafari at openjdk.org Thu Feb 27 09:25:13 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 27 Feb 2025 09:25:13 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v31] In-Reply-To: <1BsBLyqT4rLpUQTF_ganIYfb8ZyfT5DAf0eJcx8XJes=.e700c4ed-9a34-462f-a143-b352056781d3@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <1BsBLyqT4rLpUQTF_ganIYfb8ZyfT5DAf0eJcx8XJes=.e700c4ed-9a34-462f-a143-b352056781d3@github.com> Message-ID: <0WOvPFaqBDE8Yr8Nvh4IQ3RbRMcPJKk3-LCNyovFybs=.f8dd913d-b99b-40f6-9c9a-f85acfa7f29a@github.com>

On Wed, 26 Feb 2025 10:51:11 GMT, Johan Sjölen wrote:

> How would I go about verifying the performance gain? You mentioned previously that you wrote a microbenchmark for testing this?

There is no performance check for this PR. This PR just uses the new VMATree instead of SortedLinkedList for managing the regions. Performance checks and comparisons are left for future PRs.

-------------
PR Comment: https://git.openjdk.org/jdk/pull/20425#issuecomment-2687364436

From adinn at openjdk.org Thu Feb 27 09:50:59 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 27 Feb 2025 09:50:59 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: Message-ID:

On Fri, 21 Feb 2025 10:23:37 GMT, Ferenc Rakoczi wrote:

>> Hi.
>> Here is the test result of our CI.
>>
>> ### copyright year
>>
>> The following files should have their copyright year updated to 2025:
>>
>>     src/hotspot/cpu/aarch64/assembler_aarch64.hpp
>>     src/hotspot/cpu/aarch64/stubRoutines_aarch64.hpp
>>     src/hotspot/share/runtime/globals.hpp
>>     src/java.base/share/classes/sun/security/provider/ML_DSA.java
>>     src/java.base/share/classes/sun/security/provider/SHA3Parallel.java
>>     test/micro/org/openjdk/bench/java/security/MLDSA.java
>>
>> ### cross-build failure
>>
>> Cross build for riscv64/s390/ppc64 failed.
>>
>> Here is the error message for ppc64:
>>
>>     === Output from failing command(s) repeated here ===
>>     * For target support_interim-jmods_support__create_java.base.jmod_exec:
>>     #
>>     # A fatal error has been detected by the Java Runtime Environment:
>>     #
>>     #  Internal Error (/tmp/jdk-src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=72752, tid=72769
>>     #  assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000e85cb03dc620 <= 0x0000e85cb03e8ab4 <= 0x0000e85cb03e8ab0
>>     #
>>     # JRE version: OpenJDK Runtime Environment (25.0) (fastdebug build 25-internal-git-1e01c6deec3)
>>     # Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-git-1e01c6deec3, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
>>     # Problematic frame:
>>     # V  [libjvm.so+0x3b391c]  Instruction_aarch64::~Instruction_aarch64()+0xbc
>>     #
>>     # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /tmp/ci-scripts/jdk-src/make/
>>     #
>>     # An error report file with more information is saved as:
>>     # /tmp/jdk-src/make/hs_err_pid72752.log
>>     ... (rest of output omitted)
>>
>>     * All command lines available in /sysroot/ppc64el/tmp/build-ppc64el/make-support/failure-logs.
>>     === End of repeated output ===
>>
>> I suppose we should make updates for the other platforms similar to the one in `src/hotspot/cpu/aarch64/stubDeclarations_aarch64.hpp`.
>
> @shqking, I changed the copyright years, but I don't really understand how the aarch64-specific code can overflow buffers on other architectures. As far as I understand, Instruction_aarch64 should not have been there in a ppc build.
> Was this a build attempted on an aarch64 for the other architectures?

@ferakocz Apologies for raising yet another conflict to resolve. You will need to make a further adjustment to the compiler blob declaration to accommodate a fix I just pushed to resolve a problem with cross-compilation. Your patch should now specify

    do_arch_blob(compiler, 50000 ZGC_ONLY(+10000))

-------------
PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2687427983

From adinn at openjdk.org Thu Feb 27 09:56:02 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 27 Feb 2025 09:56:02 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v7] In-Reply-To: References: Message-ID:

On Wed, 26 Feb 2025 14:18:14 GMT, Ferenc Rakoczi wrote:

>> By using the aarch64 vector registers, the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Added more comments, mainly as suggested by Andrew Dinn
>  - Changed aarch64-asmtest.py as suggested by Bhavana-Kilambi

Oops.
Sorry - cut and paste error -- the new setting should be

    do_arch_blob(compiler, 55000 ZGC_ONLY(+5000))

-------------
PR Comment: https://git.openjdk.org/jdk/pull/23300#issuecomment-2687440017

From rcastanedalo at openjdk.org Thu Feb 27 10:15:07 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 27 Feb 2025 10:15:07 GMT Subject: RFR: 8344009: Improve compiler memory statistics [v6] In-Reply-To: References: Message-ID:

On Tue, 25 Feb 2025 16:43:21 GMT, Thomas Stuefe wrote:

>> Greetings,
>>
>> This is a rewrite of the Compiler Memory Statistic. The primary new feature is the capability to track allocations by C2 phases. This will allow for a much faster, more thorough analysis of footprint issues.
>>
>> Tracking Arena memory movement is not trivial since one needs to follow the ebb and flow of allocations over nested C2 phases. A phase typically allocates more than it releases, accruing new nodes and resource area. A phase can also release more than it allocated when Arenas carried over from other phases go out of scope in this phase. Finally, it can have high temporary peaks that vanish before the phase ends.
>>
>> I wanted to track that information correctly and display it clearly in a way that is easy to understand.
>>
>> The patch implements per-phase tracking by instrumenting the `TracePhase` stack object (thanks to @rwestrel for this idea).
>>
>> The nice thing with this technique is that it also allows for quick analysis of a suspected hot spot (e.g., the inside of a loop): drop a TracePhase in there with a descriptive name, and you can see the allocations inside that phase.
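The per-phase bookkeeping described above (a scoped object that follows allocations across nested phases, including temporary peaks and phases that free more than they allocate) can be sketched as a small RAII class. This is a simplified C++ model, not the patch's code: `UsageCounter`, `PhaseStats` and `PhaseScope` are hypothetical names, and a single counter stands in for all arena types. On entry a scope resets the running peak to the current level; on exit it records its own growth and hands the larger peak back to the enclosing scope, so nested phases do not hide each other's peaks.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for total arena usage.
struct UsageCounter {
    std::size_t current = 0;
    std::size_t peak = 0;  // running peak; each PhaseScope resets it on entry

    void allocate(std::size_t n) {
        current += n;
        if (current > peak) peak = current;
    }
    void release(std::size_t n) {
        assert(n <= current);
        current -= n;
    }
};

struct PhaseStats {
    long long net = 0;              // may be negative: a phase can free more than it allocates
    std::size_t peak_in_phase = 0;  // highest growth above the phase's entry level
};

// RAII scope in the spirit of a TracePhase (hypothetical, simplified).
class PhaseScope {
 public:
    PhaseScope(UsageCounter& counter, std::map<std::string, PhaseStats>& stats, std::string name)
        : counter_(counter), stats_(stats), name_(std::move(name)),
          start_(counter.current), saved_peak_(counter.peak) {
        counter_.peak = counter_.current;  // measure this phase's own peak
    }

    ~PhaseScope() {
        PhaseStats& s = stats_[name_];
        s.net += static_cast<long long>(counter_.current) - static_cast<long long>(start_);
        const std::size_t growth = counter_.peak - start_;
        if (growth > s.peak_in_phase) s.peak_in_phase = growth;
        // Give the enclosing phase the larger of its own peak and ours.
        if (saved_peak_ > counter_.peak) counter_.peak = saved_peak_;
    }

    PhaseScope(const PhaseScope&) = delete;
    PhaseScope& operator=(const PhaseScope&) = delete;

 private:
    UsageCounter& counter_;
    std::map<std::string, PhaseStats>& stats_;
    std::string name_;
    std::size_t start_;
    std::size_t saved_peak_;
};
```

In this model, an inner phase that allocates 50 units and releases them again reports a net change of 0 but a peak of 50, while its enclosing phase still sees that temporary peak in its own numbers.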
>>
>> The statistic gives us two new forms of output:
>>
>> 1) At the moment the compilation memory *peaked*, we now get a detailed breakdown of that peak usage per phase:
>>
>>     Arena Usage by Arena Type and compilation phase, at arena usage peak of 58817816:
>>     Phase                 Total       ra     node     comp     type    index  reglive regsplit    cienv    other
>>     none                1205512   155104   982984    33712        0        0        0        0        0    33712
>>     parse              11685376   720016  6578728  1899064        0        0        0        0  1832888   654680
>>     optimizer            916584        0   556416        0        0        0        0        0        0   360168
>>     escapeAnalysis      1983400        0  1276392   707008        0        0        0        0        0        0
>>     connectionGraph      720016        0        0   621832        0        0        0        0    98184        0
>>     macroEliminate       196448        0   196448        0        0        0        0        0        0        0
>>     iterGVN              327440        0   196368   131072        0        0        0        0        0        0
>>     incrementalInline   3992816        0  3043704  62...
>
> Thomas Stuefe has updated the pull request incrementally with five additional commits since the last revision:
>
>  - feedback ashu
>  - feedback roberto
>  - final-statistics-switch
>  - performance fix
>  - remove test code

Changes requested by rcastanedalo (Reviewer).

src/hotspot/share/compiler/compilationMemStatInternals.hpp line 243:

> 241:   int retrieve_live_node_count() const;
> 242: 
> 243:   DEBUG_ONLY(void verify() const;)

Unused. I suggest either calling this function from some appropriate point (in debug mode only) or just removing it.

src/hotspot/share/runtime/globals.hpp line 1402:

> 1400:           "Print metaspace statistics upon VM exit.") \
> 1401:                                                       \
> 1402:   product(bool, PrintCompilerMemoryStatisticsAtExit, false, DIAGNOSTIC, \

Would it be possible to add a test for this new flag, perhaps by extending the existing test logic in `CompileCommandPrintMemStat`?

src/hotspot/share/utilities/globalDefinitions.hpp line 371:

> 369: 
> 370: #define PROPERFMT "%zu%s"
> 371: #define PROPERFMT_W(width) "%" #width "zu%s"

Unused, please remove.

src/hotspot/share/utilities/ostream.cpp line 225:

> 223:   while (count > 0) {
> 224:     int nw = (count > 8) ? 8 : count;
> 225:     this->write(tmp, nw);

Are these changes essential for the rest of the changeset?
If not, I would suggest leaving them to a separate RFE, for simplicity.

-------------

PR Review: https://git.openjdk.org/jdk/pull/23530#pullrequestreview-2647251550
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1973265196
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1973277216
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1973266713
PR Review Comment: https://git.openjdk.org/jdk/pull/23530#discussion_r1973272906

From aph at openjdk.org Thu Feb 27 10:19:06 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 27 Feb 2025 10:19:06 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: <_CekdxBJviS_sZCVN62_yFx-cTF4qrIuAnqbIeUmFck=.3a6afffb-8fbe-4809-a4ca-1bc22b52a628@github.com> References: <1yB95sOajuS5ptFI0GQWLepii5JsZ9DOsje-TEFyFYs=.a325ad18-17ed-4e77-b1e3-0bad2cf55c67@github.com> <_CekdxBJviS_sZCVN62_yFx-cTF4qrIuAnqbIeUmFck=.3a6afffb-8fbe-4809-a4ca-1bc22b52a628@github.com> Message-ID:

On Tue, 25 Feb 2025 15:58:18 GMT, Ferenc Rakoczi wrote:

>> Aha!
>>
>>     aph at Andrews-MacBook-Pro ~ % as t.s
>>     t.s:1:19: error: expected 'sxtx' 'uxtx' or 'lsl' with optional integer in range [0, 4]
>>     sub x1, x10, x23, sxth #2
>>                       ^
>>     aph at Andrews-MacBook-Pro ~ % as --version
>>     Apple clang version 16.0.0 (clang-1600.0.26.6)
>>     Target: arm64-apple-darwin24.3.0
>
> OK, so GNU as is more forgiving than Apple as...

Did my patch to aarch64-asmtest.py solve the problem?

-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1973284472

From cnorrbin at openjdk.org Thu Feb 27 10:24:31 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 27 Feb 2025 10:24:31 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v2] In-Reply-To: References: Message-ID:

> Hi everyone,
>
> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original.
> To address this, I've added an assertion to verify this condition.
>
> The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through the usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions.
>
> Additionally, I've created `align_up_or_min`, which behaves similarly to the original align_up but handles overflows predictably across all integer types. This new function is used in the locations where overflow checks already exist, providing a safer alternative.

Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision:

  reverted gcarguments and updated test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23711/files
  - new: https://git.openjdk.org/jdk/pull/23711/files/c3bd1f1a..aa8a8054

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=00-01

  Stats: 5 lines in 2 files changed: 1 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/23711.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23711/head:pull/23711

PR: https://git.openjdk.org/jdk/pull/23711

From cnorrbin at openjdk.org Thu Feb 27 10:34:00 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Thu, 27 Feb 2025 10:34:00 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v2] In-Reply-To: References: Message-ID: <8LGqYliljxax9uNiYnB0Xj_HWdqukyL4JBeSaOFli7c=.cc93c4ae-cf97-41e1-b5f3-b6212db174e6@github.com>

On Thu, 27 Feb 2025 10:24:31 GMT, Casper Norrbin wrote:

>> Hi everyone,
>>
>> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition.
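To make the overflow concrete: rounding `value` up to a power-of-two `alignment` computes `(value + alignment - 1) & ~(alignment - 1)`, and for an unsigned type the addition wraps when `value` is within `alignment - 1` of the type's maximum, producing a result smaller than `value` (for `uint32_t`, aligning `0xFFFFFFFD` up to 8 yields 0). The sketch below is plain C++ and deliberately does not reproduce HotSpot's `align.hpp`; `align_up_checked`, `try_align_up` and `is_pow2` are hypothetical names, restricted to unsigned types. The first asserts the `result >= value` invariant described above; the second reports failure instead of wrapping, which is roughly the role the PR gives to `align_up_or_null` / `align_up_or_min` (their exact return values are not shown here).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helpers for illustration; not HotSpot's align.hpp.
template <typename T>
constexpr bool is_pow2(T v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// Rounds up and asserts the invariant callers rely on: result >= value.
// The assert fires exactly when the rounded-up value does not fit in T.
template <typename T>
T align_up_checked(T value, T alignment) {
    static_assert(std::is_unsigned<T>::value, "sketch covers unsigned types only");
    assert(is_pow2(alignment));
    const T mask = static_cast<T>(alignment - 1);
    const T result = static_cast<T>((value + mask) & ~mask);
    assert(result >= value && "align_up overflowed");
    return result;
}

// Overflow-aware variant: returns false instead of wrapping around,
// leaving *out untouched on failure.
template <typename T>
bool try_align_up(T value, T alignment, T* out) {
    static_assert(std::is_unsigned<T>::value, "sketch covers unsigned types only");
    assert(is_pow2(alignment));
    const T mask = static_cast<T>(alignment - 1);
    if (value > std::numeric_limits<T>::max() - mask) {
        return false;  // value + mask would not fit in T
    }
    *out = static_cast<T>((value + mask) & ~mask);
    return true;
}
```

Checking `value > max - mask` before adding avoids the wrap entirely; because the type's range is a multiple of any power-of-two alignment, this check rejects exactly the inputs whose aligned value would not fit.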
>>
>> The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through the usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions.
>>
>> Additionally, I've created `align_up_or_min`, which behaves similarly to the original align_up but handles overflows predictably across all integer types. This new function is used in the locations where overflow checks already exist, providing a safer alternative.
>
> Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision:
>
>   reverted gcarguments and updated test

I have reverted `gcArguments.cpp` so that it now asserts on alignment. For this, a line was added to `TestOptionsWithRanges` to stop `-XX:MinHeapDeltaBytes` from being tested with int_max, since that would overflow and crash.

-------------
PR Comment: https://git.openjdk.org/jdk/pull/23711#issuecomment-2687539063

From azafari at openjdk.org Thu Feb 27 10:37:33 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 27 Feb 2025 10:37:33 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID:

> - `VMATree` is used instead of `SortedLinkedList` in the new class `VirtualMemoryTracker`.
> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier.
> - `find_reserved_region()` is used in 4 cases; it will be removed in further PRs.
> - All tier1 tests pass except https://bugs.openjdk.org/browse/JDK-8335167.
Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: removed UseFlagInPlace test. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/5aa4556a..74e4872d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=30-31 Stats: 18 lines in 1 file changed: 0 ins; 18 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From azafari at openjdk.org Thu Feb 27 10:37:33 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 27 Feb 2025 10:37:33 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v31] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Wed, 26 Feb 2025 14:20:26 GMT, Johan Sjölen wrote: >> Also, it is skipped until the corresponding PR gets integrated, otherwise it fails. > > Then move the test to that PR and enable it? Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973311931 From dfenacci at openjdk.org Thu Feb 27 11:27:25 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 27 Feb 2025 11:27:25 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v3] In-Reply-To: References: Message-ID: > # Issue > The test `src/hotspot/share/opto/c2compiler.cpp` fails intermittently due to a crash that happens when trying to allocate code cache space for C1 and C2 in `RuntimeStub::new_runtime_stub` and `SingletonBlob::operator new`.
> > # Causes > There are a few call paths during the initialization of C1 and C2 that can lead to the code cache allocations in `RuntimeStub::new_runtime_stub` (through `RuntimeStub::operator new`) and `SingletonBlob::operator new` triggering a fatal error if there is no more space. The paths in question are: > 1. `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_uncommon_trap_blob` -> `UncommonTrapBlob::create` -> `new UncommonTrapBlob` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_exception_blob` -> `ExceptionBlob::create` -> `new ExceptionBlob` > > # Solution > Instead of crashing fatally, we can use the `alloc_fail_is_fatal` flag of `RuntimeStub::new_runtime_stub` to avoid crashing in cases 1 and 2 and add a similar flag to `SingletonBlob::operator new` for cases 3 and 4. In the latter case we need to adjust all calls accordingly. > > Note: In [JDK-8326615](https://bugs.openjdk.org/browse/JDK-8326615) it was argued that increasing the minimum code cache size would solve the issue but that wasn't entirely accurate: doing so possibly decreases the chances of a failed allocation in these 4 places but doesn't totally avoid it. > > # Testing > The original failing regression test in `test/hotspot/jtreg/compiler/startup/StartupOutput.java` has been modified to run multiple times with randomized values (within the original failing range) to increase the chances of hitting the fatal assertion.
> > Tests: Tier 1-4 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode) Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8347406: return immediately if stub generation fails ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23630/files - new: https://git.openjdk.org/jdk/pull/23630/files/85eb1022..26a6747d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23630&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23630&range=01-02 Stats: 41 lines in 2 files changed: 21 ins; 7 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/23630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23630/head:pull/23630 PR: https://git.openjdk.org/jdk/pull/23630 From dfenacci at openjdk.org Thu Feb 27 11:35:09 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 27 Feb 2025 11:35:09 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v3] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 21:34:16 GMT, Dean Long wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8347406: return immediately if stub generation fails > > src/hotspot/share/gc/shenandoah/c1/shenandoahBarrierSetC1.cpp line 306: > >> 304: _load_reference_barrier_phantom_rt_code_blob != nullptr; >> 305: } >> 306: return _pre_barrier_c1_runtime_code_blob != nullptr && reference_barrier_success; > > Wouldn't it be better to return false immediately after each failure, rather than continuing? Yes, totally. Fixed. 
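Dean's suggestion (return false as soon as one stub fails, rather than generating the rest and combining the results at the end) can be sketched as follows. `StubSketch`, `CodeCacheSketch` and `generate_stubs_sketch` are hypothetical stand-ins, not HotSpot's `CodeBlob`, `CodeCache` or barrier-set classes. The simulated cache returns `nullptr` when it is full, which models the non-fatal allocation path the PR enables via `alloc_fail_is_fatal`.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-ins; these are not HotSpot's CodeBlob or CodeCache types.
struct StubSketch {
  const char* name;
};

// A simulated code cache with room for a fixed number of stubs. When it is
// full, allocate() returns nullptr instead of raising a fatal error.
struct CodeCacheSketch {
  int remaining;

  StubSketch* allocate(const char* name) {
    if (remaining <= 0) {
      return nullptr;
    }
    --remaining;
    return new (std::nothrow) StubSketch{name};
  }
};

// Generates the stubs in order and returns false on the first failure,
// rather than attempting the remaining stubs and combining the results.
bool generate_stubs_sketch(CodeCacheSketch& cache, StubSketch* out[3]) {
  static const char* const names[3] = {"pre_barrier", "load_barrier", "store_barrier"};
  for (int i = 0; i < 3; ++i) {
    out[i] = cache.allocate(names[i]);
    if (out[i] == nullptr) {
      return false;  // the caller can then disable the compiler gracefully
    }
  }
  return true;
}
```

Returning early also means no later stub is attempted against a cache that is already known to be full.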
> src/hotspot/share/gc/z/c1/zBarrierSetC1.cpp line 543: > >> 541: _store_barrier_on_oop_field_without_healing = >> 542: generate_c1_store_runtime_stub(blob, false /* self_healing */, "store_barrier_on_oop_field_without_healing"); >> 543: return _load_barrier_on_oop_field_preloaded_runtime_stub != nullptr && > > Again, why not return false immediately on first failure? Fixed too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23630#discussion_r1973402325 PR Review Comment: https://git.openjdk.org/jdk/pull/23630#discussion_r1973402472 From dfenacci at openjdk.org Thu Feb 27 12:02:44 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 27 Feb 2025 12:02:44 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v4] In-Reply-To: References: Message-ID: > # Issue > The test `src/hotspot/share/opto/c2compiler.cpp` fails intermittently due to a crash that happens when trying to allocate code cache space for C1 and C2 in `RuntimeStub::new_runtime_stub` and `SingletonBlob::operator new`. > > # Causes > There are a few call paths during the initialization of C1 and C2 that can lead to the code cache allocations in `RuntimeStub::new_runtime_stub` (through `RuntimeStub::operator new`) and `SingletonBlob::operator new` triggering a fatal error if there is no more space. The paths in question are: > 1. `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub` > 1. 
`C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_uncommon_trap_blob` -> `UncommonTrapBlob::create` -> `new UncommonTrapBlob` > 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_exception_blob` -> `ExceptionBlob::create` -> `new ExceptionBlob` > > # Solution > Instead of crashing fatally, we can use the `alloc_fail_is_fatal` flag of `RuntimeStub::new_runtime_stub` to avoid crashing in cases 1 and 2 and add a similar flag to `SingletonBlob::operator new` for cases 3 and 4. In the latter case we need to adjust all calls accordingly. > > Note: In [JDK-8326615](https://bugs.openjdk.org/browse/JDK-8326615) it was argued that increasing the minimum code cache size would solve the issue but that wasn't entirely accurate: doing so possibly decreases the chances of a failed allocation in these 4 places but doesn't totally avoid it. > > # Testing > The original failing regression test in `test/hotspot/jtreg/compiler/startup/StartupOutput.java` has been modified to run multiple times with randomized values (within the original failing range) to increase the chances of hitting the fatal assertion.
> > Tests: Tier 1-4 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode) Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8347406: re-add modified assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23630/files - new: https://git.openjdk.org/jdk/pull/23630/files/26a6747d..906cd756 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23630&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23630&range=02-03 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23630/head:pull/23630 PR: https://git.openjdk.org/jdk/pull/23630 From dfenacci at openjdk.org Thu Feb 27 12:06:02 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 27 Feb 2025 12:06:02 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v4] In-Reply-To: References: Message-ID: On Thu, 20 Feb 2025 21:36:44 GMT, Dean Long wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8347406: re-add modified assert > > src/hotspot/share/opto/output.cpp line 3487: > >> 3485: C->record_failure("CodeCache is full"); >> 3486: } else { >> 3487: C->set_stub_entry_point(rs->entry_point()); > > Is the deleted rs->is_runtime_stub() assert still useful here? A slightly modified one surely is. Inserted it again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23630#discussion_r1973446189 From mli at openjdk.org Thu Feb 27 12:32:37 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 27 Feb 2025 12:32:37 GMT Subject: RFR: 8350855: RISC-V: print offset by assert of patch_offset_in_conditional_branch Message-ID: Hi, can you help review this trivial patch?
We are facing the assert occasionally, but currently no offset info is printed out. It's good to have it, as the failure is not easy to reproduce. Thanks ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/23821/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23821&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350855 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23821.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23821/head:pull/23821 PR: https://git.openjdk.org/jdk/pull/23821 From sroy at openjdk.org Thu Feb 27 13:40:51 2025 From: sroy at openjdk.org (Suchismith Roy) Date: Thu, 27 Feb 2025 13:40:51 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28] In-Reply-To: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: > JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) > > Currently acceleration code for GHASH is missing for PPC64. > > The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result.
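The Karatsuba step mentioned above can be shown with a portable scalar sketch. It is not the PPC64 intrinsic, which uses SIMD instructions. The sketch multiplies two 128-bit polynomials over GF(2) using three 64x64 carry-less products instead of four. All names are hypothetical. A full GHASH would also need bit-reflected operands and reduction modulo x^128 + x^7 + x^2 + x + 1, and both are omitted here.

```cpp
#include <cassert>
#include <cstdint>

// Portable illustration only; the PPC64 intrinsic uses vector instructions.
// A 128-bit polynomial is split into 64-bit halves: value = hi * x^64 + lo.
struct U128 {
  uint64_t hi;
  uint64_t lo;
};

// 256-bit carry-less product; w[0] holds the least significant 64 bits.
struct U256 {
  uint64_t w[4];
};

// Bit-by-bit carry-less (GF(2)[x]) multiplication of two 64-bit polynomials.
U128 clmul64(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i != 0) {
        r.hi ^= a >> (64 - i);
      }
    }
  }
  return r;
}

// Schoolbook: four 64x64 products, the two middle ones XORed together.
U256 clmul128_schoolbook(U128 a, U128 b) {
  U128 ll = clmul64(a.lo, b.lo);
  U128 lh = clmul64(a.lo, b.hi);
  U128 hl = clmul64(a.hi, b.lo);
  U128 hh = clmul64(a.hi, b.hi);
  U256 r{{ll.lo, ll.hi ^ lh.lo ^ hl.lo, hh.lo ^ lh.hi ^ hl.hi, hh.hi}};
  return r;
}

// Karatsuba: three 64x64 products. Addition in GF(2) is XOR, so
// (a.hi ^ a.lo) * (b.hi ^ b.lo) ^ H ^ L equals a.hi * b.lo ^ a.lo * b.hi.
U256 clmul128_karatsuba(U128 a, U128 b) {
  U128 L = clmul64(a.lo, b.lo);
  U128 H = clmul64(a.hi, b.hi);
  U128 M = clmul64(a.hi ^ a.lo, b.hi ^ b.lo);
  M.lo ^= H.lo ^ L.lo;
  M.hi ^= H.hi ^ L.hi;
  U256 r{{L.lo, L.hi ^ M.lo, H.lo ^ M.hi, H.hi}};
  return r;
}
```

The trade is one fewer multiplication for a few extra XORs. That trade usually pays off when carry-less multiplies dominate the cost.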
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: use vsplitsb ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20235/files - new: https://git.openjdk.org/jdk/pull/20235/files/474b891b..3bca30f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20235&range=26-27 Stats: 36 lines in 1 file changed: 1 ins; 0 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/20235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20235/head:pull/20235 PR: https://git.openjdk.org/jdk/pull/20235 From jsjolen at openjdk.org Thu Feb 27 13:50:17 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 27 Feb 2025 13:50:17 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 10:37:33 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. >> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. >> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. >> - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > removed UseFlagInPlace test. More review comments. src/hotspot/share/nmt/memTracker.hpp line 60: > 58: static bool walk_virtual_memory(VirtualMemoryWalker* walker) { > 59: return VirtualMemoryTracker::Instance::walk_virtual_memory(walker); > 60: } The `MemTracker` API exposes the outer and locking implementation to the rest of Hotspot. These two methods are used by us internally. 
I think it's better if these methods are deleted and the `VirtualMemoryTracker::Instance` methods are called directly, instead. src/hotspot/share/nmt/nmtTreap.hpp line 388: > 386: head = to_visit.pop(); > 387: if (!f(head)) > 388: return; Style: Always use braces in `if` statements. src/hotspot/share/nmt/nmtTreap.hpp line 416: > 414: if (cmp_from >= 0 && cmp_to < 0) { > 415: if (!f(head)) > 416: return; Style: Always use braces in if statements. src/hotspot/share/nmt/regionsTree.hpp line 30: > 28: #include "nmt/nmtCommon.hpp" > 29: #include "nmt/vmatree.hpp" > 30: #include "nmt/virtualMemoryTracker.hpp" This doesn't seem used. However, you do not include the `nmt/nmtNativeCallStackStorage.hpp` header. src/hotspot/share/nmt/regionsTree.hpp line 49: > 47: using Node = VMATree::TreapNode; > 48: > 49: class NodeHelper { Most of the methods here should be `const` and take `const NodeHelper&` as arguments. src/hotspot/share/nmt/regionsTree.hpp line 62: > 60: inline VMATree::StateType in_state() { return _node->val().in.type(); } > 61: inline VMATree::StateType out_state() { return _node->val().out.type(); } > 62: inline size_t distance_from(NodeHelper& other) { return position() - other.position(); } `assert(position() > other.position()`. src/hotspot/share/nmt/regionsTree.hpp line 82: > 80: ); > 81: } > 82: }; 1. Doesn't need to be inline, move to `cpp` file. 2. No need to cast to `int`, just use the `VMATree::StateType` directly. 3. Should be wrapped in `#ifdef ASSERT` probably, I don't see us shipping this in product builds. src/hotspot/share/nmt/regionsTree.hpp line 90: > 88: return true; > 89: }); > 90: } Move to `cpp` file, wrap in `#ifdef ASSERT`. 
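The `nmtTreap.hpp` excerpts above visit nodes with an explicit stack (`head = to_visit.pop(); if (!f(head)) return;`). As a rough sketch of that traversal shape, with the braces the review asks for, here is an iterative in-order walk that stops early when the visitor returns false. `NodeSketch` is a hypothetical plain binary-search-tree node, not the NMT treap node type. `std::vector` stands in for whatever stack the real code uses.

```cpp
#include <cassert>
#include <vector>

// Hypothetical binary-search-tree node; a stand-in for the NMT treap node.
struct NodeSketch {
  int key;
  NodeSketch* left;
  NodeSketch* right;
};

// Iterative in-order walk with an explicit stack. The visitor returns false
// to stop early; the early return is braced, per the style comments.
template <typename F>
void visit_in_order_sketch(NodeSketch* root, F f) {
  std::vector<NodeSketch*> to_visit;
  NodeSketch* head = root;
  while (head != nullptr || !to_visit.empty()) {
    // Descend to the leftmost unvisited node, remembering the path.
    while (head != nullptr) {
      to_visit.push_back(head);
      head = head->left;
    }
    head = to_visit.back();
    to_visit.pop_back();
    if (!f(head)) {
      return;
    }
    head = head->right;
  }
}
```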
src/hotspot/share/nmt/virtualMemoryTracker.cpp line 61: > 59: return _tracker->tree() != nullptr; > 60: } > 61: return true; ```c++ void* tracker = os::malloc(sizeof(VirtualMemoryTracker), mtNMT); if (tracker == nullptr) return false; _tracker = new (tracker) VirtualMemoryTracker(level == NMT_detail); src/hotspot/share/nmt/virtualMemoryTracker.cpp line 105: > 103: // " vms-committed: %zu", > 104: // str, NMTUtil::tag_to_name(tag), (long)reserve_delta, (long)commit_delta, reserved, committed); > 105: }; Any plan regarding this? src/hotspot/share/nmt/virtualMemoryTracker.cpp line 319: > 317: } > 318: VirtualMemoryTracker::Instance::add_committed_region(committed_start, committed_size, ncs); > 319: //log_warning(cds)("st start: " INTPTR_FORMAT " size: " SIZE_FORMAT, p2i(committed_start), committed_size); Outdated log src/hotspot/share/nmt/virtualMemoryTracker.hpp line 300: > 298: > 299: public: > 300: CommittedMemoryRegion() : Style: ```c++ CommittedMemoryRegion() : VirtualMemoryRegion((address)1, 1), _stack(NativeCallStack::empty_stack()) { } src/hotspot/share/nmt/virtualMemoryTracker.hpp line 322: > 320: bool is_valid() { return base() != (address)1 && size() != 1;} > 321: ReservedMemoryRegion() : > 322: VirtualMemoryRegion((address)1, 1), _stack(NativeCallStack::empty_stack()), _mem_tag(mtNone) { } Style: Space between `is_valid` and constructor, fix initializer list as in `CommittedMemoryRegion`. src/hotspot/share/nmt/virtualMemoryTracker.hpp line 372: > 370: class VirtualMemoryTracker { > 371: private: > 372: RegionsTree *_tree; `class RegionsTree;` shouldn't be needed if you fix the circular include from above. There is no need to have the `private:` specifier. The `_tree` doesn't need to be a pointer after the forward declaration is removed, which in turn simplifies the initialization code.
test/hotspot/gtest/nmt/test_nmt_treap.cpp line 30: > 28: #include "runtime/os.hpp" > 29: #include "unittest.hpp" > 30: #include "utilities/linkedlist.hpp" Outdated header inclusion ------------- PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2647773937 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973578352 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973584937 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973585248 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973610404 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973594825 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973594257 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973588473 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973589270 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973604346 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973603631 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973601861 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973599416 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973600519 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973612354 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973613179 From jsjolen at openjdk.org Thu Feb 27 13:50:17 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 27 Feb 2025 13:50:17 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Mon, 24 Feb 2025 17:48:07 GMT, Gerard Ziemski wrote: >> Done. > > 2 questions: > > 1st, I must be misunderstanding something here. 
Johan asked to change the API from: > > `visit_committed_regions(ReservedMemoryRegion& committed_rgn)` > > to > > `visit_committed_regions(position start, size size)` > > but I still see the old way. > > 2nd, why are we asking for this change? We want to remove `ReservedMemoryRegion` in a follow up PR to this one. Another step is to remove the `CommittedMemoryRegion` class as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1973584029 From coleenp at openjdk.org Thu Feb 27 14:28:05 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Feb 2025 14:28:05 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 16:29:25 GMT, Fredrik Bredberg wrote: > I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list, the `entry_list`. > > This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. > > In the current multi-queue design new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. > > The new list-design is as much a multi-queue as the current.
The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. > > The first contending thread that "pushes" itself onto `entry_list`, will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked trough its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. > > Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). > > Performance wise the new design seems to be equal to the old design, even though c2 generates two less instructions per monitor unlock operation. > > However the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fact that c2 no longer has to check b... This looks really good - I have some small change and improvement requests. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 418: > 416: // have released the lock. > 417: // Refer to the comments in synchronizer.cpp for how we might encode extra > 418: // state in _succ so we can avoid fetching entry_list. I there is no comment in synchronizer about this (that I can find) and whether or not this is a good idea, can you remove this line with this change? 
src/hotspot/share/runtime/objectMonitor.cpp line 701: > 699: void ObjectMonitor::add_to_entry_list(JavaThread* current, ObjectWaiter* node) { > 700: node->_prev = nullptr; > 701: node->TState = ObjectWaiter::TS_ENTER; I think you should do this in a future cleanup. The ObjectWaiter's constructor should initialize these fields to TS_ENTER or TS_WAIT when it's created and make prev, next null (or 0xBAD?). And fix the constructor to have an initialization list instead. src/hotspot/share/runtime/objectMonitor.cpp line 735: > 733: assert(!has_successor(current), "invariant"); > 734: assert(has_owner(current), "invariant"); > 735: return true; I wonder for a future RFE we can move these asserts into TryLock. src/hotspot/share/runtime/objectMonitor.cpp line 1285: > 1283: // By convention we unlink a contending thread from _entry_list immediately > 1284: // after the thread acquires the lock in ::enter(). Equally, we could defer > 1285: // unlinking the thread until ::exit()-time. Since you're here, remove these two lines 1222-1223. I really don't think pointing out an alternate implementation that we did not choose is helpful to understanding this code. src/hotspot/share/runtime/objectMonitor.hpp line 46: > 44: class ObjectWaiter : public CHeapObj { > 45: public: > 46: enum TStates : uint8_t { TS_UNDEF, TS_READY, TS_RUN, TS_WAIT, TS_ENTER }; TS_READY looks unused. src/hotspot/share/runtime/objectMonitor.hpp line 79: > 77: void set_bad_pointers() { > 78: #ifdef ASSERT > 79: // Diagnostic hygiene ... hygiene seems like the wrong word here. Can you remove this comment? src/hotspot/share/runtime/synchronizer.cpp line 369: > 367: // We have one or more waiters. Since this is an inflated monitor > 368: // that we own, we can transfer one or more threads from the waitset > 369: // to the entry_list here and now, avoiding the slow-path. Not related to this change but I found that this quick_notify isn't quicker. ------------- Changes requested by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23421#pullrequestreview-2647862248 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973630782 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973654464 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973664035 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973670396 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973678657 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973632087 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973634214 From coleenp at openjdk.org Thu Feb 27 14:28:07 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Feb 2025 14:28:07 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 05:42:12 GMT, David Holmes wrote: >> I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list, the `entry_list`. >> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. >> >> The new list-design is as much a multi-queue as the current.
Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list`, will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked through its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance-wise the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation. >> >> However the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > src/hotspot/share/runtime/objectMonitor.cpp line 166: > >> 164: // its next pointer, and have its prev pointer set to null. Thus >> 165: // pushing six threads A-F (in that order) onto entry_list, will >> 166: // form a singly-linked list, see 1) below. > > Suggestion: have diagram 1 immediately follow this text so the reader doesn't have to jump down.
> src/hotspot/share/runtime/objectMonitor.cpp line 718: > >> 716: // if we added current to _entry_list. Once on _entry_list, current >> 717: // stays on-queue until it acquires the lock. >> 718: bool ObjectMonitor::try_lock_or_add_to_entry_list(JavaThread* current, ObjectWaiter* node) { > > Nit: the name suggests we do the try_lock first, when we don't. If we reverse the name we should also reverse the true/false return so that true relates to the first part of the name. See what others think. How about add_to_entry_list with a boolean parameter that tries the lock if it fails, and only have one of these functions? Although the return true if you get the lock makes it weird. bool add_to_entry_list(JavaThread* current, ObjectWaiter* node, bool or_lock) { return true if locked, false otherwise; } Maybe that makes sense. > src/hotspot/share/runtime/objectMonitor.cpp line 719: > >> 717: // stays on-queue until it acquires the lock. >> 718: bool ObjectMonitor::try_lock_or_add_to_entry_list(JavaThread* current, ObjectWaiter* node) { >> 719: node->_prev = nullptr; > > Shouldn't this already be the case? I think for the vthread case, it isn't yet(?). Maybe motivation to fix the ObjectWaiter constructor with this patch? > src/hotspot/share/runtime/objectMonitor.cpp line 2018: > >> 2016: // that in prepend-mode we invert the order of the waiters. Let's say that the >> 2017: // waitset is "ABCD" and the entry_list is "XYZ". After a notifyAll() in prepend >> 2018: // mode the waitset will be empty and the entry_list will be "DCBAXYZ". > > We don't support different ordering modes any more so we always "prepend" such that waiters are added to the entry_list in the reverse order of waiting. So given waitList -> A -> B -> C -> D, and _entry_list -> x -> y -> z we will get _entry_list -> D -> C -> B -> A -> X -> Y -> Z One of the benefits of this work is to read, understand and clean up misleading and out of date comments in this code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973636957 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973657207 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973681891 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973684370 From mdoerr at openjdk.org Thu Feb 27 14:32:59 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Feb 2025 14:32:59 GMT Subject: RFR: 8350716: [s390] intrinsify Thread.currentThread() In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 04:14:37 GMT, Amit Kumar wrote: > s390x port for [JDK-8278793](https://bugs.openjdk.org/browse/JDK-8278793) LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23791#pullrequestreview-2647976494 From duke at openjdk.org Thu Feb 27 15:21:26 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Thu, 27 Feb 2025 15:21:26 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups Message-ID: This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. 
I tested it with: java -Xlog:os+container=trace -version on: `Red Hat Enterprise Linux 8 (cgroups v1 only)`: _No change in behaviour_ `Fedora 41 (cgroups v2)`: _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 @@ -1,7 +1,12 @@ [trace][os,container] OSContainer::init: Initializing Container Support -[debug][os,container] Detected optional pids controller entry in /proc/cgroups -[debug][os,container] controller cpuset is not enabled - ] +[debug][os,container] v2 controller cpuset is enabled and relevant +[debug][os,container] v2 controller cpu is enabled and required +[debug][os,container] v2 controller io is enabled but not relevant +[debug][os,container] v2 controller memory is enabled and required +[debug][os,container] v2 controller hugetlb is enabled but not relevant +[debug][os,container] v2 controller pids is enabled and relevant +[debug][os,container] v2 controller rdma is enabled but not relevant +[debug][os,container] v2 controller misc is enabled but not relevant [debug][os,container] Detected cgroups v2 unified hierarchy [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max `Fedora 41 (custom kernel with cgroups v1 disabled)`: _Fixes `cgroups v2` detection:_ --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 @@ -1,7 +1,63 @@ [trace][os,container] OSContainer::init: Initializing Container Support -[debug][os,container] Detected optional pids controller entry in 
/proc/cgroups -[debug][os,container] controller cpuset is not enabled - ] -[debug][os,container] controller memory is not enabled - ] -[debug][os,container] One or more required controllers disabled at kernel level. +[debug][os,container] v2 controller cpuset is enabled and relevant +[debug][os,container] v2 controller cpu is enabled and required +[debug][os,container] v2 controller io is enabled but not relevant +[debug][os,container] v2 controller memory is enabled and required +[debug][os,container] v2 controller hugetlb is enabled but not relevant +[debug][os,container] v2 controller pids is enabled and relevant +[debug][os,container] v2 controller rdma is enabled but not relevant +[debug][os,container] v2 controller misc is enabled but not relevant +[debug][os,container] Detected cgroups v2 unified hierarchy +[trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope +[trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope/memory.max +[trace][os,container] Memory Limit is: -1 +[trace][os,container] Memory Limit is: Unlimited +[debug][os,container] container memory limit unlimited: -1, using host value 4094947328 +[trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-1000.slice/memory.max +[trace][os,container] Memory Limit is: -1 +[trace][os,container] Memory Limit is: Unlimited +[debug][os,container] container memory limit unlimited: -1, using host value 4094947328 +[trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/memory.max +[trace][os,container] Memory Limit is: -1 +[trace][os,container] Memory Limit is: Unlimited +[debug][os,container] container memory limit unlimited: -1, using host value 4094947328 +[trace][os,container] Path to /memory.max is /sys/fs/cgroup/memory.max +[debug][os,container] Open of file /sys/fs/cgroup/memory.max failed, No such file or directory +[trace][os,container] Memory Limit 
failed: -2 +[trace][os,container] Memory Limit is: -2 +[debug][os,container] container memory limit failed: -2, using host value 4094947328 +[trace][os,container] No lower limit found for memory in hierarchy /sys/fs/cgroup, adjusting to original path /user.slice/user-1000.slice/session-95.scope +[trace][os,container] Adjusting controller path for cpu: /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope +[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope/cpu.max +[trace][os,container] CPU Quota is: -1 +[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope/cpu.max +[trace][os,container] CPU Period is: 100000 +[trace][os,container] OSContainer::active_processor_count: 2 +[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/cpu.max +[trace][os,container] CPU Quota is: -1 +[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/cpu.max +[trace][os,container] CPU Period is: 100000 +[trace][os,container] OSContainer::active_processor_count: 2 +[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/cpu.max +[trace][os,container] CPU Quota is: -1 +[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/cpu.max +[trace][os,container] CPU Period is: 100000 +[trace][os,container] OSContainer::active_processor_count: 2 +[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/cpu.max +[debug][os,container] Open of file /sys/fs/cgroup/cpu.max failed, No such file or directory +[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/cpu.max +[debug][os,container] Open of file /sys/fs/cgroup/cpu.max failed, No such file or directory +[trace][os,container] CPU Period failed: -2 +[trace][os,container] OSContainer::active_processor_count: 2 +[trace][os,container] No lower limit found for cpu in hierarchy /sys/fs/cgroup, adjusting to original path /user.slice/user-1000.slice/session-95.scope 
+[trace][os,container] total physical memory: 4094947328 +[trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope/memory.max +[trace][os,container] Memory Limit is: -1 +[trace][os,container] Memory Limit is: Unlimited +[debug][os,container] container memory limit unlimited: -1, using host value 4094947328 +[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope/cpu.max +[trace][os,container] CPU Quota is: -1 +[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/user.slice/user-1000.slice/session-95.scope/cpu.max +[trace][os,container] CPU Period is: 100000 +[trace][os,container] OSContainer::active_processor_count: 2 +[debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present `Alpine Linux v3.21 (unified, aka cgroups v2 only)`: _Fixes `cgroups v2` detection:_ --- tt-old-alpine-unified.txt 2025-02-26 15:38:34.575898350 -0500 +++ tt-new-alpine-unified.txt 2025-02-26 15:38:36.156905658 -0500 @@ -1,7 +1,21 @@ [trace][os,container] OSContainer::init: Initializing Container Support -[debug][os,container] Detected optional pids controller entry in /proc/cgroups -[debug][os,container] controller cpuset is not enabled - ] -[debug][os,container] controller memory is not enabled - ] -[debug][os,container] One or more required controllers disabled at kernel level. 
+[debug][os,container] v2 controller cpuset is enabled and relevant +[debug][os,container] v2 controller cpu is enabled and required +[debug][os,container] v2 controller io is enabled but not relevant +[debug][os,container] v2 controller memory is enabled and required +[debug][os,container] v2 controller hugetlb is enabled but not relevant +[debug][os,container] v2 controller pids is enabled and relevant +[debug][os,container] Detected cgroups v2 unified hierarchy +[trace][os,container] total physical memory: 2074931200 +[trace][os,container] Path to /memory.max is /sys/fs/cgroup/memory.max +[debug][os,container] Open of file /sys/fs/cgroup/memory.max failed, No such file or directory +[trace][os,container] Memory Limit failed: -2 +[trace][os,container] Memory Limit is: -2 +[debug][os,container] container memory limit failed: -2, using host value 2074931200 +[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/cpu.max +[debug][os,container] Open of file /sys/fs/cgroup/cpu.max failed, No such file or directory +[trace][os,container] Path to /cpu.max is /sys/fs/cgroup/cpu.max +[debug][os,container] Open of file /sys/fs/cgroup/cpu.max failed, No such file or directory +[trace][os,container] CPU Period failed: -2 +[trace][os,container] OSContainer::active_processor_count: 2 +[debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present `Alpine Linux v3.21 (hybrid)`: _No change in behaviour._ `Alpine Linux v3.21 (legacy)`: _No change in behaviour._ ------------- Commit messages: - 8349988: Change cgroup version detection logic to not depend on /proc/cgroups Changes: https://git.openjdk.org/jdk/pull/23811/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23811&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349988 Stats: 299 lines in 6 files changed: 212 ins; 23 del; 64 mod Patch: https://git.openjdk.org/jdk/pull/23811.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23811/head:pull/23811 PR: 
https://git.openjdk.org/jdk/pull/23811 From sgehwolf at openjdk.org Thu Feb 27 15:21:26 2025 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 27 Feb 2025 15:21:26 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups In-Reply-To: References: Message-ID: <5Nq98ua0IzNrKDF4GnestBK6xY145zvKBmEFBKBgGQc=.80a50140-449b-4ec3-b7d1-ddbd1397a551@github.com> On Wed, 26 Feb 2025 21:03:58 GMT, Thomas Fitzsimmons wrote: > This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. > > I tested it with: > > > java -Xlog:os+container=trace -version > > on: > > `Red Hat Enterprise Linux 8 (cgroups v1 only)`: > _No change in behaviour_ > > `Fedora 41 (cgroups v2)`: > _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ > > --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 > +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 > @@ -1,7 +1,12 @@ > [trace][os,container] OSContainer::init: Initializing Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 controller cpu is enabled and required > +[debug][os,container] v2 controller io is enabled but not relevant > +[debug][os,container] v2 controller memory is enabled and required > +[debug][os,container] v2 controller hugetlb is enabled but not relevant > +[debug][os,container] v2 controller pids is enabled and relevant > +[debug][os,container] v2 controller rdma is enabled but not relevant > +[debug][os,container] v2 controller misc is enabled but not relevant > [debug][os,container] Detected cgroups v2 unified hierarchy > [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 
4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope > [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max > > > `Fedora 41 (custom kernel with cgroups v1 disabled)`: > _Fixes `cgroups v2` detection:_ > > --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 > +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 > @@ -1,7 +1,63 @@ > [trace][os,container] OSContainer::init: Initializing Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > -[debug][os,container] controller memory is not enabled > - ] > -[debug][os,container] One or more required controllers disabled at kernel level. > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 contro... @fitzsim Please use `/issue add JDK-8347811` since this PR is addressing them both (JDK-8349988 and JDK-8347811). That way both will get resolved when this PR integrates. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2688118558 From duke at openjdk.org Thu Feb 27 15:21:26 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Thu, 27 Feb 2025 15:21:26 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 21:03:58 GMT, Thomas Fitzsimmons wrote: > This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. 
> > I tested it with: > > > java -Xlog:os+container=trace -version > > on: > > `Red Hat Enterprise Linux 8 (cgroups v1 only)`: > _No change in behaviour_ > > `Fedora 41 (cgroups v2)`: > _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ > > --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 > +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 > @@ -1,7 +1,12 @@ > [trace][os,container] OSContainer::init: Initializing Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 controller cpu is enabled and required > +[debug][os,container] v2 controller io is enabled but not relevant > +[debug][os,container] v2 controller memory is enabled and required > +[debug][os,container] v2 controller hugetlb is enabled but not relevant > +[debug][os,container] v2 controller pids is enabled and relevant > +[debug][os,container] v2 controller rdma is enabled but not relevant > +[debug][os,container] v2 controller misc is enabled but not relevant > [debug][os,container] Detected cgroups v2 unified hierarchy > [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope > [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max > > > `Fedora 41 (custom kernel with cgroups v1 disabled)`: > _Fixes `cgroups v2` detection:_ > > --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 > +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 > @@ -1,7 +1,63 @@ > [trace][os,container] OSContainer::init: Initializing 
Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > -[debug][os,container] controller memory is not enabled > - ] > -[debug][os,container] One or more required controllers disabled at kernel level. > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 contro... The actual changes are easier to see when whitespace is ignored: https://github.com/openjdk/jdk/pull/23811/commits/39a6463c0bd0f8f94e0ca6382ea1c87d2935af9d?w=1 I fixed an existing assert message typo that I noticed while working on the patch, `hierarchy mismatch for cpuacc[t]`. Strictly speaking it is not related to either bug report, but I figured it did not warrant a bug report of its own. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2688237008 PR Comment: https://git.openjdk.org/jdk/pull/23811#issuecomment-2688251628 From fbredberg at openjdk.org Thu Feb 27 15:54:28 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 15:54:28 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: > I've combined two of `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list: the `entry_list`. > > This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question of whether it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. > > In the current multi-queue design, new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list and add the elements to the doubly linked `EntryList`.
The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO chunks. > > The new list design is as much a multi-queue as the current one. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. > > You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. > > The first contending thread that "pushes" itself onto `entry_list` will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked through its next pointer and have its prev pointer set to null, so pushing new threads onto `entry_list` forms a singly linked list. The list is always in the right order (via the next pointers) and is never moved to another list. > > Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). > > Performance-wise the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation. > > However, the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fact that c2 no longer has to check b... Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: Update after review by David and Coleen.
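The list discipline described above can be sketched in portable C++ with `std::atomic`. It has four parts: a CAS push at the head, FIFO removal from the tail, prev pointers filled in lazily by the exiting thread, and a cached tail. This is a minimal single-file illustration under stated assumptions, not the HotSpot code. `Waiter`, `EntryList`, `push`, `pop_tail` and `link_prevs_from_head` are hypothetical names, and neither HotSpot's `Atomic::` API nor the real `ObjectWaiter` fields are used. The sketch assumes that only the current lock owner ever removes entries. That assumption is what lets the tail cache and the prev pointers be plain, non-atomic fields.

```cpp
#include <atomic>

// Hypothetical stand-in for ObjectWaiter: only the list links are modeled.
struct Waiter {
  int id;
  Waiter* next = nullptr;  // towards older waiters (towards the tail)
  Waiter* prev = nullptr;  // towards newer waiters; filled in lazily
  explicit Waiter(int i) : id(i) {}
};

// Any thread may push; only the lock owner calls pop_tail().
class EntryList {
 public:
  // Contending thread: CAS itself in at the head (singly linked via next).
  void push(Waiter* w) {
    w->prev = nullptr;
    Waiter* head = head_.load(std::memory_order_relaxed);
    do {
      w->next = head;
    } while (!head_.compare_exchange_weak(head, w, std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // Owner only: remove and return the oldest waiter (strict FIFO).
  Waiter* pop_tail() {
    Waiter* w = tail();
    if (w == nullptr) return nullptr;  // list is empty
    if (w->prev == nullptr) {
      // w was the head when the prev links were last built.
      Waiter* expected = w;
      if (head_.compare_exchange_strong(expected, nullptr,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire)) {
        tail_ = nullptr;  // w was the only element
        return w;
      }
      link_prevs_from_head();  // newer waiters arrived; now w->prev is set
    }
    Waiter* p = w->prev;  // next-newer waiter becomes the new tail
    p->next = nullptr;    // pushers never write p->next, so this is safe
    tail_ = p;
    return w;
  }

 private:
  // Owner only: use the cached tail, or find it by walking from the head.
  Waiter* tail() {
    if (tail_ == nullptr) link_prevs_from_head();
    return tail_;
  }

  // Owner only: walk head -> tail assigning prev links (making the list
  // doubly linked) and cache the last node as the tail.
  void link_prevs_from_head() {
    Waiter* prev = nullptr;
    for (Waiter* w = head_.load(std::memory_order_acquire); w != nullptr;
         w = w->next) {
      w->prev = prev;
      prev = w;
    }
    tail_ = prev;
  }

  std::atomic<Waiter*> head_{nullptr};  // the only concurrently written field
  Waiter* tail_ = nullptr;              // cached tail, like entry_list_tail
};
```

`push` uses `compare_exchange_weak` in a retry loop because a spurious failure simply retries. The one case in `pop_tail` that needs care is a tail whose `prev` is still null. If that node is still the head, a CAS empties the list. Otherwise newer threads have arrived, and one more walk from the head fills in the missing `prev` link.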
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23421/files - new: https://git.openjdk.org/jdk/pull/23421/files/e1d4fac6..283c2431 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23421&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23421&range=00-01 Stats: 124 lines in 5 files changed: 28 ins; 36 del; 60 mod Patch: https://git.openjdk.org/jdk/pull/23421.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23421/head:pull/23421 PR: https://git.openjdk.org/jdk/pull/23421 From fbredberg at openjdk.org Thu Feb 27 16:00:13 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 16:00:13 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 05:42:25 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after review by David and Coleen. > > src/hotspot/share/runtime/objectMonitor.cpp line 172: > >> 170: // from the entry_list head. While walking the list we also assign >> 171: // the prev pointers of each thread, essentially forming a doubly >> 172: // linked list, see 2) below. > > Suggestion: have diagram 2 immediately follow this text so the reader doesn't have to jump down. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973880640 From fbredberg at openjdk.org Thu Feb 27 16:00:14 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 16:00:14 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 14:09:45 GMT, Coleen Phillimore wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after review by David and Coleen. 
> > src/hotspot/share/runtime/objectMonitor.cpp line 701: > >> 699: void ObjectMonitor::add_to_entry_list(JavaThread* current, ObjectWaiter* node) { >> 700: node->_prev = nullptr; >> 701: node->TState = ObjectWaiter::TS_ENTER; > > I think you should do this in a future cleanup. The ObjectWaiter's constructor should initialize these fields to TS_ENTER or TS_WAIT when it's created and make prev, next null (or 0xBAD?). And fix the constructor to have an initialization list instead. Sounds like a plan. > src/hotspot/share/runtime/synchronizer.cpp line 369: > >> 367: // We have one or more waiters. Since this is an inflated monitor >> 368: // that we own, we can transfer one or more threads from the waitset >> 369: // to the entry_list here and now, avoiding the slow-path. > > Not related to this change but I found that this quick_notify isn't quicker. Let's make quick_notify quicker (in another RFE). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973883699 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1973878764 From galder at openjdk.org Thu Feb 27 16:41:13 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 27 Feb 2025 16:41:13 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 7 Feb 2025 12:39:24 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the java implementation for these methods, e.g.
>> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g. >> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 44 additional commits since the last revision: > > - Merge branch 'master' into topic.intrinsify-max-min-long > - Fix typo > - Renaming methods and variables and add docu on algorithms > - Fix copyright years > - Make sure it runs with cpus with either avx512 or asimd > - Test can only run with 256 bit registers or bigger > > * Remove platform dependant check > and use platform independent configuration instead. > - Fix license header > - Tests should also run on aarch64 asimd=true envs > - Added comment around the assertions > - Adjust min/max identity IR test expectations after changes > - ... and 34 more: https://git.openjdk.org/jdk/compare/92e82467...a190ae68 Also, I've started a [discussion on jmh-dev](https://mail.openjdk.org/pipermail/jmh-dev/2025-February/004094.html) to see if there's a way to minimise pollution of `Math.min(II)` compilation. As a follow-up to https://github.com/openjdk/jdk/pull/20098#issuecomment-2684701935 I looked at where the other `Math.min(II)` calls are coming from, and a big chunk seems related to the JMH infrastructure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2688510211 From galder at openjdk.org Thu Feb 27 16:38:04 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 27 Feb 2025 16:38:04 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12] In-Reply-To: <63F-0aHgMthexL0b2DFmkW8_QrJeo8OOlCaIyZApfpY=.4744070d-9d56-4031-8684-be14cf66d1e5@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <63F-0aHgMthexL0b2DFmkW8_QrJeo8OOlCaIyZApfpY=.4744070d-9d56-4031-8684-be14cf66d1e5@github.com> Message-ID: On Thu, 27 Feb 2025 06:54:30 GMT, Emanuel Peter wrote: > Detect "extreme" probability scalar cmove, and replace them with branching code.
This should take care of all regressions here. This one has high priority, as it fixes the regression caused by this patch here. But it would also help to improve performance for the Integer.min/max cases, which have the same issue. +1 to this and the rest of the suggestions. Shall I create a JDK bug for this? > Additional performance improvement: make SuperWord recognize more cases as profitable (see Regression 1). Optional. > Additional performance improvement: extend backend capabilities for vectorization (see Regression 2 + 3). Optional. Do we need JDK bug(s) for these? If so, how many? 1 or 2? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2688502397 From jrose at openjdk.org Thu Feb 27 16:45:20 2025 From: jrose at openjdk.org (John R Rose) Date: Thu, 27 Feb 2025 16:45:20 GMT Subject: RFR: 8348426: Generate binary file for -XX:AOTMode=record -XX:AOTConfiguration=file [v8] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 01:11:25 GMT, Ioi Lam wrote: >> Currently, with `java -XX:AOTMode=record -XX:AOTConfiguration=file ...`, a text file is written. The file contains the names of loaded classes, indices of resolved constant pool entries, etc., that are easily represented in text. >> >> With the upcoming 2nd JEP of the Leyden project, [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) (Ahead-of-Time Method Profiling), the AOT config file needs to record complex data structures that are difficult to represent in text (we would need code for serializing hierarchical data structures to/from text). Also, a next step after [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147) would be to support hidden classes that have no predictable names. Representing such classes with textual names would become another challenge. >> >> To prepare for [JDK-8325147](https://bugs.openjdk.org/browse/JDK-8325147), this PR writes the AOT configuration file in a **binary format** (essentially the same format as a CDS archive file).
This allows arbitrary data associated with the cached classes to be processed and stored using the existing `MetaspaceClosure` API (which can recursively copy C++ objects). Such a change in the file format is allowed by [JEP 483](https://openjdk.org/jeps/483): >> >>> the format of the configuration and cache files is not specified and is subject to change without notice. >> >> **Notes for reviewers:** >> >> - Although the new config file format is essentially the same as a CDS "static" archive, for sanity, we use a different magic number so that the config file cannot be accidentally used as a CDS archive. See new tests inside AOTFlags.java. >> - After this PR, the CDS "static" archive can be dumped in three modes: "classic", "preimage", and "final". See new comments in cdsConfig.hpp. >> - The main starting point of this PR is `CDSConfig::check_aot_flags()` - it checks the existence of `-XX:AOTConfiguration` and `-XX:AOTMode` to configure the JVM to dump the CDS "preimage" or "final" archives as necessary. >> - Most of the other changes are checks for `CDSConfig::is_dumping_preimage_static_archive()` and `CDSConfig::is_dumping_final_static_archive()` to handle subtle differences between the different dumping modes. >> - I also updated the UL messages to use the new JEP 483 terminology ("AOT cache", "AOT configuration file", etc.) when JEP 483 options are specified. >> >> **Misc Note** >> - The changes in [CDS.java and RunTests.gmk](https://github.com/iklam/jdk/commit/0e77a35c25a968c7d931931bc108ccb... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 15 commits: > > - all tests in runtime/cds/appcds/aotClassLinking should be excluded for hotspot_appcds_dynamic testing > - @ashu-mehra comment - simplified _archived_cpp_vtptrs; also fixed old comments near by > - Merge branch 'master' into 8348426-binary-aot-config-file > - Merge branch 'master' into 8348426-binary-aot-config-file > - @ashu-mehra comments > - @calvinccheung comments > - Improved JTREG_AOT_JDK=true so we do not need to add test code into the JDK itself > - Improve error message when AOTMode=create has an incompatible classpath > - Fixed test cases @vnkozlov > - Update "make test JTREG_AOT_JDK=true ..." to work with binary AOT configuration > - ... and 5 more: https://git.openjdk.org/jdk/compare/990d40e9...1ec67c11 Terminology suggestion: "preimage" has a math meaning which is a little bit distinct from the usage here. A good software example of a math preimage would be a data structure you are going to copy into another container, with some kind of transform. Meanwhile, "aotconfig" is a very specific term (also visible to the user) which means exactly the contents of the file being changed. I suggest changing "preimage" in this patch to "aotconfig", more or less uniformly. Then it will be really clear that we are talking about the `foo.aotconfig` file in the JVM source code. Indeed, the word "preimage" is correct (in the math sense) at present when we copy metadata loaded into the training run, into the aotconfig file, all the way through the assembly phase to the AOT cache. But that is kind of accidental; the assembly phase might choose to reload classes from scratch some day. And we certainly rebuild AOT code (SCC) from scratch now. The necessary role of the aotconfig is to provide configuration data that drives AOT asset creation; it is sort of accidental when the aotconfig provides literal AOT assets to copy through.
I tipped over into this comment when I realized that "aotconfig file" is the same as "preimage file". ------------- PR Comment: https://git.openjdk.org/jdk/pull/23484#issuecomment-2688519203 From duke at openjdk.org Thu Feb 27 16:48:07 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Thu, 27 Feb 2025 16:48:07 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v5] In-Reply-To: References: <1yB95sOajuS5ptFI0GQWLepii5JsZ9DOsje-TEFyFYs=.a325ad18-17ed-4e77-b1e3-0bad2cf55c67@github.com> <_CekdxBJviS_sZCVN62_yFx-cTF4qrIuAnq bIeUmFck=.3a6afffb-8fbe-4809-a4ca-1bc22b52a628@github.com> Message-ID: On Thu, 27 Feb 2025 10:15:48 GMT, Andrew Haley wrote: >> OK, so GNU as is more forgiving than Apple as... > > Did my patch to aarch64-asmtest.py solve the problem? I haven't tried, I just used GNU as. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23300#discussion_r1973970358 From coleenp at openjdk.org Thu Feb 27 17:16:04 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Feb 2025 17:16:04 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 15:54:28 GMT, Fredrik Bredberg wrote: >> I've combined two of `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list: the `entry_list`. >> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question of whether it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design, new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` would be at the tail of the `EntryList`.
This way you ended up working through the contending threads in LIFO-chunks. >> >> The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list` will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked through its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance-wise the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation. >> >> However, the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update after review by David and Coleen. This change looks great. Thank you!
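The push/find-tail/pop protocol described in the PR can be modelled in a few dozen lines. This is a minimal, single-process sketch under stated assumptions, not HotSpot code: `Waiter`, `EntryList`, `push`, `find_tail` and `pop_tail` are hypothetical names, `std::atomic` stands in for HotSpot's `Atomic::cmpxchg`, and the parking, successor and middle-of-list removal logic of the real `ObjectMonitor` is left out.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of the entry_list; names are illustrative, not HotSpot's.
struct Waiter {
  int id;
  Waiter* next = nullptr;  // written by the thread that pushes itself
  Waiter* prev = nullptr;  // filled in lazily by the monitor owner
  explicit Waiter(int i) : id(i) {}
};

struct EntryList {
  std::atomic<Waiter*> head{nullptr};
  Waiter* tail = nullptr;  // cached tail; only the owner reads or writes it

  // Any contending thread: CAS itself onto the head. prev stays null.
  void push(Waiter* w) {
    Waiter* h = head.load(std::memory_order_relaxed);
    do {
      w->next = h;
      w->prev = nullptr;
    } while (!head.compare_exchange_weak(h, w, std::memory_order_release,
                                         std::memory_order_relaxed));
  }

  // Owner only: return the oldest waiter. With no cached tail, walk from the
  // head and fill in prev links, turning the list into a doubly linked one.
  Waiter* find_tail() {
    if (tail != nullptr) return tail;
    Waiter* w = head.load(std::memory_order_acquire);
    if (w == nullptr) return nullptr;
    while (w->next != nullptr) {
      w->next->prev = w;
      w = w->next;
    }
    tail = w;
    return w;
  }

  // Owner only: unlink and return the FIFO successor (the tail).
  Waiter* pop_tail() {
    Waiter* t = find_tail();
    if (t == nullptr) return nullptr;
    if (t->prev == nullptr) {
      // Either t is the only element, or waiters were pushed after the prev
      // links were built. Try to empty the list with a single CAS.
      Waiter* expected = t;
      if (head.compare_exchange_strong(expected, nullptr)) {
        tail = nullptr;
        return t;
      }
      // Newer waiters exist: drop the cached tail and rebuild prev links.
      tail = nullptr;
      t = find_tail();
    }
    tail = t->prev;
    tail->next = nullptr;
    t->prev = nullptr;
    return t;
  }
};
```

Pushing waiters 1, 2 and 3, popping once, pushing 4 and popping three more times yields 1, 2, 3, 4: pushes are LIFO at the head, but the owner always removes from the tail, so service is FIFO. The third pop exercises the case discussed later in the review: waiter 3 was the head when the prev links were built, so its prev is still null after 4 arrives, and the owner drops the cached tail and rebuilds the links from the head.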
src/hotspot/share/runtime/objectMonitor.cpp line 219: > 217: // entry_list_tail ----------^ > 218: // > 219: // * The monitor itself protects all of the operations on the This is a nice comment and really helps understand the algorithm. src/hotspot/share/runtime/objectMonitor.cpp line 948: > 946: current->_ParkEvent->reset(); > 947: > 948: if (try_lock_or_add_to_entry_list(current, &node)) { try_lock_or_add_to_entry_list() name makes sense in this context. if (add_to_entry_list(current, &node, /*try_lock*/true)) { return; // We got the lock } Makes less sense. I propose leaving the names and the functions for now. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23421#pullrequestreview-2648493876 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974006126 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974014351 From coleenp at openjdk.org Thu Feb 27 17:16:05 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Feb 2025 17:16:05 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 14:22:02 GMT, Coleen Phillimore wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after review by David and Coleen. > > src/hotspot/share/runtime/objectMonitor.hpp line 46: > >> 44: class ObjectWaiter : public CHeapObj { >> 45: public: >> 46: enum TStates : uint8_t { TS_UNDEF, TS_READY, TS_RUN, TS_WAIT, TS_ENTER }; > > TS_READY looks unused. Edit: this could be a trivial further PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974015687 From pchilanomate at openjdk.org Thu Feb 27 17:49:07 2025 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 27 Feb 2025 17:49:07 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v3] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Wed, 19 Feb 2025 00:37:14 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > Stricter assertion on ppc64 Marked as reviewed by pchilanomate (Reviewer). 
src/hotspot/share/runtime/deoptimization.cpp line 645: > 643: methodHandle method(current, deopt_sender.interpreter_frame_method()); > 644: Bytecode_invoke cur(method, deopt_sender.interpreter_frame_bci()); > 645: if (!cur.is_invokedynamic() && MethodHandles::has_member_arg(cur.klass(), cur.name())) { I was confused by this new condition, but I see it is the same as the one we have in `vframeArray::unpack_to_stack()`. ------------- PR Review: https://git.openjdk.org/jdk/pull/23557#pullrequestreview-2648596315 PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1974069132 From sgehwolf at openjdk.org Thu Feb 27 18:00:01 2025 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 27 Feb 2025 18:00:01 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 21:03:58 GMT, Thomas Fitzsimmons wrote: > This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811.
> > I tested it with: > > > java -Xlog:os+container=trace -version > > on: > > `Red Hat Enterprise Linux 8 (cgroups v1 only)`: > _No change in behaviour_ > > `Fedora 41 (cgroups v2)`: > _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ > > --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 > +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 > @@ -1,7 +1,12 @@ > [trace][os,container] OSContainer::init: Initializing Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 controller cpu is enabled and required > +[debug][os,container] v2 controller io is enabled but not relevant > +[debug][os,container] v2 controller memory is enabled and required > +[debug][os,container] v2 controller hugetlb is enabled but not relevant > +[debug][os,container] v2 controller pids is enabled and relevant > +[debug][os,container] v2 controller rdma is enabled but not relevant > +[debug][os,container] v2 controller misc is enabled but not relevant > [debug][os,container] Detected cgroups v2 unified hierarchy > [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope > [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max > > > `Fedora 41 (custom kernel with cgroups v1 disabled)`: > _Fixes `cgroups v2` detection:_ > > --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 > +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 > @@ -1,7 +1,63 @@ > [trace][os,container] OSContainer::init: Initializing 
Container Support > -[debug][os,container] Detected optional pids controller entry in /proc/cgroups > -[debug][os,container] controller cpuset is not enabled > - ] > -[debug][os,container] controller memory is not enabled > - ] > -[debug][os,container] One or more required controllers disabled at kernel level. > +[debug][os,container] v2 controller cpuset is enabled and relevant > +[debug][os,container] v2 contro... Thanks for this. It's getting there... src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 42: > 40: // Inlined from for portability. > 41: #define CGROUP2_SUPER_MAGIC 0x63677270 > 42: We may want to surround this with: #ifndef CGROUP2_SUPER_MAGIC ... #endif src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 81: > 79: bool cgroups_v2_enabled = false; > 80: > 81: if (statfs(sys_fs_cgroup, &fsstat) != -1) { This probably deserves a comment: // Assume cgroups v2 iff /sys/fs/cgroup has the cgroup v2 file system magic. src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 263: > 261: char buf[MAXPATHLEN+1]; > 262: char *p; > 263: bool is_cgroupsV2 = true; For all intents and purposes we can remove `is_cgroupsV2` here and use `cgroups_v2_enabled` instead. src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 285: > 283: if ((p = fgets(buf, MAXPATHLEN, controllers)) != nullptr) { > 284: char* controller = nullptr; > 285: char* buf_ptr = buf; Suggestion: char* buf_ptr = p; src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 287: > 285: char* buf_ptr = buf; > 286: int i; > 287: while ((controller = strsep(&buf_ptr, " \n\t\r\f\v")) != nullptr) { Consider defining the separators as `#define IS_SPACE_CHARS " \n\t\r\f\v"` or some such. src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 288: > 286: int i; > 287: while ((controller = strsep(&buf_ptr, " \n\t\r\f\v")) != nullptr) { > 288: // Skip empty string due to line ending in delimiter, '\n'. Suggestion: // Skip empty controllers. Be lenient about the cgroups.controllers file, // though we probably don't have to be.
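The two mechanisms under review, deciding v2 from the filesystem magic of `/sys/fs/cgroup` and splitting `cgroup.controllers` with `strsep()`, can be tried outside HotSpot with a sketch like the following. It assumes Linux with glibc (`statfs(2)` from `<sys/vfs.h>`, `strsep` from `<string.h>`); `is_cgroup_v2_mount` and `parse_controllers` are hypothetical helpers, not functions in the patch, and the magic value is the one quoted in the diff.

```cpp
#include <sys/vfs.h>  // statfs(2) on Linux
#include <cassert>
#include <cstring>    // strsep (glibc/BSD extension)
#include <string>
#include <vector>

#ifndef CGROUP2_SUPER_MAGIC
#define CGROUP2_SUPER_MAGIC 0x63677270  // value quoted in the diff above
#endif

// Treat the mount at `path` (normally /sys/fs/cgroup) as cgroups v2 iff
// statfs() reports the cgroup2 filesystem magic. If statfs() fails we
// cannot tell, and this sketch answers "not v2".
bool is_cgroup_v2_mount(const char* path) {
  struct statfs fs;
  if (statfs(path, &fs) != 0) {
    return false;
  }
  return static_cast<unsigned long>(fs.f_type) == CGROUP2_SUPER_MAGIC;
}

// Split the single line read from cgroup.controllers into controller names.
// strsep() returns an empty token for every leading, repeated or trailing
// delimiter (e.g. the final '\n'), so empty tokens are skipped.
std::vector<std::string> parse_controllers(const char* line) {
  std::vector<std::string> names;
  std::string copy(line);  // strsep() writes '\0' into its input
  char* buf_ptr = &copy[0];
  char* token;
  while ((token = strsep(&buf_ptr, " \n\t\r\f\v")) != nullptr) {
    if (*token == '\0') {
      continue;
    }
    names.emplace_back(token);
  }
  return names;
}
```

On a v2 host, `is_cgroup_v2_mount("/sys/fs/cgroup")` should return true; `parse_controllers("cpuset cpu io memory\n")` yields four names, because the empty token produced by the trailing newline is skipped.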
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 299: > 297: } else { > 298: log_debug(os, container)("v2 controller %s is enabled but not relevant", controller); > 299: } Do we really need this verbose logging? If you really think we need it, then please bump it to `trace` level. We'd bail out anyway, with a log message, if a required controller were missing. src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 321: > 319: } else { > 320: /* > 321: * cgroups v2 is not enabled. Read /proc/cgroups; for cgroups v1 hierarchy (hybrid or Suggestion: * The /sys/fs/cgroup filesystem magic hint suggests we have cg v1. Read /proc/cgroups; for cgroups v1 hierarchy (hybrid or src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 339: > 337: cg_infos[MEMORY_IDX]._enabled = (enabled == 1); > 338: } else if (strcmp(name, "cpuset") == 0) { > 339: log_debug(os, container)("Detected optional cpuset controller entry in %s", controllers_file); In https://bugs.openjdk.org/browse/JDK-8347129 we decided to leave the `cpuset` optionality alone for cg v1. I'd prefer if we kept it that way as it's becoming increasingly difficult to find those systems (or change them). Thus, it's hard to know if this would break something. src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 360: > 358: for (int i = 0; i < CG_INFO_LENGTH; i++) { > 359: // pids and cpuset controllers are optional. All other controllers are required > 360: if (i != PIDS_IDX && i != CPUSET_IDX) { Same comment for `cpuset` controller. Keep it required for cg v1. src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 361: > 359: // pids and cpuset controllers are optional. All other controllers are required > 360: if (i != PIDS_IDX && i != CPUSET_IDX) { > 361: is_cgroupsV2 = is_cgroupsV2 && cg_infos[i]._hierarchy_id == 0; Fundamentally, we are changing the "hint" as to what constitutes cg v2. So this line needs to be removed. We've already determined at this point that we have cgroup v2 (via the magic check) and we need to use that here.
https://github.com/jerboaa/jdk/commit/9958173dc03b66ae96227fb3579bc053cb911f06 would do that and fix a test-consistency issue. ------------- PR Review: https://git.openjdk.org/jdk/pull/23811#pullrequestreview-2647973128 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973695585 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1974085054 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973890517 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973712472 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973705509 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1974072923 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1974083186 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973722049 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973728648 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973730310 PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973877201 From ayang at openjdk.org Thu Feb 27 18:34:14 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 27 Feb 2025 18:34:14 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v2] In-Reply-To: References: Message-ID: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> On Tue, 25 Feb 2025 15:13:43 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. 
>> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more closely resemble Parallel GC's as described in the JEP. The reason is that G1 lags in throughput compared to Parallel/Serial GC due to its larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads. >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).
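For contrast with the filtered G1 barrier above, the Parallel/Serial-style card-marking barrier that the description says G1's new barrier will more closely resemble can be sketched in portable C++. This is an illustrative model under assumptions: 512-byte cards, clean = 0xff and dirty = 0x00 as in HotSpot's usual `CardTable` encoding, and a table indexed relative to a hypothetical `heap_base`. HotSpot instead pre-biases the table base, so the emitted barrier needs only a shift and a byte store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a Parallel/Serial-style post-write barrier: no
// filtering, no StoreLoad, no queueing; just dirty the card that covers the
// updated field. Not HotSpot code.
constexpr int kCardShift = 9;  // 2^9 = 512-byte cards
constexpr uint8_t kCleanCard = 0xff;
constexpr uint8_t kDirtyCard = 0x00;

struct CardTable {
  uintptr_t heap_base;
  std::vector<uint8_t> cards;

  CardTable(uintptr_t base, size_t heap_bytes)
      : heap_base(base), cards((heap_bytes >> kCardShift) + 1, kCleanCard) {}

  size_t index_for(uintptr_t field_addr) const {
    return (field_addr - heap_base) >> kCardShift;
  }

  // Entire post-barrier for `x.a = y`: one subtraction, one shift and one
  // unconditional byte store, whatever the value of y.
  void post_write_barrier(uintptr_t field_addr) {
    cards[index_for(field_addr)] = kDirtyCard;
  }
};
```

Every reference store dirties its card unconditionally; the work that the G1 filters try to avoid is instead deferred to whoever later scans the dirty cards.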
>> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * remove unnecessarily added logging src/hotspot/share/gc/g1/g1BarrierSet.hpp line 54: > 52: // them, keeping the write barrier simple. > 53: // > 54: // The refinement threads mark cards in the the current collection set specially on the "the the" typo. src/hotspot/share/gc/g1/g1CardTable.inline.hpp line 47: > 45: > 46: // Returns bits from a where mask is 0, and bits from b where mask is 1. > 47: inline size_t blend(size_t a, size_t b, size_t mask) { Can you provide some input/output examples in the doc? src/hotspot/share/gc/g1/g1CardTableClaimTable.cpp line 45: > 43: } > 44: > 45: void G1CardTableClaimTable::initialize(size_t max_reserved_regions) { Should the arg be `uint`? src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 280: > 278: assert_state(State::SweepRT); > 279: > 280: set_state_start_time(); This method is called in a loop; would that skew the state-starting time? src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 344: > 342: size_t _num_clean; > 343: size_t _num_dirty; > 344: size_t _num_to_cset; These seem never to be read. src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 349: > 347: > 348: bool do_heap_region(G1HeapRegion* r) override { > 349: if (!r->is_free()) { I am a bit lost on this closure; the intention seems to be to mark all non-free regions as unclaimed. Why can't this be done in one go, instead of first setting all regions to claimed (`reset_all_claims_to_claimed`) and then setting non-free ones to unclaimed? src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 116: > 114: > 115: // Current heap snapshot. > 116: G1CardTableClaimTable* _sweep_state; Since this is a table, I wonder if we can name it "x_table" instead of "x_state".
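The review asks for input/output examples for `blend()` in g1CardTable.inline.hpp. One body that satisfies the documented contract, with such examples checked at compile time, might look like the sketch below. The patch's actual implementation may differ, and `constexpr` is used here only so that the examples can be `static_assert`s.

```cpp
#include <cassert>
#include <cstddef>

// Take bits from `a` where `mask` is 0 and bits from `b` where `mask` is 1.
constexpr size_t blend(size_t a, size_t b, size_t mask) {
  return (a & ~mask) | (b & mask);
}

// Input/output examples of the kind requested in the review:
static_assert(blend(0xAAAA, 0x5555, 0xFF00) == 0x55AA,
              "high byte from b, low byte from a");
static_assert(blend(0x1234, 0xABCD, 0) == 0x1234, "zero mask selects a");
static_assert(blend(0x1234, 0xABCD, ~size_t{0}) == 0xABCD,
              "all-ones mask selects b");
```

For a card table scanned a word at a time, `mask` would typically select whole card bytes, so one call can replace some cards in a word while keeping the others.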
src/hotspot/share/gc/g1/g1RemSet.cpp line 147: > 145: if (_contains[region]) { > 146: return; > 147: } Indentation seems broken. src/hotspot/share/gc/g1/g1RemSet.cpp line 830: > 828: size_t const start_idx = region_card_base_idx + claim.value(); > 829: > 830: size_t* card_cur_card = (size_t*)card_table->byte_for_index(start_idx); This var name should end with "_word", instead of "_card". src/hotspot/share/gc/g1/g1RemSet.cpp line 1252: > 1250: G1ConcurrentRefineWorkState::snapshot_heap_into(&constructed); > 1251: claim = &constructed; > 1252: } It's not super obvious to me why the "has_sweep_claims" checking needs to be on this level. Can `G1ConcurrentRefineWorkState` return a valid `G1CardTableClaimTable*` directly? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1974124792 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1971426039 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1973435950 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1974083760 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1973447654 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1973452168 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1974056492 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1973423400 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1974108760 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1974134441 From gziemski at openjdk.org Thu Feb 27 18:35:29 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 27 Feb 2025 18:35:29 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 10:37:33 GMT, Afshin Zafari wrote: >> - `VMATree` is used instead of `SortedLinkList` in 
new class `VirtualMemoryTracker`. >> - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. >> - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. >> - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > removed UseFlagInPlace test. This is what I found today. Will do more tomorrow... src/hotspot/share/nmt/memReporter.cpp line 440: > 438: VirtualMemoryTracker::Instance::tree()->visit_committed_regions(*reserved_rgn, > 439: [&](CommittedMemoryRegion& committed_rgn) { > 440: if (committed_rgn.size() == reserved_rgn->size() && committed_rgn.call_stack()->equals(*stack)) { If we are calling `equals()` here anyhow, why not have a `CommittedMemoryRegion::equals()` that checks both the size and the stack? This way we can simply have: `if (committed_rgn.equals(reserved_rgn)) ` src/hotspot/share/nmt/memTracker.hpp line 142: > 140: if (addr != nullptr) { > 141: NmtVirtualMemoryLocker nvml; > 142: VirtualMemoryTracker::Instance::add_reserved_region((address)addr, size, stack, mem_tag); I do not like: `VirtualMemoryTracker::Instance::add_reserved_region` with the `Instance` being repeated over and over. I'd prefer `VirtualMemoryTracker::add_reserved_region` and have `Instance` be an impl detail inside. src/hotspot/share/nmt/memoryFileTracker.hpp line 32: > 30: #include "nmt/nmtNativeCallStackStorage.hpp" > 31: #include "nmt/vmatree.hpp" > 32: #include "nmt/virtualMemoryTracker.hpp" Are you 100% sure we need it here? src/hotspot/share/nmt/virtualMemoryTracker.cpp line 59: > 57: if (_tracker == nullptr) return false; > 58: new (_tracker) VirtualMemoryTracker(level == NMT_detail); > 59: return _tracker->tree() != nullptr; Should we check `tree() != nullptr` as an assert inside the VirtualMemoryTracker constructor instead?
src/hotspot/share/nmt/virtualMemoryTracker.cpp line 114: > 112: committed = VirtualMemorySummary::as_snapshot()->by_type(tag)->committed(); > 113: if (reserve_delta != 0) { > 114: if (reserve_delta > 0) Missing braces. src/hotspot/share/nmt/virtualMemoryTracker.cpp line 118: > 116: else { > 117: if ((size_t)-reserve_delta <= reserved) > 118: VirtualMemorySummary::record_released_memory(-reserve_delta, tag); Missing braces. src/hotspot/share/nmt/virtualMemoryTracker.cpp line 129: > 127: } > 128: else > 129: print_err("commit"); Missing braces. src/hotspot/share/nmt/virtualMemoryTracker.cpp line 133: > 131: else { > 132: if ((size_t)-commit_delta <= committed) > 133: VirtualMemorySummary::record_uncommitted_memory(-commit_delta, tag); Missing braces. src/hotspot/share/nmt/virtualMemoryTracker.cpp line 135: > 133: VirtualMemorySummary::record_uncommitted_memory(-commit_delta, tag); > 134: else > 135: print_err("uncommit"); Missing braces. src/hotspot/share/nmt/virtualMemoryTracker.cpp line 213: > 211: log_info(nmt)("region in walker vmem, base: " INTPTR_FORMAT " size: %zu , %s, committed: %zu", > 212: p2i(rgn.base()), rgn.size(), rgn.tag_name(), rgn.committed_size()); > 213: if (!walker->do_allocation_site(&rgn)) Missing braces. src/hotspot/share/nmt/virtualMemoryTracker.cpp line 225: > 223: > 224: int compare_reserved_region_base(const ReservedMemoryRegion& r1, const ReservedMemoryRegion& r2) { > 225: return r1.compare(r2); Why did we name it `compare_reserved_region_base`, not simply `compare_reserved_region`? src/hotspot/share/nmt/vmatree.cpp line 33: > 31: const VMATree::RegionData VMATree::empty_regiondata{NativeCallStackStorage::StackIndex{}, mtNone}; > 32: > 33: const char* VMATree::statetype_strings[4] = { You don't need to do anything here if you take my suggestion from the next comment...
src/hotspot/share/nmt/vmatree.hpp line 73: > 71: assert(type < StateType::COUNT, "must be"); > 72: return statetype_strings[static_cast(type)]; > 73: } I don't like that we are hardcoding the size of this array and have COUNT be StateType. Can we do something like this?: enum class StateType : uint8_t { Reserved = 1, Committed = 3, Released = 0 }; private: static constexpr const char* const statetype_strings[] = {"released", "reserved", "only-committed", "committed"}; static constexpr int STATETYPE_COUNT = static_cast(sizeof(statetype_strings)/sizeof(char*)); public: NONCOPYABLE(VMATree); static const char* statetype_to_string(StateType type) { assert(static_cast(type) < STATETYPE_COUNT, "must be"); return statetype_strings[static_cast(type)]; } ------------- Changes requested by gziemski (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20425#pullrequestreview-2648596994 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974069506 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974080149 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974082904 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974096829 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974100115 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974099677 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974100680 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974101028 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974101373 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974103095 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974105989 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974108411 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974129696 From gziemski at openjdk.org Thu Feb 27 18:35:30 2025 
From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 27 Feb 2025 18:35:30 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 13:25:35 GMT, Johan Sj?len wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed UseFlagInPlace test. > > src/hotspot/share/nmt/memTracker.hpp line 60: > >> 58: static bool walk_virtual_memory(VirtualMemoryWalker* walker) { >> 59: return VirtualMemoryTracker::Instance::walk_virtual_memory(walker); >> 60: } > > The `MemTracker` API exposes the outer and locking implementation to the rest of Hotspot. These two methods are used by us internally. I think it's better if these methods are deleted and the `VirtualMemoryTracker::Instance` methods are called directly, instead. Not sure I agree with Johan's comment here - in memBaseline.cpp we use MemTracker a lot, so adding these APIs make sense there, otherwise we would have to split work between MemTracker and VirtualMemoryTracker. Personally I like this way better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974077639 From sviswanathan at openjdk.org Thu Feb 27 19:26:58 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 27 Feb 2025 19:26:58 GMT Subject: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 In-Reply-To: References: Message-ID: <7wEGLF0MOmtHAl_cwEOOXNPy_Ckz8j0WmabDR_asitM=.7e772dad-8e67-402f-bdc4-9dad0925f20c@github.com> On Thu, 20 Feb 2025 21:49:42 GMT, Volodymyr Paprotski wrote: > Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) > > Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). 
This is mostly for cleanup, but there is about a 1-2% gain. > > Before (no AVX512) > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 3720.589 ± 17.879 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 3605.940 ± 15.807 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1076.502 ± 4.190 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1069.624 ± 2.484 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 830.448 ± 2.285 ops/s > > After (with AVX2) > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 6000.496 ± 39.923 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 5739.878 ± 34.838 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 40 1942.437 ± 12.179 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 40 1921.770 ± 8.992 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 40 1399.761 ± 6.238 ops/s > > > Before (with AVX512): > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 40 9621.950 ± 27.260 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 40 8975.654 ± 26.707 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 102... src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 397: > 395: __ xorq(acc2, acc2); > 396: __ addq(acc1, tmp_rax); > 397: __ adcq(acc2, tmp_rdx); Why adcq here instead of addq? The vector code doesn't do that.
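Several comments in this review concern the semantics of the Java code that the stub mirrors: carry propagation over 52-bit limbs, the logical (rather than arithmetic) shift used to extract the carry, and the branch-free conditional assignment. The sketch below shows those semantics on plain integers. It is hypothetical and only illustrates the arithmetic: `carry_propagate` and `conditional_assign` are not names from the patch, and unsigned 64-bit limbs are assumed so that `>>` is a logical shift.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers over radix-2^52 limbs stored in 64-bit words.
constexpr int kLimbBits = 52;
constexpr uint64_t kLimbMask = (uint64_t{1} << kLimbBits) - 1;

// Move everything above bit 51 of each limb into the next limb. On an
// unsigned type, >> is a logical shift, like Java's >>> and x86 shrq.
// Java's >> and x86 sarq are arithmetic shifts that would instead copy the
// sign bit into the carry.
void carry_propagate(uint64_t* limbs, size_t n) {
  for (size_t i = 0; i + 1 < n; i++) {
    uint64_t carry = limbs[i] >> kLimbBits;
    limbs[i] &= kLimbMask;
    limbs[i + 1] += carry;
  }
}

// Constant-time conditional assignment in the style of the quoted Java:
//   long dummyLimbs = maskValue & (a[i] ^ b[i]);
//   a[i] = dummyLimbs ^ a[i];
// mask == 0 keeps a; mask == all ones copies b into a. Every limb is read
// and written regardless of mask, and no branch depends on it.
void conditional_assign(uint64_t* a, const uint64_t* b, size_t n, uint64_t mask) {
  for (size_t i = 0; i < n; i++) {
    uint64_t dummy = mask & (a[i] ^ b[i]);
    a[i] = dummy ^ a[i];
  }
}
```

For example, a low limb of 2^52 + 5 carries 1 into the next limb and keeps 5; calling `conditional_assign` with a zero mask leaves `a` untouched, while an all-ones mask copies `b` into `a`.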
src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 424: > 422: __ shrq(acc1, 52); // low 52 of acc1 ignored, is zero, because Montgomery > 423: > 424: // Acc2[0] += carry This is more like shifting the carry into the lower bits of Acc2[0], so the comment could be updated. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 441: > 439: __ subq(acc2, modulus); > 440: __ vpsubq(Acc2, Acc1, Modulus, Assembler::AVX_256bit); > 441: __ vmovdqu(Address(rsp, -32), Acc2); //Assembler::AVX_256bit Need to first create space on the stack and then store the temp. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 465: > 463: > 464: // Now carry propagate the multiply result and (constant-time) select correct > 465: // output digit In the Java code, carry propagation of the multiply result is done before subtracting the modulus. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 467: > 465: // output digit > 466: Register digit = acc1; > 467: __ vmovdqu(Address(rsp, -64), Acc1); //Assembler::AVX_256bit Need to first create space on the stack and then store. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 475: > 473: } > 474: __ movq(carry, digit); > 475: __ sarq(carry, 52); This was an unsigned (logical) shift in the Java code. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 556: > 554: // - constant time (i.e. no branches) > 555: // - no-side channel (i.e.
all memory must always be accessed, and in same order) > 556: void assign_avx(Register aBase, Register bBase, int offset, XMMRegister select, XMMRegister tmp, XMMRegister aTmp, int vector_len, MacroAssembler* _masm) { Good to add the comment from assign_scalar here as well: // Original java: // long dummyLimbs = maskValue & (a[i] ^ b[i]); // a[i] = dummyLimbs ^ a[i]; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1974184239 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1974171188 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1974187392 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1974203227 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1974206111 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1974205184 PR Review Comment: https://git.openjdk.org/jdk/pull/23719#discussion_r1972517671 From fbredberg at openjdk.org Thu Feb 27 19:57:03 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 19:57:03 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 05:19:44 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after review by David and Coleen. 
> > src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 331: > >> 329: volatile_nonstatic_field(ObjectMonitor, _owner, int64_t) \ >> 330: volatile_nonstatic_field(ObjectMonitor, _recursions, intptr_t) \ >> 331: volatile_nonstatic_field(ObjectMonitor, _entry_list, ObjectWaiter*) \ > > Suggestion: > > volatile_nonstatic_field(ObjectMonitor, _entry_list, ObjectWaiter*) \ > > Extra space Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 176: > >> 174: // Once we have formed a doubly linked list it's easy to find the >> 175: // successor, wake it up, have it remove itself, and update the >> 176: // tail pointer, as seen in 2) and 3) below. > > Suggestion: > > // tail pointer, as seen in 3) below. > > But have diagram 3 right here. Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 179: > >> 177: // >> 178: // At any time new threads can add themselves to the entry_list, see >> 179: // 4) and 5). > > Diagrams 4 and 5 do not follow from what has just been described, but the use of "at any time" implies to me you intended to show them affecting the queue as we have already seen it. > > Again show the diagram you want here. Rewrote diagram. > src/hotspot/share/runtime/objectMonitor.cpp line 183: > >> 181: // If the thread that removes itself from the end of the list hasn't >> 182: // got any prev pointer, we just set the tail pointer to null, see >> 183: // 5) and 6). > > Suggestion: > > // If the thread to be removed is the only thread in the entry list: > // entry_list -> A -> null > // entry_list_tail ---^ > // we remove it and just set the tail pointer to null, > // entry_list -> null > // entry_list_tail -> null Rewrote the diagram. Wanted to show how things work when the thread that removes itself from the end of the list hasn't got any prev pointer (and it's not the only thread in the entry list).
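The case described above, where the departing tail has no prev pointer while newer threads have been pushed in front of it, can be modeled in a few lines of C++. This is a minimal single-threaded sketch, not HotSpot code: `Waiter`, `EntryList`, `find_tail`, `unlink_tail` and `demo_service_order` are hypothetical names, and walking from the head to find the predecessor in the prev-less case is an assumption of the sketch.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical stand-in for ObjectWaiter: only the list links matter here.
struct Waiter {
  char name;
  Waiter* next = nullptr;
  Waiter* prev = nullptr;
  explicit Waiter(char n) : name(n) {}
};

// Hypothetical model: any thread may CAS onto the head; prev pointers and
// the cached tail are only touched by the monitor owner.
struct EntryList {
  std::atomic<Waiter*> head{nullptr};
  Waiter* tail = nullptr;

  // Any thread: push onto the head. Only next is set; prev stays null.
  void push(Waiter* w) {
    Waiter* h = head.load();
    do {
      w->next = h;
      w->prev = nullptr;
    } while (!head.compare_exchange_weak(h, w));
  }

  // Owner only: return the FIFO successor (the tail). If no tail is cached,
  // walk from the head assigning prev pointers (forming a doubly linked list).
  Waiter* find_tail() {
    if (tail != nullptr) return tail;
    Waiter* prev = nullptr;
    for (Waiter* w = head.load(); w != nullptr; w = w->next) {
      w->prev = prev;
      prev = w;
    }
    tail = prev;
    return tail;
  }

  // Owner only: unlink the tail node w.
  void unlink_tail(Waiter* w) {
    assert(w->next == nullptr);
    if (w->prev != nullptr) {          // doubly linked part of the list
      tail = w->prev;
      tail->next = nullptr;
      return;
    }
    Waiter* expected = w;
    if (head.compare_exchange_strong(expected, nullptr)) {
      tail = nullptr;                  // w was the only element
      return;
    }
    // Threads were pushed after w was walked, so w has no prev pointer.
    // Sketch assumption: find the predecessor from the head, detach w and
    // drop the cached tail so the next find_tail() re-walks from the head.
    Waiter* p = head.load();
    while (p->next != w) p = p->next;
    p->next = nullptr;
    tail = nullptr;
  }
};

// Serves A-E, pushes G, H, I, then serves the rest; returns the service order.
std::string demo_service_order() {
  Waiter a('A'), b('B'), c('C'), d('D'), e('E'), f('F'), g('G'), h('H'), i('I');
  EntryList list;
  Waiter* first[] = {&a, &b, &c, &d, &e, &f};
  for (Waiter* w : first) list.push(w);     // F->E->D->C->B->A
  std::string served;
  for (int k = 0; k < 5; k++) {             // A..E, each has a prev pointer
    Waiter* s = list.find_tail();
    served += s->name;
    list.unlink_tail(s);
  }
  Waiter* later[] = {&g, &h, &i};
  for (Waiter* w : later) list.push(w);     // I->H->G->F, F has no prev
  while (list.head.load() != nullptr) {     // F (re-walk case), then G, H, I
    Waiter* s = list.find_tail();
    served += s->name;
    list.unlink_tail(s);
  }
  return served;
}
```

In this model `demo_service_order()` returns "ABCDEFGHI": A to E are unlinked through their prev pointers, F is the prev-less tail that clears the cached tail, and the following `find_tail()` re-walks I->H->G to rebuild the prev links, so G, H and I are still served in arrival order.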
> src/hotspot/share/runtime/objectMonitor.cpp line 187: > >> 185: // Next time we need to find the successor and the tail is null, we >> 186: // just start walking from the entry_list head again forming a new >> 187: // doubly linked list, see 6) and 7) below. > > Suggestion: > > // Next time we need to find the successor and the tail is null, > // entry_list ->I->H->G->null > // entry_list_tail ->null > // we just start walking from the entry_list head again forming a new > // doubly linked list: > // entry_list ->I<=>H<=>G->null > // entry_list_tail ----------^ Rewrote diagram. Didn't abandon the "number list" since everything else is written that way. > src/hotspot/share/runtime/objectMonitor.cpp line 189: > >> 187: // doubly linked list, see 6) and 7) below. >> 188: // >> 189: // 1) entry_list ->F->E->D->C->B->A->null > > Suggestion: > > // 1) entry_list ->F->E->D->C->B->A->null > > Right-justify the names please I think it's more readable to have it left-justified, since entry_list and entry_list_tail both start with the same text. > src/hotspot/share/runtime/objectMonitor.cpp line 215: > >> 213: // The mutex property of the monitor itself protects the entry_list >> 214: // from concurrent interference. >> 215: // -- Only the monitor owner may detach nodes from the entry_list. > > Suggestion for this block - get rid of invariants headings and just say: > > // The monitor itself protects all of the operations on the entry_list except for the CAS of a new arrival > // to the head. Only the monitor owner can read or write the prev links (e.g. to remove itself) or update > // the tail. Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 225: > >> 223: // concurrent detaching thread. This mechanism is immune from the >> 224: // ABA corruption. More precisely, the CAS-based "push" onto >> 225: // entry_list is ABA-oblivious. > > Not sure this actually says anything to help people understand the code or its operation.
There basically is no A-B-A issue with the use of CAS here. Rewritten the comment. > src/hotspot/share/runtime/objectMonitor.cpp line 227: > >> 225: // entry_list is ABA-oblivious. >> 226: // >> 227: // * The entry_list form a queue of threads stalled trying to acquire > > Suggestion: > > // * The entry_list forms a queue of threads stalled trying to acquire Fixed > src/hotspot/share/runtime/objectMonitor.hpp line 195: > >> 193: volatile intx _recursions; // recursion count, 0 for first entry >> 194: ObjectWaiter* volatile _entry_list; // Threads blocked on entry or reentry. >> 195: // The list is actually composed of WaitNodes, > > Suggestion: > > // The list is actually composed of wait-nodes, > > Pre-existing (check for other uses) `WaitNodes` reads like a class name but it isn't. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974244653 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974247893 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974246933 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974250054 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974251792 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974246012 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974252355 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974252954 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974253676 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974245155 From fbredberg at openjdk.org Thu Feb 27 19:57:04 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 19:57:04 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 13:59:38 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 166: >> >>> 164: // its 
next pointer, and have its prev pointer set to null. Thus >>> 165: // pushing six threads A-F (in that order) onto entry_list, will >>> 166: // form a singly-linked list, see 1) below. >> >> Suggestion: have diagram 1 immediately follow this text so the reader doesn't have to jump down. > > I like this suggestion. I like these comments. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974247465 From coleen.phillimore at oracle.com Thu Feb 27 20:02:04 2025 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 27 Feb 2025 15:02:04 -0500 Subject: Discussion: How to get to single object header layout In-Reply-To: <881959A3-6D5D-4FE6-A029-B6F1F2BB1BDD@amazon.de> References: <881959A3-6D5D-4FE6-A029-B6F1F2BB1BDD@amazon.de> Message-ID: <9e2a15c0-2788-42bc-b0fa-7b7c5390e423@oracle.com> Roman, Thank you for sending out the plan for getting to a single object header layout. See below: On 2/26/25 1:52 PM, Kennke, Roman wrote: > (2nd attempt at sending this. If you receive the other attempt, please ignore it.) > > This is a follow-up to discussions that we had at the OpenJDK Committers Workshop earlier this month. I have been asked to come up with a detailed schedule of how I envision to make compact object headers aka Project Lilliput the default and one and only header layout in HotSpot, and post it here for wider discussion. The goal is to get to a consensus and prepare the various HotSpot contributors for what may come and when. In particular, I am aware that other, potentially conflicting changes are in the pipeline too (looking at Valhalla, possibly other projects, too?), which we should coordinate, not only in terms of code, but also in terms of reviewer/testing resources. > > We agreed at the OCW that we should make the current 8-byte-headers the default object header layout first, and only then build 4-byte-headers on top of that.
We also agreed that we want as few flags permutations as possible (if it were me, I would vote for having no new flags at all, and have new implementations replace old implementations, but I can see the operational usefulness of having a fallback available if something unexpected goes wrong.) > > Many of the proposed changes are "only" various progressions of flags moving from new/experimental to non-experimental to default to deprecated and obsoleted. The bulk of code changes would be in JDK 27, when Lilliput 2 hits the repos (which isn't as scary as Lilliput 1, which replaced the whole locking subsystem), and the current 12-byte headers are removed. > > Please let me know what you think, how we can make this work, and especially whatever concerns you may have. > > JDK 25: > - 8350272: Deprecate UseCompressedClassPointers for removal > - 8350457: Support Compact Object Headers as product option I filed 8350457 to collect information that we need to decide to move UseCompactObjectHeaders out of Experimental mode. See the bug for more details: https://bugs.openjdk.org/browse/JDK-8350457 One of the first and most important things we need is to show that there is a performance improvement either in throughput or density for this change. I assigned this to you, but I'm also currently in the process of running our internal benchmarks to determine the impact of this change and I'll post analysis there when I have more details. Also we need some fresh performance results that you are seeing. If the option is a product option, we need to tell people why they want to use this option. There may be some performance regressions for some applications with UCCP so we need to show overall improvements in other types of applications, and/or mitigate the regressions. The remaining tasks for JDK 26 and beyond depend on what we find as a result of this step, including L2, whose performance numbers would be interesting to see as well.
Thank you, Coleen > > JDK 26: > - 8346011: [Lilliput] Compact Full-GC Forwarding (required for UCCP removal, pre-req for L2) > - Obsolete +/-UseCompressedClassPointers > - Make +UseCompactObjectHeaders on-by-default > - Deprecate -UseCompactObjectHeaders > > JDK 27: > - Obsolete -UseCompactObjectHeaders > - 8320761: [Lilliput] Implement compact identity hashcode (alternative code path to L1, under new experimental flag, e.g. +/-UseTinyObjectHeaders) > - 8347710: [Lilliput] Implement 4 byte headers (alternative code path to L1, under new experimental flag, e.g. +/-UseTinyObjectHeaders) > > JDK 28: > - Make +/-UseTinyObjectHeaders non-experimental > > JDK 29: > - Make +UseTinyObjectHeaders on-by-default > - Deprecate -UseTinyObjectHeaders > > JDK 30: > - Obsolete -UseTinyObjectHeaders From fbredberg at openjdk.org Thu Feb 27 20:04:02 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 20:04:02 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 06:08:14 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after review by David and Coleen. > > src/hotspot/share/runtime/objectMonitor.cpp line 232: > >> 230: // thread notices that the tail of the entry_list is not known, we >> 231: // convert the singly-linked entry_list into a doubly linked list by >> 232: // assigning the prev pointers and the entry_list_tail pointer. > > Didn't we essentially say all this at the beginning? This text makes more sense before the newly added "Example:", so I moved it. > src/hotspot/share/runtime/objectMonitor.cpp line 260: > >> 258: // >> 259: // * notify() or notifyAll() simply transfers threads from the WaitSet >> 260: // to either the entry_list. Subsequent exit() operations will > > Suggestion: > > // to the entry_list. 
Subsequent exit() operations will Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 704: > >> 702: >> 703: for (;;) { >> 704: ObjectWaiter* front = Atomic::load(&_entry_list); > > In comments and code pick "head" or "front" to use to describe what _entry_list points to and use that consistently. I think "front" is much more common. A `grep -r` suggests that `head` is more common, so I changed to `head`. > src/hotspot/share/runtime/objectMonitor.cpp line 705: > >> 703: for (;;) { >> 704: ObjectWaiter* front = Atomic::load(&_entry_list); >> 705: > > No need for blank line. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974257620 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974259984 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974261995 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974260402 From fbredberg at openjdk.org Thu Feb 27 20:12:58 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 20:12:58 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 13:56:15 GMT, Coleen Phillimore wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after review by David and Coleen. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 418: > >> 416: // have released the lock. >> 417: // Refer to the comments in synchronizer.cpp for how we might encode extra >> 418: // state in _succ so we can avoid fetching entry_list. > > If there is no comment in synchronizer about this (that I can find), or about whether or not this is a good idea, can you remove this line with this change? Removed > src/hotspot/share/runtime/objectMonitor.hpp line 79: > >> 77: void set_bad_pointers() { >> 78: #ifdef ASSERT >> 79: // Diagnostic hygiene ... > > hygiene seems like the wrong word here.
Can you remove this comment? Removed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974271052 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974271724 From fbredberg at openjdk.org Thu Feb 27 20:13:01 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 20:13:01 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 06:19:38 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after review by David and Coleen. > > src/hotspot/share/runtime/objectMonitor.cpp line 724: > >> 722: for (;;) { >> 723: ObjectWaiter* front = Atomic::load(&_entry_list); >> 724: > > No need for blank line. Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 731: > >> 729: >> 730: // Interference - the CAS failed because _entry_list changed. Just retry. >> 731: // As an optional optimization we retry the lock. > > Suggestion: > > // Interference - the CAS failed because _entry_list changed. Before > // retrying the CAS retry taking the lock as it may now be free. Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 812: > >> 810: guarantee(_entry_list == nullptr, >> 811: "must be no entering threads: entry_list=" INTPTR_FORMAT, >> 812: p2i(_entry_list)); > > Mustn't re-read _entry_list in the p2i as it may have changed from the value that is causing the guarantee to fail. The old guarantees were buggy in this regard - a temp is needed. Fixed > src/hotspot/share/runtime/objectMonitor.cpp line 1299: > >> 1297: assert(_entry_list_tail == nullptr || _entry_list_tail == currentNode, "invariant"); >> 1298: >> 1299: ObjectWaiter* v = Atomic::load(&_entry_list); > > Nit: use `w` to be consistent with similar code. The original used `w` for EntryList and `v` for cxq IIRC. 
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974268658 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974268941 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974267878 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974269555 From fbredberg at openjdk.org Thu Feb 27 20:12:59 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 20:12:59 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: <0ALa3fouoHHnr9xwosMUd0gxQnQFwomxSmQ8_4wijcY=.acdb876b-6b94-4320-904a-f7741d54c8de@github.com> On Thu, 27 Feb 2025 14:11:21 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 718: >> >>> 716: // if we added current to _entry_list. Once on _entry_list, current >>> 717: // stays on-queue until it acquires the lock. >>> 718: bool ObjectMonitor::try_lock_or_add_to_entry_list(JavaThread* current, ObjectWaiter* node) { >> >> Nit: the name suggests we do the try_lock first, when we don't. If we reverse the name we should also reverse the true/false return so that true relates to the first part of the name. See what others think. > > How about add_to_entry_list with a boolean parameter that tries the lock if it fails, and only have one of these functions? Although the return true if you get the lock makes it weird. > > > bool add_to_entry_list(JavaThread* current, ObjectWaiter* node, bool or_lock) { > return true if locked, false otherwise; > } > > > Maybe that makes sense. I wasn't completely happy with naming this `try_lock_or_add_to_entry_list` for the exact reason David points out. It does NOT first `try_lock` and then if that fails `add_to_entry_list`. It does the complete opposite. It first tries to add to the entry list and if that fails, it tries to lock. So why on earth did I end up with this solution?
Because I went along with how the current family of `try_enter`, `spin_enter` and `TryLockWithContentionMark` works. They all try to lock the monitor and if they succeed they return true; otherwise they return false. And this is exactly how my `try_lock_or_add_to_entry_list` works, except for the fact that when it returns false (because we didn't get the lock) the current thread has been added to the `entry_list`. I also think that combining the two functions into one (as Coleen suggests) just adds to the confusion, mostly because of the "weird" return value. I guess we just have to choose what kind of weirdness we can accept. I'm absolutely willing to change it if anyone has a strong opinion, or comes up with something that the majority think is better. For me joining the `TryLockWithContentionMark` etc. camp seemed like the most reasonable kind of weird. >> src/hotspot/share/runtime/objectMonitor.cpp line 719: >> >>> 717: // stays on-queue until it acquires the lock. >>> 718: bool ObjectMonitor::try_lock_or_add_to_entry_list(JavaThread* current, ObjectWaiter* node) { >>> 719: node->_prev = nullptr; >> >> Shouldn't this already be the case? > > I think for the vthread case, it isn't yet(?). Maybe motivation to fix the ObjectWaiter constructor with this patch? For the most part it is. But as Coleen points out, the vthread case might not be, and I'm not willing to risk it. >> src/hotspot/share/runtime/objectMonitor.cpp line 2018: >> >>> 2016: // that in prepend-mode we invert the order of the waiters. Let's say that the >>> 2017: // waitset is "ABCD" and the entry_list is "XYZ". After a notifyAll() in prepend >>> 2018: // mode the waitset will be empty and the entry_list will be "DCBAXYZ". >> >> We don't support different ordering modes any more so we always "prepend" such that waiters are added to the entry_list in the reverse order of waiting.
So given waitList -> A -> B -> C -> D, and _entry_list -> X -> Y -> Z we will get _entry_list -> D -> C -> B -> A -> X -> Y -> Z > > One of the benefits of this work is to read, understand and clean up misleading and out of date comments in this code. Rewrote the comment. Let the waitset remain as a string "ABCD" because it would be too messy to try to depict it as a circular doubly linked list. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974266558 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974267473 PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974270597 From fbredberg at openjdk.org Thu Feb 27 20:19:01 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 20:19:01 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 14:15:15 GMT, Coleen Phillimore wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update after review by David and Coleen. > > src/hotspot/share/runtime/objectMonitor.cpp line 735: > >> 733: assert(!has_successor(current), "invariant"); >> 734: assert(has_owner(current), "invariant"); >> 735: return true; > > I wonder if in a future RFE we can move these asserts into TryLock. Good idea!
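The notifyAll() prepend ordering discussed earlier in this thread (wait set "ABCD", entry_list "XYZ", result "DCBAXYZ") can be checked with a toy model. This is a hypothetical C++ sketch, not HotSpot code: each list is a string whose first character is the head, and `notify_all_prepend` and `service_order` are illustrative names.

```cpp
#include <cassert>
#include <string>

// Toy model: a list is a string whose first character is the head and whose
// last character is the tail (where the FIFO successor is taken from).
std::string notify_all_prepend(const std::string& wait_set,
                               const std::string& entry_list) {
  std::string result = entry_list;
  // Take waiters in the order they started waiting (A first) and push each
  // one onto the head, so the last one transferred ends up at the head.
  for (char w : wait_set) {
    result.insert(result.begin(), w);
  }
  return result;
}

// Order in which this model would pick successors: tail first.
std::string service_order(const std::string& entry_list) {
  return std::string(entry_list.rbegin(), entry_list.rend());
}
```

`notify_all_prepend("ABCD", "XYZ")` yields "DCBAXYZ". Because the successor is taken from the tail, the model serves "ZYXABCD": the threads already on the entry_list go first, and the notified waiters follow in the order they started waiting.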
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974277231 From fbredberg at openjdk.org Thu Feb 27 20:19:02 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 20:19:02 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 17:12:40 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/objectMonitor.hpp line 46: >> >>> 44: class ObjectWaiter : public CHeapObj { >>> 45: public: >>> 46: enum TStates : uint8_t { TS_UNDEF, TS_READY, TS_RUN, TS_WAIT, TS_ENTER }; >> >> TS_READY looks unused. > > Edit: this could be a trivial further PR. And so does `TS_UNDEF`, but the enum value for `TS_UNDEF` will be zero and maybe there is some hidden "check for uninitialized `TStates` code" somewhere that stops working... A grep also finds: `src/hotspot/share/prims/jvmtiRawMonitor.hpp: enum TStates { TS_READY, TS_RUN, TS_WAIT, TS_ENTER }; ` So, since this is not really in the core part of this PR, I'd like to postpone that change to a later cleanup RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974278590 From thomas.stuefe at gmail.com Thu Feb 27 20:32:32 2025 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 27 Feb 2025 21:32:32 +0100 Subject: Discussion: How to get to single object header layout In-Reply-To: <9e2a15c0-2788-42bc-b0fa-7b7c5390e423@oracle.com> References: <881959A3-6D5D-4FE6-A029-B6F1F2BB1BDD@amazon.de> <9e2a15c0-2788-42bc-b0fa-7b7c5390e423@oracle.com> Message-ID: Hi Coleen, I did benchmarks end of last year, and found the numbers for Lilliput1 to be encouraging. This was on the very-close-to-release version of Lilliput. I will brush up the results or repeat the measurements when I find time, but in short, I found in SpecJBB G1 pause time reduction of ~20% and CPU cache misses down by 15-20%. Live set size reduction of ~17%. Benchmark scores were also better.
I remember that Roman's results were a bit smaller, but I usually test on older hardware that may benefit more from more efficient CPU/memory use. Cheers, Thomas On Thu, Feb 27, 2025 at 9:02 PM wrote: > > Roman, > > Thank you for sending out the plan for getting to a single object header > layout. See below: > > On 2/26/25 1:52 PM, Kennke, Roman wrote: > > (2nd attempt at sending this. If you receive the other attempt, please > ignore it.) > > > > This is a follow-up to discussions that we had at the OpenJDK Committers > Workshop earlier this month. I have been asked to come up with a detailed > schedule of how I envision to make compact object headers aka Project > Lilliput the default and one and only header layout in HotSpot, and post it > here for wider discussion. The goal is to get to a consensus and prepare > the various HotSpot contributors for what may come and when. In particular, > I am aware that other, potentially conflicting changes are in the pipeline > too (looking at Valhalla, possibly other projects, too?), which we should > coordinate, not only in terms of code, but also in terms of > reviewer/testing resources. > > > > We agreed at the OCW that we should make the current 8-byte-headers the > default object header layout first, and only then build 4-byte-headers on > top of that. We also agreed that we want as few flags permutations as > possible (if it were me, I would vote for having no new flags at all, and > have new implementations replace old implementations, but I can see the > operational usefulness of having a fallback available if something > unexpected goes wrong.) > > > > Many of the proposed changes are "only" various progressions of flags > moving from new/experimental to non-experimental to default to deprecated > and obsoleted. The bulk of code changes would be in JDK 27, when Lilliput 2 > hits the repos (which isn't as scary as Lilliput 1, which replaced the > whole locking subsystem), and the current 12-byte headers are removed.
> > > > Please let me know what you think, how we can make this work, and > especially whatever concerns you may have. > > > > JDK 25: > > - 8350272: Deprecate UseCompressedClassPointers for removal > > - 8350457: Support Compact Object Headers as product option > > I filed 8350457 to collect information that we need to decide to move > UseCompactObjectHeaders out of Experimental mode. See the bug for more > details: https://bugs.openjdk.org/browse/JDK-8350457 > > One of the first and most important things we need is to show that there > is a performance improvement either in throughput or density for this > change. I assigned this to you, but I'm also currently in the process > of running our internal benchmarks to determine the impact of this > change and I'll post analysis there when I have more details. Also we > need some fresh performance results that you are seeing. If the option > is a product option, we need to tell people why they want to use this > option. There may be some performance regressions for some applications > with UCCP so we need to show overall improvements in other types of > applications, and/or mitigate the regressions. > > The remaining tasks for JDK 26 and beyond depend on what we find as a > result of this step, including L2, whose performance numbers would be > interesting to see as well. > > Thank you, > Coleen > > > > > JDK 26: > > - 8346011: [Lilliput] Compact Full-GC Forwarding (required for UCCP > removal, pre-req for L2) > > - Obsolete +/-UseCompressedClassPointers > > - Make +UseCompactObjectHeaders on-by-default > > - Deprecate -UseCompactObjectHeaders > > > > JDK 27: > > - Obsolete -UseCompactObjectHeaders > > - 8320761: [Lilliput] Implement compact identity hashcode (alternative > code path to L1, under new experimental flag, e.g. +/-UseTinyObjectHeaders) > > - 8347710: [Lilliput] Implement 4 byte headers (alternative code path to > L1, under new experimental flag, e.g.
+/-UseTinyObjectHeaders) > > > > JDK 28: > > - Make +/-UseTinyObjectHeaders non-experimental > > > > JDK 29: > > - Make +UseTinyObjectHeaders on-by-default > > - Deprecate -UseTinyObjectHeaders > > > > JDK 30: > > - Obsolete -UseTinyObjectHeaders From fbredberg at openjdk.org Thu Feb 27 20:40:56 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 20:40:56 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: <_XnhdwtuB6AhiTL4TYmV4yqIy_WwQEeASn2b2zL9-V0=.05ec2994-8599-4f76-871d-a9e2bbe8afa2@github.com> On Thu, 27 Feb 2025 15:54:28 GMT, Fredrik Bredberg wrote: >> I've combined two of `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list: the `entry_list`. >> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the recurring question of whether it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design, new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. >> >> The new list-design is as much a multi-queue as the current one. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head.
The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list` will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked through its next pointer, and have its prev pointer set to null; thus, pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance-wise the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation. >> >> However, the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update after review by David and Coleen. I've used QEMU to smoke test this PR on ppc64le, riscv64 and s390x, but it would be nice if @TheRealMDoerr, @RealFYang and @offamitkumar could check if it runs okay on real hardware as well.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23421#issuecomment-2689061860 From fbredberg at openjdk.org Thu Feb 27 20:53:56 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 20:53:56 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 15:54:28 GMT, Fredrik Bredberg wrote: >> I've combined two of `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list: the `entry_list`. >> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the recurring question of whether it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design, new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks. >> >> The new list-design is as much a multi-queue as the current one. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list` will be the last thread in the list.
Each newly pushed thread in `entry_list` will be linked through its next pointer and have its prev pointer set to null; thus, pushing new threads onto `entry_list` forms a singly linked list. The list is always in the right order (via the next pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance-wise the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation. >> >> However, the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update after review by David and Coleen. @pchilano Since I have removed the `cxq` list, @dholmes-ora suggested that I should also rename `_vthread_cxq_head`, thereby removing the term "cxq" altogether. I chose to rename `_vthread_cxq_head` to `_vthread_list_head`. Hope that is okay.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23421#issuecomment-2689083393 From fbredberg at openjdk.org Thu Feb 27 20:59:55 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 27 Feb 2025 20:59:55 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: <1S7kUz3GfEDitlf6dU4nF5Tl1X7UNBhMDdWCPE9Apos=.a1e7abc2-065d-4fe3-95b2-d0d5ca884dac@github.com> On Mon, 10 Feb 2025 12:51:43 GMT, Fredrik Bredberg wrote: >> src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 332: >> >>> 330: volatile_nonstatic_field(ObjectMonitor, _owner, int64_t) \ >>> 331: volatile_nonstatic_field(ObjectMonitor, _recursions, intptr_t) \ >>> 332: volatile_nonstatic_field(ObjectMonitor, _EntryListTail, ObjectWaiter*) \ >> >> You may need to coordinate with @mur47x111 to see what graal does with this field. I suspect the graal code also checks both cxq and EntryList in the unlock fast path and now only needs to check _EntryList, in which case we don't need to export EntryListTail. > > Thanks for the heads up @coleenp. I was planning on contacting the Graal team when this PR gets closer to getting integrated. I'll delete the `_EntryListTail` export, and make sure to ask for a review from @mur47x111 when that time comes.
They seem to have everything under control: [[JDK-8349711] Adapt JDK-8343840: Rewrite the ObjectMonitor lists](https://github.com/oracle/graal/pull/10757) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1974327790 From azafari at openjdk.org Thu Feb 27 21:04:13 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 27 Feb 2025 21:04:13 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 17:50:25 GMT, Gerard Ziemski wrote: >> src/hotspot/share/nmt/memTracker.hpp line 60: >> >>> 58: static bool walk_virtual_memory(VirtualMemoryWalker* walker) { >>> 59: return VirtualMemoryTracker::Instance::walk_virtual_memory(walker); >>> 60: } >> >> The `MemTracker` API exposes the outer and locking implementation to the rest of HotSpot. These two methods are used by us internally. I think it's better if these methods are deleted and the `VirtualMemoryTracker::Instance` methods are called directly, instead. > > Not sure I agree with Johan's comment here: in memBaseline.cpp we use MemTracker a lot, so adding these APIs makes sense there; otherwise we would have to split work between MemTracker and VirtualMemoryTracker. > > Personally I like this way better. Done.
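The `VirtualMemoryTracker::Instance` indirection debated here pairs static entry points with an ordinary class. Later in this thread Johan explains that it separates the global static instance from the implementation so that tests can create many trackers. A minimal sketch of that shape, assuming a heavily simplified tracker; `RegionTracker` and its members are hypothetical and are not the NMT API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical, heavily simplified tracker (not HotSpot's VirtualMemoryTracker).
class RegionTracker {
  std::map<std::size_t, std::size_t> reserved_;  // base address -> size

 public:
  void add_reserved_region(std::size_t base, std::size_t size) { reserved_[base] = size; }

  std::size_t reserved_total() const {
    std::size_t total = 0;
    for (const auto& region : reserved_) {
      total += region.second;
    }
    return total;
  }

  // Static facade over one process-wide tracker. Tests can still construct
  // any number of independent RegionTracker objects alongside it.
  class Instance {
    static RegionTracker& global() {
      static RegionTracker tracker;  // constructed on first use
      return tracker;
    }

   public:
    static void add_reserved_region(std::size_t base, std::size_t size) {
      global().add_reserved_region(base, size);
    }
    static std::size_t reserved_total() { return global().reserved_total(); }
  };
};
```

The cost of this design is the extra `::Instance::` qualifier at every call site, which is exactly the wordiness discussed in this thread; the benefit is that the implementation class carries no global state.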
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974328347 From azafari at openjdk.org Thu Feb 27 21:04:15 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 27 Feb 2025 21:04:15 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 13:29:47 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed UseFlagInPlace test. > > src/hotspot/share/nmt/nmtTreap.hpp line 416: > >> 414: if (cmp_from >= 0 && cmp_to < 0) { >> 415: if (!f(head)) >> 416: return; > > Style: Always use braces in if statements. Done. > src/hotspot/share/nmt/regionsTree.hpp line 62: > >> 60: inline VMATree::StateType in_state() { return _node->val().in.type(); } >> 61: inline VMATree::StateType out_state() { return _node->val().out.type(); } >> 62: inline size_t distance_from(NodeHelper& other) { return position() - other.position(); } > > `assert(position() > other.position())`. Done. > src/hotspot/share/nmt/regionsTree.hpp line 82: > >> 80: ); >> 81: } >> 82: }; > > 1. Doesn't need to be inline, move to `cpp` file. > > 2. No need to cast to `int`, just use the `VMATree::StateType` directly. > > 3. Should be wrapped in `#ifdef ASSERT` probably; I don't see us shipping this in product builds. Done.
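The `nmtTreap.hpp` excerpt above visits nodes whose keys fall in the half-open range `[from, to)` and stops as soon as the callback returns false. The sketch below shows that contract on a plain binary search tree, with braces on every `if` as the style comment asks. It is an illustration only: `Node` and `visit_range_in_order` are hypothetical, the tree is unbalanced, and the traversal is recursive, which may differ from how HotSpot's Treap walks its nodes.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical BST node; a real treap node would also carry a priority.
struct Node {
  int key;
  Node* left = nullptr;
  Node* right = nullptr;
  explicit Node(int k) : key(k) {}
};

// Visits keys in [from, to) in ascending order. The callback returns false to
// stop early; the function returns false if the traversal was stopped.
bool visit_range_in_order(Node* node, int from, int to,
                          const std::function<bool(Node*)>& f) {
  if (node == nullptr) {
    return true;
  }
  if (node->key >= from) {  // smaller keys in the left subtree may still be >= from
    if (!visit_range_in_order(node->left, from, to, f)) {
      return false;
    }
  }
  if (node->key >= from && node->key < to) {
    if (!f(node)) {
      return false;
    }
  }
  if (node->key < to) {  // larger keys in the right subtree may still be < to
    return visit_range_in_order(node->right, from, to, f);
  }
  return true;
}
```

Pruning on both sides keeps the walk proportional to the tree height plus the number of visited nodes, and the boolean return lets an early stop propagate out of the recursion.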
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974328543 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974331320 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974328760 From azafari at openjdk.org Thu Feb 27 21:07:12 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 27 Feb 2025 21:07:12 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 13:35:40 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed UseFlagInPlace test. > > src/hotspot/share/nmt/regionsTree.hpp line 49: > >> 47: using Node = VMATree::TreapNode; >> 48: >> 49: class NodeHelper { > > Most of the methods here should be `const` and take `const NodeHelper&` as arguments. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974335058 From azafari at openjdk.org Thu Feb 27 21:13:09 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 27 Feb 2025 21:13:09 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 13:41:09 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed UseFlagInPlace test. > > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 61: > >> 59: return _tracker->tree() != nullptr; >> 60: } >> 61: return true; > > ```c++ > void* tracker = os::malloc(sizeof(VirtualMemoryTracker), mtNMT); > if (tracker == nullptr) return false; > _tracker = new (tracker) VirtualMemoryTracker(level == NMT_detail); Done.
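Johan's suggestion above separates allocation from construction: obtain raw memory, return `false` if that fails, then construct the tracker in place with placement `new`. A self-contained sketch of the same pattern, assuming `std::malloc` as a stand-in for `os::malloc(..., mtNMT)`; `Tracker`, `g_tracker` and `initialize_tracker` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the tracker being initialized.
class Tracker {
  bool detailed_;

 public:
  explicit Tracker(bool detailed) : detailed_(detailed) {}
  bool detailed() const { return detailed_; }
};

static Tracker* g_tracker = nullptr;

// Allocate raw memory first so that an allocation failure is reported as a
// plain `false`, then construct the object in that memory.
bool initialize_tracker(bool detailed) {
  void* mem = std::malloc(sizeof(Tracker));  // stands in for os::malloc(..., mtNMT)
  if (mem == nullptr) {
    return false;
  }
  g_tracker = new (mem) Tracker(detailed);  // placement new: construct in place
  return true;
}
```

Because the object lives in `malloc`ed memory, tearing it down would require an explicit destructor call followed by `free`, not `delete`.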
> src/hotspot/share/nmt/virtualMemoryTracker.cpp line 319: > >> 317: } >> 318: VirtualMemoryTracker::Instance::add_committed_region(committed_start, committed_size, ncs); >> 319: //log_warning(cds)("st start: " INTPTR_FORMAT " size: " SIZE_FORMAT, p2i(committed_start), committed_size); > > Outdated log. Done. > src/hotspot/share/nmt/virtualMemoryTracker.hpp line 300: > >> 298: >> 299: public: >> 300: CommittedMemoryRegion() : > > Style: > ```c++ > CommittedMemoryRegion() > : VirtualMemoryRegion((address)1, 1), _stack(NativeCallStack::empty_stack()) { } Done. > src/hotspot/share/nmt/virtualMemoryTracker.hpp line 322: > >> 320: bool is_valid() { return base() != (address)1 && size() != 1;} >> 321: ReservedMemoryRegion() : >> 322: VirtualMemoryRegion((address)1, 1), _stack(NativeCallStack::empty_stack()), _mem_tag(mtNone) { } > > Style: Space between `is_valid` and constructor, fix initializer list as in `CommittedMemoryRegion`. Done. > src/hotspot/share/nmt/virtualMemoryTracker.hpp line 372: > >> 370: class VirtualMemoryTracker { >> 371: private: >> 372: RegionsTree *_tree; > > `class RegionsTree;` shouldn't be needed if you fix the circular include from above. There is no need to have the `private:` specifier. The `_tree` doesn't need to be a pointer after the forward declaration is removed, which in turn simplifies the initialization code. A new `regionsTree.inline.hpp` file is introduced to resolve the dependencies. > test/hotspot/gtest/nmt/test_nmt_treap.cpp line 30: > >> 28: #include "runtime/os.hpp" >> 29: #include "unittest.hpp" >> 30: #include "utilities/linkedlist.hpp" > > Outdated header inclusion. Removed.
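Two of the review requests in this thread concern `NodeHelper`: make its methods `const` and take `const NodeHelper&` arguments, and guard `distance_from` with `assert(position() > other.position())`. A minimal sketch of a helper written that way, assuming a simplified class that only stores a position (the real `NodeHelper` wraps a `VMATree` treap node, so this class is hypothetical).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified helper that only stores a position.
class NodeHelper {
  std::size_t position_;

 public:
  explicit NodeHelper(std::size_t position) : position_(position) {}

  std::size_t position() const { return position_; }

  // const member taking a const reference: modifies neither *this nor `other`,
  // so it can be called on const helpers and on temporaries.
  std::size_t distance_from(const NodeHelper& other) const {
    assert(position() > other.position());  // unsigned subtraction must not wrap
    return position() - other.position();
  }
};
```

The assertion matters because `std::size_t` subtraction silently wraps; without it, passing the nodes in the wrong order would yield a huge bogus distance instead of failing fast in debug builds.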
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974339508 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974339232 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974338173 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974338397 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974340867 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1974341165 From coleen.phillimore at oracle.com Thu Feb 27 21:19:44 2025 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 27 Feb 2025 16:19:44 -0500 Subject: Discussion: How to get to single object header layout In-Reply-To: References: <881959A3-6D5D-4FE6-A029-B6F1F2BB1BDD@amazon.de> <9e2a15c0-2788-42bc-b0fa-7b7c5390e423@oracle.com> Message-ID: <83fb0a88-a409-4c2c-9a27-94e6d382d90a@oracle.com> On 2/27/25 3:32 PM, Thomas Stüfe wrote: > Hi Coleen, > > I did benchmarks end of last year, and found the numbers for Lilliput1 > to be encouraging. This was on the very-close-to-release version of > Lilliput. I will brush up the results or repeat the measurements when > I find time, but in short, I found in SpecJBB G1 pause time > reduction of ~20% and CPU cache misses down by 15-20%. Live set size > reduction of ~17%. Benchmark scores were also better. Thomas, this is good. Please post these results in the JDK issue, and what you measured and what sort of hardware configurations when you have a chance. Thanks, Coleen > > I remember that Roman's results were a bit smaller, but I usually test > on older hardware that may benefit more from more efficient CPU/memory > use. > > Cheers, Thomas > > > On Thu, Feb 27, 2025 at 9:02 PM wrote: > > > Roman, > > Thank you for sending out the plan for getting to a single object > header > layout. See below: > > On 2/26/25 1:52 PM, Kennke, Roman wrote: > > (2nd attempt at sending this.
If you receive the other attempt, > please ignore it.) > > > > This is a follow-up to discussions that we had at the OpenJDK > Committers Workshop earlier this month. I have been asked to come > up with a detailed schedule of how I envision to make compact > object headers aka Project Lilliput the default and one and only > header layout in HotSpot, and post it here for wider discussion. > The goal is to get to a consensus and prepare the various HotSpot > contributors for what may come and when. In particular, I am aware > that other, potentially conflicting changes are in the pipeline > too (looking at Valhalla, possibly other projects, too?), which we > should coordinate, not only in terms of code, but also in terms of > reviewer/testing resources. > > > > We agreed at the OCW that we should make the current > 8-byte-headers the default object header layout first, and only > then build 4-byte-headers on top of that. We also agreed that we > want as few flags permutations as possible (if it were me, I would > vote for having no new flags at all, and have new implementations > replace old implementations, but I can see the operational > usefulness of having a fallback available if something unexpected > goes wrong.) > > > > Many of the proposed changes are "only" various progressions of > flags moving from new/experimental to non-experimental to default > to deprecated and obsoleted. The bulk of code changes would be in > JDK 27, when Lilliput 2 hits the repos (which isn't as scary as > Lilliput 1, which replaced the whole locking subsystem), and the > current 12-byte headers are removed. > > > > Please let me know what you think, how we can make this work, > and especially whatever concerns you may have.
> > > > JDK 25: > > - 8350272: Deprecate UseCompressedClassPointers for removal > > - 8350457: Support Compact Object Headers as product option > > I filed 8350457 to collect information that we need to decide to move > UseCompactObjectHeaders out of Experimental mode. See the bug for > more > details: https://bugs.openjdk.org/browse/JDK-8350457 > > One of the first and most important things we need is to show that > there > is a performance improvement either in throughput or density for this > change. I assigned this to you, but I'm also currently in the > process > of running our internal benchmarks to determine the impact of this > change and I'll post analysis there when I have more details. Also we > need some fresh performance results that you are seeing. If the > option > is a product option, we need to tell people why they want to use this > option. There may be some performance regressions for some > applications > with UCCP so we need to show overall improvements in other > types of > applications, and/or mitigate the regressions. > > The remaining tasks for JDK 26 and beyond depend on what we find as a > result of this step, including L2, whose performance numbers would be > interesting to see as well. > > Thank you, > Coleen > > > > > JDK 26: > > - 8346011: [Lilliput] Compact Full-GC Forwarding (required for > UCCP removal, pre-req for L2) > > - Obsolete +/-UseCompressedClassPointers > > - Make +UseCompactObjectHeaders on-by-default > > - Deprecate -UseCompactObjectHeaders > > > > JDK 27: > > - Obsolete -UseCompactObjectHeaders > > - 8320761: [Lilliput] Implement compact identity hashcode > (alternative code path to L1, under new experimental flag, e.g. > +/-UseTinyObjectHeaders) > > - 8347710: [Lilliput] Implement 4 byte headers (alternative code > path to L1, under new experimental flag, e.g.
+/-UseTinyObjectHeaders) > > > > JDK 28: > > - Make +/-UseTinyObjectHeaders non-experimental > > > > JDK 29: > > - Make +UseTinyObjectHeaders on-by-default > > - Deprecate -UseTinyObjectHeaders > > > > JDK 30: > > - Obsolete -UseTinyObjectHeaders From dlong at openjdk.org Fri Feb 28 03:31:53 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 28 Feb 2025 03:31:53 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 10:24:31 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now gone through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. >> >> Additionally, I've created `align_up_or_min`, which behaves similarly to the original align_up but handles overflows predictably across all integer types. This new function is used in the locations where overflow checks already exist, providing a safer alternative. > > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > reverted gcarguments and updated test If MinHeapDeltaBytes is the only problem, maybe we should align it down the max heap size before trying to align it up, or give it a max size smaller than max_uintx.
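The overflow the PR guards against comes from computing `(value + alignment - 1) & ~(alignment - 1)`: for values close to the type's maximum the addition wraps, and the result ends up smaller than the input. The sketch below shows a checked variant that reports failure instead of wrapping, plus an asserting wrapper that checks `result >= value` as the PR describes. It is not HotSpot's `align.hpp`: `try_align_up` and `align_up_checked` are hypothetical names, it handles unsigned types only (signed overflow is undefined behavior and would need a different check), and it does not reproduce the exact semantics of `align_up_or_null` or `align_up_or_min`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

template <typename T>
constexpr bool is_power_of_2(T x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Writes the aligned value to *out and returns true, or returns false if
// rounding `value` up to `alignment` would not fit in T.
template <typename T>
bool try_align_up(T value, T alignment, T* out) {
  static_assert(std::is_unsigned<T>::value, "sketch covers unsigned types only");
  assert(is_power_of_2(alignment));
  const T mask = static_cast<T>(alignment - 1);
  // The largest aligned value representable in T is max() - mask;
  // any larger input would round up past max() and wrap.
  if (value > std::numeric_limits<T>::max() - mask) {
    return false;
  }
  *out = static_cast<T>((value + mask) & ~mask);
  return true;
}

// Asserting variant: the aligned result must never be smaller than the input.
template <typename T>
T align_up_checked(T value, T alignment) {
  T result = 0;
  const bool ok = try_align_up(value, alignment, &result);
  assert(ok && result >= value);
  (void)ok;  // silence unused-variable warnings when NDEBUG is set
  return result;
}
```

Checking the input against `max() - mask` before adding avoids relying on wraparound at all, so the same test works for every unsigned width.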
------------- PR Comment: https://git.openjdk.org/jdk/pull/23711#issuecomment-2689633726 From fyang at openjdk.org Fri Feb 28 05:23:54 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 28 Feb 2025 05:23:54 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: <_XnhdwtuB6AhiTL4TYmV4yqIy_WwQEeASn2b2zL9-V0=.05ec2994-8599-4f76-871d-a9e2bbe8afa2@github.com> References: <_XnhdwtuB6AhiTL4TYmV4yqIy_WwQEeASn2b2zL9-V0=.05ec2994-8599-4f76-871d-a9e2bbe8afa2@github.com> Message-ID: On Thu, 27 Feb 2025 20:38:32 GMT, Fredrik Bredberg wrote: > I've used QEMU to smoke test this PR on ppc64le, riscv64 and s390x, but it would be nice if @TheRealMDoerr, @RealFYang and @offamitkumar could check if it runs okay on real hardware as well. FYI: hs:tier1 - hs:tier3 tests pass on the linux-riscv64 platform. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23421#issuecomment-2689751810 From fyang at openjdk.org Fri Feb 28 05:24:54 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 28 Feb 2025 05:24:54 GMT Subject: RFR: 8350855: RISC-V: print offset by assert of patch_offset_in_conditional_branch In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 12:27:49 GMT, Hamlin Li wrote: > Hi, > Can you help to review this trivial patch? > We are facing the assert occasionally, but currently there is no offset info printed out, it's good to have it, as it's not easy to reproduce it. > > Thanks Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23821#pullrequestreview-2649757703 From duke at openjdk.org Fri Feb 28 06:22:09 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Fri, 28 Feb 2025 06:22:09 GMT Subject: RFR: 8348561: Add aarch64 intrinsics for ML-DSA [v8] In-Reply-To: References: Message-ID: > By using the aarch64 vector registers the speed of the computation of the ML-DSA algorithms (key generation, document signing, signature verification) can be approximately doubled.
Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merged master. - Added more comments, mainly as suggested by Andrew Dinn - Changed aarch64-asmtest.py as suggested by Bhavana-Kilambi - Accepting suggested change from Andrew Dinn - Added comments suggested by Andrew Dinn - Fixed copyright years - renaming a couple of functions - Adding comments + some code reorganization - removed debugging code - merging master - ... and 3 more: https://git.openjdk.org/jdk/compare/ab4b0ef9...d82dfb2f ------------- Changes: https://git.openjdk.org/jdk/pull/23300/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23300&range=07 Stats: 2611 lines in 22 files changed: 2030 ins; 92 del; 489 mod Patch: https://git.openjdk.org/jdk/pull/23300.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23300/head:pull/23300 PR: https://git.openjdk.org/jdk/pull/23300 From dholmes at openjdk.org Fri Feb 28 07:02:55 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 28 Feb 2025 07:02:55 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 15:54:28 GMT, Fredrik Bredberg wrote: >> I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list, the `entry_list`. >> >> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about whether it's safe to first check the `EntryList` and then `cxq` will be a thing of the past. >> >> In the current multi-queue design new threads were always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` would detach the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`.
The element that was first added to `cxq` would be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO chunks. >> >> The new list design is as much a multi-queue as the current one. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`. >> >> You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable. >> >> The first contending thread that "pushes" itself onto `entry_list` will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked through its next pointer and have its prev pointer set to null; thus, pushing new threads onto `entry_list` forms a singly linked list. The list is always in the right order (via the next pointers) and is never moved to another list. >> >> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor). >> >> Performance-wise the new design seems to be equal to the old design, even though c2 generates two fewer instructions per monitor unlock operation. >> >> However, the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac... > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update after review by David and Coleen. Okay that's good enough for me.
:) Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23421#pullrequestreview-2649910490 From amitkumar at openjdk.org Fri Feb 28 07:02:56 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Feb 2025 07:02:56 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: <_XnhdwtuB6AhiTL4TYmV4yqIy_WwQEeASn2b2zL9-V0=.05ec2994-8599-4f76-871d-a9e2bbe8afa2@github.com> Message-ID: On Fri, 28 Feb 2025 05:21:34 GMT, Fei Yang wrote: > I've used QEMU to smoke test this PR on ppc64le, riscv64 and s390x, but it would be nice if @TheRealMDoerr, @RealFYang and @offamitkumar could check if it runs okay on real hardware as well. Tier1 tests passed on s390x. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23421#issuecomment-2689887509 From duke at openjdk.org Fri Feb 28 07:45:53 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Fri, 28 Feb 2025 07:45:53 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 14:48:16 GMT, Severin Gehwolf wrote: >> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811.
>> >> I tested it with: >> >> >> java -Xlog:os+container=trace -version >> >> on: >> >> `Red Hat Enterprise Linux 8 (cgroups v1 only)`: >> _No change in behaviour_ >> >> `Fedora 41 (cgroups v2)`: >> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ >> >> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 >> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 >> @@ -1,7 +1,12 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> +[debug][os,container] v2 controller cpuset is enabled and relevant >> +[debug][os,container] v2 controller cpu is enabled and required >> +[debug][os,container] v2 controller io is enabled but not relevant >> +[debug][os,container] v2 controller memory is enabled and required >> +[debug][os,container] v2 controller hugetlb is enabled but not relevant >> +[debug][os,container] v2 controller pids is enabled and relevant >> +[debug][os,container] v2 controller rdma is enabled but not relevant >> +[debug][os,container] v2 controller misc is enabled but not relevant >> [debug][os,container] Detected cgroups v2 unified hierarchy >> [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope >> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max >> >> >> `Fedora 41 (custom kernel with cgroups v1 disabled)`: >> _Fixes `cgroups v2` detection:_ >> >> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 >> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 >> @@ -1,7 +1,63 @@ >> 
[trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> -[debug][os,container] controller memory is not enabled >> - ] >> -[debug][os,container] One or more required controllers disabled at kernel level. >> +[... > > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 360: > >> 358: for (int i = 0; i < CG_INFO_LENGTH; i++) { >> 359: // pids and cpuset controllers are optional. All other controllers are required >> 360: if (i != PIDS_IDX && i != CPUSET_IDX) { > > Same comment for `cpuset` controller. Keep it required for cg v1. Hmm, this logic was there before my changes; I am pretty sure I did keep `cpuset` required for `cgroups v1`. To attempt to confirm my patch had not changed the `cgroups v1` `cpuset` optionality logic, I wrote a test case for "cgroups v1, cpuset not present in /proc/cgroups". (To avoid assertion failures I created fake `/proc/self/cgroup` and `/proc/self/mountinfo` files that also do not have `cpuset` lines.) I adapted the test case to before my patch: https://github.com/fitzsim/jdk/commit/cab0000598b522d38865452575f95ed368b5935b and after my patch: https://github.com/fitzsim/jdk/commit/d8537baed5d56b5f36ce628b8c180ce18ce6f6e6 and it passes in both cases. i.e., for `cgroups v1`, `cpuset` is required, so `determine_type` produces `INVALID_CGROUPS_V1 (4)`, from this logic: if (!cg_infos[CPUSET_IDX]._data_complete) { log_debug(os, container)("Required cgroup v1 cpuset subsystem not found"); cleanup(cg_infos); *flags = INVALID_CGROUPS_V1; return false; } The test fails in both cases if I remove ` && i != CPUSET_IDX`, with `determine_type` returning `INVALID_CGROUPS_GENERIC (6)`: Required cpuset controller missing in /proc/cgroups. Invalid. 
expected: 4 but was: 6 due to this block (which my patch left alone): if (!all_required_controllers_enabled) { // one or more required controllers disabled, disable container support log_debug(os, container)("One or more required controllers disabled at kernel level."); cleanup(cg_infos); *flags = INVALID_CGROUPS_GENERIC; return false; } No existing test checks this logic as far as I can tell, so if you think the above makes sense, I can add this test to this pull request. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1974950233 From mli at openjdk.org Fri Feb 28 09:09:06 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 28 Feb 2025 09:09:06 GMT Subject: RFR: 8350855: RISC-V: print offset by assert of patch_offset_in_conditional_branch In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 12:27:49 GMT, Hamlin Li wrote: > Hi, > Can you help to review this trivial patch? > We are facing the assert occasionally, but currently there is no offset info printed out, it's good to have it, as it's not easy to reproduce it. > > Thanks Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23821#issuecomment-2690103029 From mli at openjdk.org Fri Feb 28 09:09:07 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 28 Feb 2025 09:09:07 GMT Subject: Integrated: 8350855: RISC-V: print offset by assert of patch_offset_in_conditional_branch In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 12:27:49 GMT, Hamlin Li wrote: > Hi, > Can you help to review this trivial patch? > We are facing the assert occasionally, but currently there is no offset info printed out, it's good to have it, as it's not easy to reproduce it. > > Thanks This pull request has now been integrated.
Changeset: eada1ea8 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/eada1ea8d21c4811834e20ca467e136580d6cd0a Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8350855: RISC-V: print offset by assert of patch_offset_in_conditional_branch Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/23821 From sgehwolf at openjdk.org Fri Feb 28 09:39:53 2025 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 28 Feb 2025 09:39:53 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 14:47:23 GMT, Severin Gehwolf wrote: >> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811. >> >> I tested it with: >> >> >> java -Xlog:os+container=trace -version >> >> on: >> >> `Red Hat Enterprise Linux 8 (cgroups v1 only)`: >> _No change in behaviour_ >> >> `Fedora 41 (cgroups v2)`: >> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_ >> >> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500 >> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500 >> @@ -1,7 +1,12 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> +[debug][os,container] v2 controller cpuset is enabled and relevant >> +[debug][os,container] v2 controller cpu is enabled and required >> +[debug][os,container] v2 controller io is enabled but not relevant >> +[debug][os,container] v2 controller memory is enabled and required >> +[debug][os,container] v2 controller hugetlb is enabled but not relevant >> +[debug][os,container] v2 controller pids is enabled and relevant >> +[debug][os,container] v2 controller rdma is enabled but not relevant >> +[debug][os,container] v2 controller misc is enabled but not relevant >> 
[debug][os,container] Detected cgroups v2 unified hierarchy >> [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope >> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max >> >> >> `Fedora 41 (custom kernel with cgroups v1 disabled)`: >> _Fixes `cgroups v2` detection:_ >> >> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500 >> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500 >> @@ -1,7 +1,63 @@ >> [trace][os,container] OSContainer::init: Initializing Container Support >> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups >> -[debug][os,container] controller cpuset is not enabled >> - ] >> -[debug][os,container] controller memory is not enabled >> - ] >> -[debug][os,container] One or more required controllers disabled at kernel level. >> +[... > > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 339: > >> 337: cg_infos[MEMORY_IDX]._enabled = (enabled == 1); >> 338: } else if (strcmp(name, "cpuset") == 0) { >> 339: log_debug(os, container)("Detected optional cpuset controller entry in %s", controllers_file); > > In https://bugs.openjdk.org/browse/JDK-8347129 we decided to keep the `cpuset` optionality alone for cg v1. I'd prefer if we kept it that way as it's becoming increasingly difficult to find those systems (or change them). Thus, hard to know if this would break something. Update: Please remove the log line, since this is the cg v1 branch and there cpuset isn't optional. 
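The cgroup v2 detection in this PR reads `/sys/fs/cgroup/cgroup.controllers` instead of `/proc/cgroups`. That file holds a single whitespace-separated list of the controllers available to the cgroup, and the trace output quoted above classifies `cpu` and `memory` as required and `cpuset` and `pids` as relevant. A sketch of that parsing and check, assuming the file contents have already been read into a string; `V2Controllers` and `parse_cgroup_controllers` are hypothetical and are not the HotSpot implementation.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical result of inspecting cgroup.controllers.
struct V2Controllers {
  std::set<std::string> enabled;

  bool has(const std::string& name) const { return enabled.count(name) != 0; }

  // Per the trace output in this thread: cpu and memory are required;
  // cpuset and pids are used when present; other controllers are ignored.
  bool usable() const { return has("cpu") && has("memory"); }
};

// Parses the whitespace-separated list, e.g. "cpuset cpu io memory pids\n".
V2Controllers parse_cgroup_controllers(const std::string& contents) {
  V2Controllers result;
  std::istringstream in(contents);
  std::string name;
  while (in >> name) {  // operator>> skips spaces and the trailing newline
    result.enabled.insert(name);
  }
  return result;
}
```

Keeping the parser separate from file I/O makes it easy to unit test with fixed strings, such as the controller lists shown in the trace output.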
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1975107752 From duke at openjdk.org Fri Feb 28 09:46:32 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Fri, 28 Feb 2025 09:46:32 GMT Subject: RFR: 8349721: Add aarch64 intrinsics for ML-KEM [v2] In-Reply-To: References: Message-ID: > By using the aarch64 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. Ferenc Rakoczi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merged master - removing trailing spaces - kyber aarch64 intrinsics ------------- Changes: https://git.openjdk.org/jdk/pull/23663/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23663&range=01 Stats: 2885 lines in 20 files changed: 2774 ins; 84 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/23663.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23663/head:pull/23663 PR: https://git.openjdk.org/jdk/pull/23663 From jsjolen at openjdk.org Fri Feb 28 09:47:14 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 28 Feb 2025 09:47:14 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 17:52:19 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed UseFlagInPlace test. > > src/hotspot/share/nmt/memTracker.hpp line 142: > >> 140: if (addr != nullptr) { >> 141: NmtVirtualMemoryLocker nvml; >> 142: VirtualMemoryTracker::Instance::add_reserved_region((address)addr, size, stack, mem_tag); > > I do not like: > > `VirtualMemoryTracker::Instance::add_reserved_region` > > with the `Instance` being repeated over and over.
> > I'd prefer `VirtualMemoryTracker::add_reserved_region` and have `Instance` be impl detail inside. The wordiness is a bit annoying. The reason that we do this is to separate the global static instance from the implementation, so that we can have many `VMT`s when testing (this is very useful). Do you have a concrete way we can refactor this such that we retain the possibility of having many VMTs and one static instance, whilst reducing the wordiness when using the code? Can this refactoring wait until after integration, as we have more classes following the same pattern that'd need to be refactored? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975118055 From duke at openjdk.org Fri Feb 28 10:15:09 2025 From: duke at openjdk.org (Ferenc Rakoczi) Date: Fri, 28 Feb 2025 10:15:09 GMT Subject: RFR: 8349721: Add aarch64 intrinsics for ML-KEM [v3] In-Reply-To: References: Message-ID: > By using the aarch64 vector registers the speed of the computation of the ML-KEM algorithms (key generation, encapsulation, decapsulation) can be approximately doubled. 
Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: A little cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23663/files - new: https://git.openjdk.org/jdk/pull/23663/files/ff0f8430..4adc5cf2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23663&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23663&range=01-02 Stats: 24 lines in 3 files changed: 0 ins; 23 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23663.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23663/head:pull/23663 PR: https://git.openjdk.org/jdk/pull/23663 From jsjolen at openjdk.org Fri Feb 28 10:17:09 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 28 Feb 2025 10:17:09 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 18:27:53 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed UseFlagInPlace test. > > src/hotspot/share/nmt/vmatree.hpp line 73: > >> 71: assert(type < StateType::COUNT, "must be"); >> 72: return statetype_strings[static_cast(type)]; >> 73: } > > I don't like that we are hardcoding the size of this array and have COUNT be StateType.
Can we do something like this?: > > > enum class StateType : uint8_t { Reserved = 1, Committed = 3, Released = 0 }; > > private: > static constexpr const char* const statetype_strings[] = {"released", "reserved", "only-committed", "committed"}; > static constexpr int STATETYPE_COUNT = static_cast(sizeof(statetype_strings)/sizeof(char*)); > > public: > NONCOPYABLE(VMATree); > > static const char* statetype_to_string(StateType type) { > assert(static_cast(type) < STATETYPE_COUNT, "must be"); > return statetype_strings[static_cast(type)]; > } This is a fairly standard pattern in Hotspot, so personally I'm fine with keeping it as-is. Here's an incomplete list of pre-existing usages: ```c++ // UL tag list enum type { __NO_TAG, #define LOG_TAG(name) _##name, LOG_TAG_LIST #undef LOG_TAG Count }; // enum for figuring positions and size of Symbol::_vm_symbols[] enum class vmSymbolID : int { // [FIRST_SID ... LAST_SID] is the iteration range for the *valid* symbols. // NO_SID is used to indicate an invalid symbol. Some implementation code // *may* read _vm_symbols[NO_SID], so it must be a valid array index. 
NO_SID = 0, // exclusive lower limit #define VM_SYMBOL_ENUM(name, string) VM_SYMBOL_ENUM_NAME_(name), VM_SYMBOLS_DO(VM_SYMBOL_ENUM, VM_ALIAS_IGNORE) #undef VM_SYMBOL_ENUM SID_LIMIT, // exclusive upper limit #define VM_ALIAS_ENUM(name, def) VM_SYMBOL_ENUM_NAME_(name) = VM_SYMBOL_ENUM_NAME_(def), VM_SYMBOLS_DO(VM_SYMBOL_IGNORE, VM_ALIAS_ENUM) #undef VM_ALIAS_ENUM FIRST_SID = NO_SID + 1, // inclusive lower limit LAST_SID = SID_LIMIT - 1, // inclusive upper limit }; enum class InjectedFieldID : int { ALL_INJECTED_FIELDS(DECLARE_INJECTED_FIELD_ENUM) MAX_enum }; enum class vmClassID : int { #define DECLARE_VM_CLASS(name, symbol) _VM_CLASS_ENUM(name), _VM_CLASS_ENUM(symbol) = _VM_CLASS_ENUM(name), VM_CLASSES_DO(DECLARE_VM_CLASS) #undef DECLARE_VM_CLASS LIMIT, // exclusive upper limit FIRST = 0, // inclusive upper limit LAST = LIMIT - 1 // inclusive upper limit }; enum class CodeBlobType { MethodNonProfiled = 0, // Execution level 1 and 4 (non-profiled) nmethods (including native nmethods) MethodProfiled = 1, // Execution level 2 and 3 (profiled) nmethods NonNMethod = 2, // Non-nmethods like Buffers, Adapters and Runtime Stubs All = 3, // All types (No code cache segmentation) NumTypes = 4 // Number of CodeBlobTypes }; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975161907 From ayang at openjdk.org Fri Feb 28 10:18:52 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 28 Feb 2025 10:18:52 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v2] In-Reply-To: References: Message-ID: <9_prMYwlvIPlwpXt8rvCcAiwzhEQVOX4I3Mlsw3vHzI=.e3170a4f-ba5d-43e1-96d8-126b434c06a2@github.com> On Fri, 28 Feb 2025 03:29:20 GMT, Dean Long wrote: > or give it a max size smaller than max_uintx. I think this makes more sense; maybe sth like `max_uintx/2` or even smaller. In practice, `MinHeapDeltaBytes` should be much much smaller than `max_uintx`. 
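As a hedged illustration of the overflow under discussion: the usual power-of-two round-up `(value + alignment - 1) & ~(alignment - 1)` wraps around when `value` is close to the maximum of its type. Bounding a flag such as `MinHeapDeltaBytes` well below `max_uintx` helps for exactly that reason. The sketch uses `uint64_t` as a stand-in for `uintx`, and the function names are hypothetical; it is not HotSpot's `align_up`.

```cpp
#include <cstdint>
#include <limits>

// Usual power-of-two round-up. It silently wraps if
// value + alignment - 1 exceeds the range of uint64_t.
constexpr uint64_t align_up_pow2(uint64_t value, uint64_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// True when align_up_pow2(value, alignment) cannot wrap.
constexpr bool align_up_is_safe(uint64_t value, uint64_t alignment) {
  return value <= std::numeric_limits<uint64_t>::max() - (alignment - 1);
}
```

For example, `align_up_pow2(UINT64_MAX, 4096)` wraps to 0, while any value capped at half the range stays safe for every power-of-two alignment up to 2^63, because (2^63 - 1) + (2^63 - 1) still fits in 64 bits.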
------------- PR Comment: https://git.openjdk.org/jdk/pull/23711#issuecomment-2690263717 From jsjolen at openjdk.org Fri Feb 28 10:22:17 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 28 Feb 2025 10:22:17 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 20:58:13 GMT, Afshin Zafari wrote: >> Not sure I agree with Johan's comment here - in memBaseline.cpp we use MemTracker a lot, so adding these APIs make sense there, otherwise we would have to split work between MemTracker and VirtualMemoryTracker. >> >> Personally I like this way better. > > Done. I'm not sure what you mean by using MemTracker a lot; we use it only twice: once for taking the NMT lock, and once for checking the NMT level. The current structure of NMT is that `MemTracker` is responsible for taking locks and checking if NMT is enabled, and then calling the underlying APIs. The underlying APIs are only meant to be called directly by other NMT components, such as `MemBaseline`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975168022 From sgehwolf at openjdk.org Fri Feb 28 10:23:57 2025 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 28 Feb 2025 10:23:57 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 07:43:25 GMT, Thomas Fitzsimmons wrote: >> src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 360: >> >>> 358: for (int i = 0; i < CG_INFO_LENGTH; i++) { >>> 359: // pids and cpuset controllers are optional. All other controllers are required >>> 360: if (i != PIDS_IDX && i != CPUSET_IDX) { >> >> Same comment for `cpuset` controller. Keep it required for cg v1. > > Hmm, this logic was there before my changes; I am pretty sure I did keep `cpuset` required for `cgroups v1`.
> > Hmm, this logic was there before my changes; I am pretty sure I did keep `cpuset` required for `cgroups v1`. > > To attempt to confirm my patch had not changed the `cgroups v1` `cpuset` optionality logic, I wrote a test case for "cgroups v1, cpuset not present in /proc/cgroups". > > (To avoid assertion failures I created fake `/proc/self/cgroup` and `/proc/self/mountinfo` files that also do not have `cpuset` lines.) > > I adapted the test case to before my patch: > > https://github.com/fitzsim/jdk/commit/cab0000598b522d38865452575f95ed368b5935b > > and after my patch: > > https://github.com/fitzsim/jdk/commit/d8537baed5d56b5f36ce628b8c180ce18ce6f6e6 > > and it passes in both cases. i.e., for `cgroups v1`, `cpuset` is required, so `determine_type` produces `INVALID_CGROUPS_V1 (4)`, from this logic: > > > if (!cg_infos[CPUSET_IDX]._data_complete) { > log_debug(os, container)("Required cgroup v1 cpuset subsystem not found"); > cleanup(cg_infos); > *flags = INVALID_CGROUPS_V1; > return false; > } > > > The test fails in both cases if I remove ` && i != CPUSET_IDX`, with `determine_type` returning `INVALID_CGROUPS_GENERIC (6)`: > > > Required cpuset controller missing in /proc/cgroups. Invalid. expected: 4 but was: 6 > > > due to this block (which my patch left alone): > > > if (!all_required_controllers_enabled) { > // one or more required controllers disabled, disable container support > log_debug(os, container)("One or more required controllers disabled at kernel level."); > cleanup(cg_infos); > *flags = INVALID_CGROUPS_GENERIC; > return false; > } > > > No existing test checks this logic as far as I can tell, so if you think the above makes sense, I can add this test to this pull request. You are right, that if any of the required controllers aren't enabled at the kernel level we fail with `NVALID_CGROUPS_GENERIC`. However, the `if` condition is within the cgroups v1 branch while it used to be outside a version specific branch. 
Also note that `/proc/cgroups` containing (last digit `0`, indicating the enabled flag): cpuset 3 1 0\n ... is semantically equivalent to it being missing entirely from `/proc/cgroups`. But keeping `if (i != PIDS_IDX && i != CPUSET_IDX) {` above would keep `all_required_controllers_enabled == true`, which is not correct. Yes, we should keep/add a test like you suggest, but amend the patch to something like this: https://github.com/jerboaa/jdk/commit/26f765db9fef6f1d7be79452da701987274117c5 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1975171365 From lucy at openjdk.org Fri Feb 28 10:33:52 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 28 Feb 2025 10:33:52 GMT Subject: RFR: 8350716: [s390] intrinsify Thread.currentThread() In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 04:14:37 GMT, Amit Kumar wrote: > s390x port for [JDK-8278793](https://bugs.openjdk.org/browse/JDK-8278793) LGTM. One minor comment. src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp line 2010: > 2008: > 2009: address TemplateInterpreterGenerator::generate_currentThread() { > 2010: address entry_point = __ pc(); I would suggest remembering the entry offset, not the entry address, just like in the generator above. In general, the allocated space can be exhausted while code is being generated. In that case, a new buffer is allocated and the code generated so far is copied to the new space; all remembered addresses then become invalid. Offsets are relative to the beginning of the buffer and remain valid. AFAIK, this concern is not relevant when generating the interpreter, because there the allocated space must be large enough from the beginning. Doing it right anyway would be nice. Thanks. ------------- Marked as reviewed by lucy (Reviewer).
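A minimal sketch of why remembering an offset is more robust than remembering an address. It uses a hypothetical `std::vector`-backed buffer rather than HotSpot's `CodeBuffer`/assembler classes. When emission outgrows the capacity, the storage may move, so a saved pointer can dangle, while a saved offset can be rebased onto the current start at any time.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy code buffer: emitting may grow (and move) the backing storage,
// much like an assembler buffer that gets expanded and copied.
class ToyCodeBuffer {
 public:
  explicit ToyCodeBuffer(std::size_t initial_capacity) {
    bytes_.reserve(initial_capacity);
  }

  // May reallocate; any pointer obtained earlier from pc() or begin()
  // is invalidated when that happens.
  void emit(uint8_t b) { bytes_.push_back(b); }

  // Absolute address of the next byte: unsafe to keep across emit().
  const uint8_t* pc() const { return bytes_.data() + bytes_.size(); }

  // Position of the next byte relative to the start: survives growth.
  std::size_t offset() const { return bytes_.size(); }

  const uint8_t* begin() const { return bytes_.data(); }

 private:
  std::vector<uint8_t> bytes_;
};
```

Typical use: record `std::size_t entry = buf.offset();` at the entry point, keep emitting, and compute `buf.begin() + entry` only once generation is finished. This mirrors the review suggestion to remember the entry offset and convert it to an address later.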
PR Review: https://git.openjdk.org/jdk/pull/23791#pullrequestreview-2650368145 PR Review Comment: https://git.openjdk.org/jdk/pull/23791#discussion_r1975184533 From tschatzl at openjdk.org Fri Feb 28 10:35:03 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Feb 2025 10:35:03 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v2] In-Reply-To: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> References: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> Message-ID: On Thu, 27 Feb 2025 18:24:15 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * remove unnecessarily added logging > > src/hotspot/share/gc/g1/g1BarrierSet.hpp line 54: > >> 52: // them, keeping the write barrier simple. >> 53: // >> 54: // The refinement threads mark cards in the the current collection set specially on the > > "the the" typo. I fixed one more occurrence in files changed in this CR. There are like 10 more of these duplications in our code, I will fix separately. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1975186407 From amitkumar at openjdk.org Fri Feb 28 10:45:11 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Feb 2025 10:45:11 GMT Subject: RFR: 8350716: [s390] intrinsify Thread.currentThread() [v2] In-Reply-To: References: Message-ID: > s390x port for [JDK-8278793](https://bugs.openjdk.org/browse/JDK-8278793) Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: comment from Lutz ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23791/files - new: https://git.openjdk.org/jdk/pull/23791/files/60837a28..31c1768d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23791&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23791&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23791.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23791/head:pull/23791 PR: https://git.openjdk.org/jdk/pull/23791 From amitkumar at openjdk.org Fri Feb 28 10:45:11 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Feb 2025 10:45:11 GMT Subject: RFR: 8350716: [s390] intrinsify Thread.currentThread() In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 04:14:37 GMT, Amit Kumar wrote: > s390x port for [JDK-8278793](https://bugs.openjdk.org/browse/JDK-8278793) I think I need approval one more time to make the bot happy.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23791#issuecomment-2690317104 From amitkumar at openjdk.org Fri Feb 28 10:45:12 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Feb 2025 10:45:12 GMT Subject: RFR: 8350716: [s390] intrinsify Thread.currentThread() [v2] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 10:31:03 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> comment from Lutz > > src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp line 2010: > >> 2008: >> 2009: address TemplateInterpreterGenerator::generate_currentThread() { >> 2010: address entry_point = __ pc(); > > I would suggest to remember the entry offset, not the entry address, just like in the generator above. > In general, it is possible that, while generating code, the allocated space gets exhausted. In that case, a new buffer is allocated and the code generated so far is copied to the new space. All remembered addresses become invalid then. Offsets are relative to the begin of the buffer and remain valid. > > Afaik, this concern is not relevant when generating the interpreter. Here, the allocated space must be large enough from the beginning. Doing it right anyway would be nice. Thanks. I have updated the code. 
Thanks for the explanation as well :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23791#discussion_r1975199040 From mdoerr at openjdk.org Fri Feb 28 10:50:00 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Feb 2025 10:50:00 GMT Subject: RFR: 8343840: Rewrite the ObjectMonitor lists [v2] In-Reply-To: References: <_XnhdwtuB6AhiTL4TYmV4yqIy_WwQEeASn2b2zL9-V0=.05ec2994-8599-4f76-871d-a9e2bbe8afa2@github.com> Message-ID: On Fri, 28 Feb 2025 07:00:40 GMT, Amit Kumar wrote: > I've used QEMU to smoke test this PR on ppc64le, riscv64 and s390x, But it would be nice if @TheRealMDoerr, @RealFYang and @offamitkumar could check if it runs okay on real hardware as well. The PPC64 code looks correct and some quick tests have passed. I'll run larger test suites over the weekend. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23421#issuecomment-2690327204 From tschatzl at openjdk.org Fri Feb 28 11:25:53 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Feb 2025 11:25:53 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v2] In-Reply-To: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> References: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> Message-ID: <9tS5E1tteGutSNX7rZh5WYLdZoF7Vgl_4_pjuAdT4WU=.c8c73c45-7abb-48a9-b623-769d3c1679ca@github.com> On Thu, 27 Feb 2025 12:07:29 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * remove unnecessarily added logging > > src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 349: > >> 347: >> 348: bool do_heap_region(G1HeapRegion* r) override { >> 349: if (!r->is_free()) { > > I am a bit lost on this closure; the intention seems to set unclaimed to all non-free regions, why can't this be done in one go, instead of 
first setting all regions to claimed (`reset_all_claims_to_claimed`), then set non-free ones unclaimed? `do_heap_region()` only visits committed regions in this case. I wanted to avoid the additional check in the iteration code. If you still think it is more clear to filter those out later, please tell me. I'll add a comment for now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1975250646 From tschatzl at openjdk.org Fri Feb 28 12:14:01 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Feb 2025 12:14:01 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v2] In-Reply-To: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> References: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> Message-ID: <87L5pcyGAgyDsXTwlSdAFLyIAOcUl1ZdYXK-nwzLrUQ=.c3db7522-b3e6-46e0-b268-e457c3d2bdc2@github.com> On Thu, 27 Feb 2025 18:31:16 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * remove unnecessarily added logging > > src/hotspot/share/gc/g1/g1RemSet.cpp line 1252: > >> 1250: G1ConcurrentRefineWorkState::snapshot_heap_into(&constructed); >> 1251: claim = &constructed; >> 1252: } > > It's not super obvious to me why the "has_sweep_claims" checking needs to be on this level. Can `G1ConcurrentRefineWorkState` return a valid `G1CardTableClaimTable*` directly? I agree. I remember having similar thoughts as well, but then did not do anything about this. Will fix. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1975311607 From rrich at openjdk.org Fri Feb 28 12:14:07 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 28 Feb 2025 12:14:07 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v3] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Wed, 19 Feb 2025 00:37:14 GMT, Dean Long wrote: >> When calling a MethodHandle linker, such as linkToStatic, we drop the last argument, which causes a mismatch between what the caller pushed and what the callee received. In deoptimization, we check for this in several places, but in one place we had outdated code. See the bug for the gory details. >> >> In this PR I add asserts and a test to reproduce the problem, plus the necessary fixes in deoptimizations. There are other inefficiencies in deoptimization that I didn't address, hoping to simplify the fix for backports. >> >> Some platforms align locals according to the caller during deoptimization, while some align locals according to the callee. The asserts I added compute locals both ways and check that they are still within the frame. I attempted this on all platforms, but am only able to test x64 and aarch64. I need help testing those asserts for arm32, ppc, riscv, and s390. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > Stricter assertion on ppc64 Marked as reviewed by rrich (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23557#pullrequestreview-2650575715 From rrich at openjdk.org Fri Feb 28 12:14:09 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 28 Feb 2025 12:14:09 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v3] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Thu, 27 Feb 2025 17:44:05 GMT, Patricio Chilano Mateo wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> Stricter assertion on ppc64 > > src/hotspot/share/runtime/deoptimization.cpp line 645: > >> 643: methodHandle method(current, deopt_sender.interpreter_frame_method()); >> 644: Bytecode_invoke cur(method, deopt_sender.interpreter_frame_bci()); >> 645: if (!cur.is_invokedynamic() && MethodHandles::has_member_arg(cur.klass(), cur.name())) { > > I was confused with this new condition but I see is the same we have in `vframeArray::unpack_to_stack()`. +1 I see there's also an assertion in `ConstantPool::klass_ref_index_at()`. It might be worth a little comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1975310438 From duke at openjdk.org Fri Feb 28 12:50:02 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Fri, 28 Feb 2025 12:50:02 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 09:37:29 GMT, Severin Gehwolf wrote: >> src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 339: >> >>> 337: cg_infos[MEMORY_IDX]._enabled = (enabled == 1); >>> 338: } else if (strcmp(name, "cpuset") == 0) { >>> 339: log_debug(os, container)("Detected optional cpuset controller entry in %s", controllers_file); >> >> In https://bugs.openjdk.org/browse/JDK-8347129 we decided to keep the `cpuset` optionality alone for cg v1. 
I'd prefer if we kept it that way as it's becoming increasingly difficult to find those systems (or change them). Thus, hard to know if this would break something. > > Update: Please remove the log line, since this is the cg v1 branch and there cpuset isn't optional. OK, will do. This represents a change to debug logging on `RHEL-8`, at least in my default test configuration. Currently it is, with and without my patch: $ jdk/bin/java -Xlog:os+container=trace -version [0.001s][trace][os,container] OSContainer::init: Initializing Container Support [0.001s][debug][os,container] Detected optional cpuset controller entry in /proc/cgroups [0.001s][debug][os,container] Detected optional pids controller entry in /proc/cgroups [0.001s][debug][os,container] Detected cgroups hybrid or legacy hierarchy, using cgroups v1 controllers [...] However, I agree that the change is a good one, since the debug message was inaccurate when the system was ultimately determined to be in `cgroups v1` mode. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1975354075 From duke at openjdk.org Fri Feb 28 12:41:57 2025 From: duke at openjdk.org (Thomas Fitzsimmons) Date: Fri, 28 Feb 2025 12:41:57 GMT Subject: RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 10:21:09 GMT, Severin Gehwolf wrote: >> Hmm, this logic was there before my changes; I am pretty sure I did keep `cpuset` required for `cgroups v1`. >> >> To attempt to confirm my patch had not changed the `cgroups v1` `cpuset` optionality logic, I wrote a test case for "cgroups v1, cpuset not present in /proc/cgroups". >> >> (To avoid assertion failures I created fake `/proc/self/cgroup` and `/proc/self/mountinfo` files that also do not have `cpuset` lines.) 
>> >> I adapted the test case to before my patch: >> >> https://github.com/fitzsim/jdk/commit/cab0000598b522d38865452575f95ed368b5935b >> >> and after my patch: >> >> https://github.com/fitzsim/jdk/commit/d8537baed5d56b5f36ce628b8c180ce18ce6f6e6 >> >> and it passes in both cases. i.e., for `cgroups v1`, `cpuset` is required, so `determine_type` produces `INVALID_CGROUPS_V1 (4)`, from this logic: >> >> >> if (!cg_infos[CPUSET_IDX]._data_complete) { >> log_debug(os, container)("Required cgroup v1 cpuset subsystem not found"); >> cleanup(cg_infos); >> *flags = INVALID_CGROUPS_V1; >> return false; >> } >> >> >> The test fails in both cases if I remove ` && i != CPUSET_IDX`, with `determine_type` returning `INVALID_CGROUPS_GENERIC (6)`: >> >> >> Required cpuset controller missing in /proc/cgroups. Invalid. expected: 4 but was: 6 >> >> >> due to this block (which my patch left alone): >> >> >> if (!all_required_controllers_enabled) { >> // one or more required controllers disabled, disable container support >> log_debug(os, container)("One or more required controllers disabled at kernel level."); >> cleanup(cg_infos); >> *flags = INVALID_CGROUPS_GENERIC; >> return false; >> } >> >> >> No existing test checks this logic as far as I can tell, so if you think the above makes sense, I can add this test to this pull request. > > You are right that if any of the required controllers aren't enabled at the kernel level, we fail with `INVALID_CGROUPS_GENERIC`. However, the `if` condition is now within the cgroups v1 branch, while it used to be outside any version-specific branch. > > Also note that `/proc/cgroups` containing (last digit `0`, indicating the enabled flag): > > > cpuset 3 1 0\n > > > ... is semantically equivalent to it being missing entirely from `/proc/cgroups`. But keeping `if (i != PIDS_IDX && i != CPUSET_IDX) {` above would keep `all_required_controllers_enabled == true`, which is not correct.
Yes, we should keep/add a test like you suggest, but amend the patch to something like this: https://github.com/jerboaa/jdk/commit/26f765db9fef6f1d7be79452da701987274117c5 Makes sense, will do. I will also simplify the test case as you suggest. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1975345816 From azafari at openjdk.org Fri Feb 28 13:07:10 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 28 Feb 2025 13:07:10 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 17:54:25 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed UseFlagInPlace test. > > src/hotspot/share/nmt/memoryFileTracker.hpp line 32: > >> 30: #include "nmt/nmtNativeCallStackStorage.hpp" >> 31: #include "nmt/vmatree.hpp" >> 32: #include "nmt/virtualMemoryTracker.hpp" > > Are you 100% sure we need it here? `VirtualMemorySnapshot` is used here which is defined in `virtualMemoryTracker.hpp`. > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 59: > >> 57: if (_tracker == nullptr) return false; >> 58: new (_tracker) VirtualMemoryTracker(level == NMT_detail); >> 59: return _tracker->tree() != nullptr; > > We should check for `tree() != nullptr;` inside VirtualMemoryTracker constructor as assert? Then in Product build, the `tree == nullptr` ends up in a crash. > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 135: > >> 133: VirtualMemorySummary::record_uncommitted_memory(-commit_delta, tag); >> 134: else >> 135: print_err("uncommit"); > > Missing braces. Done. 
> src/hotspot/share/nmt/virtualMemoryTracker.cpp line 213: > >> 211: log_info(nmt)("region in walker vmem, base: " INTPTR_FORMAT " size: %zu , %s, committed: %zu", >> 212: p2i(rgn.base()), rgn.size(), rgn.tag_name(), rgn.committed_size()); >> 213: if (!walker->do_allocation_site(&rgn)) > > Missing braces. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975376033 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975377752 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975380490 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975381241 From azafari at openjdk.org Fri Feb 28 13:12:17 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 28 Feb 2025 13:12:17 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 18:11:26 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed UseFlagInPlace test. > > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 225: > >> 223: >> 224: int compare_reserved_region_base(const ReservedMemoryRegion& r1, const ReservedMemoryRegion& r2) { >> 225: return r1.compare(r2); > > Why did we name it `compare_reserved_region_base`, not simply `compare_reserved_region` This function and `compare_committed_region` were used as comparators for `SortedLinkedList` and are no longer needed here, so they have been removed. Thanks for catching this.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975387868 From azafari at openjdk.org Fri Feb 28 13:18:08 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 28 Feb 2025 13:18:08 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: <_NgwaL7X0Wail8MgHyql0JSLLkPBbHgrnCuuhdDEpzo=.8a272cee-1077-4c68-a018-5df5c867cc68@github.com> On Fri, 28 Feb 2025 09:44:09 GMT, Johan Sjölen wrote: >> src/hotspot/share/nmt/memTracker.hpp line 142: >> >>> 140: if (addr != nullptr) { >>> 141: NmtVirtualMemoryLocker nvml; >>> 142: VirtualMemoryTracker::Instance::add_reserved_region((address)addr, size, stack, mem_tag); >> >> I do not like: >> >> `VirtualMemoryTracker::Instance::add_reserved_region` >> >> with the `Instance` being repeated over and over. >> >> I'd prefer `VirtualMemoryTracker::add_reserved_region` and have `Instance` be impl detail inside. > > The wordiness is a bit annoying. The reason that we do this is to separate the global static instance from the implementation, so that we can have many `VMT`s when testing (this is very useful). Do you have a concrete way we can refactor this such that we retain the possibility of having many VMTs and one static instance, whilst reducing the wordiness when using the code? Can this refactoring wait until after integration, as we have more classes following the same pattern that'd need to be refactored? The `HeapReserver` and `MemoryFileTracker` classes (in different parts of the code and in different PRs) use the same syntax. The same style is used here for consistency across the HotSpot code.
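To illustrate the trade-off Johan describes, here is a hedged C++ sketch of the pattern. A hypothetical `ToyTracker` stands in for `VirtualMemoryTracker`; it is not the NMT code. All state lives in ordinary objects, so tests can create several independent trackers, while the nested `Instance` class is a static facade over the single global tracker used by the VM. The extra `::Instance::` at call sites is the cost of keeping both uses possible.

```cpp
#include <cstddef>
#include <map>

// Hypothetical tracker: all state lives in an ordinary object,
// so tests can construct as many independent trackers as they need.
class ToyTracker {
 public:
  void add_reserved_region(std::size_t base, std::size_t size) {
    regions_[base] = size;
  }

  std::size_t reserved_total() const {
    std::size_t total = 0;
    for (const auto& r : regions_) total += r.second;
    return total;
  }

  // Static facade over the one global tracker used by the VM.
  class Instance {
   public:
    static void initialize() { _tracker = new ToyTracker(); }
    static void add_reserved_region(std::size_t base, std::size_t size) {
      _tracker->add_reserved_region(base, size);
    }
    static std::size_t reserved_total() { return _tracker->reserved_total(); }

   private:
    static ToyTracker* _tracker;
  };

 private:
  std::map<std::size_t, std::size_t> regions_;  // base -> size
};

ToyTracker* ToyTracker::Instance::_tracker = nullptr;
```

VM code would call `ToyTracker::Instance::add_reserved_region(...)`, while a unit test can simply declare `ToyTracker a, b;` and exercise each one in isolation without touching the global state.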
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975395435 From azafari at openjdk.org Fri Feb 28 13:26:12 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 28 Feb 2025 13:26:12 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 18:06:50 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed UseFlagInPlace test. > > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 114: > >> 112: committed = VirtualMemorySummary::as_snapshot()->by_type(tag)->committed(); >> 113: if (reserve_delta != 0) { >> 114: if (reserve_delta > 0) > > Missing braces. Done. > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 118: > >> 116: else { >> 117: if ((size_t)-reserve_delta <= reserved) >> 118: VirtualMemorySummary::record_released_memory(-reserve_delta, tag); > > Missing braces. Done. > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 129: > >> 127: } >> 128: else >> 129: print_err("commit"); > > Missing braces. Done. > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 133: > >> 131: else { >> 132: if ((size_t)-commit_delta <= committed) >> 133: VirtualMemorySummary::record_uncommitted_memory(-commit_delta, tag); > > Missing braces. Done. 
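The snippets at lines 114 to 133 apply a signed reserve or commit delta to an unsigned counter and report an error when a release would underflow. Below is a minimal braced sketch of that logic. `apply_delta` is a hypothetical helper, not the `VirtualMemorySummary` API.

```c++
#include <cassert>
#include <cstddef>

// Applies a signed delta to an unsigned counter, with braces on every branch.
// Returns false, leaving the counter unchanged, if a negative delta would
// take the counter below zero.
bool apply_delta(std::size_t& counter, std::ptrdiff_t delta) {
  if (delta > 0) {
    counter += static_cast<std::size_t>(delta);
  } else if (delta < 0) {
    // Negate in unsigned arithmetic so PTRDIFF_MIN does not overflow.
    std::size_t decrement = std::size_t(0) - static_cast<std::size_t>(delta);
    if (decrement <= counter) {
      counter -= decrement;
    } else {
      return false;  // the reviewed code reports an error here (print_err)
    }
  }
  return true;
}
```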
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975405233 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975400984 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975405522 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975405745 From azafari at openjdk.org Fri Feb 28 13:32:09 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 28 Feb 2025 13:32:09 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 17:44:25 GMT, Gerard Ziemski wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed UseFlagInPlace test. > > src/hotspot/share/nmt/memReporter.cpp line 440: > >> 438: VirtualMemoryTracker::Instance::tree()->visit_committed_regions(*reserved_rgn, >> 439: [&](CommittedMemoryRegion& committed_rgn) { >> 440: if (committed_rgn.size() == reserved_rgn->size() && committed_rgn.call_stack()->equals(*stack)) { > > If we are calling here > > `equals()` > > anyhow, why not have CommittedMemoryRegion:equals() that checks both the size and the stack? This way we can simply have: > > `if (committed_rgn.equals(reserved_rgn)) > ` Done. But it is used only here. 
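The `equals()` suggested above folds the size check and the call-stack check into a single method. The sketch uses simplified, hypothetical stand-ins for `CommittedMemoryRegion` and `NativeCallStack`. Like the check in memReporter.cpp, it ignores the base address.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified stand-in for NativeCallStack.
struct CallStack {
  std::vector<const void*> frames;
  bool equals(const CallStack& other) const { return frames == other.frames; }
};

// Hypothetical simplified stand-in for a memory region.
struct MemoryRegion {
  std::size_t base;
  std::size_t size;
  CallStack stack;

  // Same size and same allocation call stack; the base is intentionally
  // not compared, mirroring the condition in memReporter.cpp.
  bool equals(const MemoryRegion& other) const {
    return size == other.size && stack.equals(other.stack);
  }
};
```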
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975413206 From azafari at openjdk.org Fri Feb 28 13:41:13 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 28 Feb 2025 13:41:13 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: <6Lyj3WWHS6YYyIk9gkcQ5SS5C0Pon4dGGSYjOZ7fAiM=.ddd71400-d85c-49d2-b76b-e399c6027e5d@github.com> On Thu, 27 Feb 2025 13:29:35 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed UseFlagInPlace test. > > src/hotspot/share/nmt/nmtTreap.hpp line 388: > >> 386: head = to_visit.pop(); >> 387: if (!f(head)) >> 388: return; > > Style: Always use braces in `if` statements. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975424820 From azafari at openjdk.org Fri Feb 28 13:41:15 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 28 Feb 2025 13:41:15 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Fri, 28 Feb 2025 10:14:18 GMT, Johan Sjölen wrote: >> src/hotspot/share/nmt/vmatree.hpp line 73: >> >>> 71: assert(type < StateType::COUNT, "must be"); >>> 72: return statetype_strings[static_cast(type)]; >>> 73: } >> >> I don't like that we are hardcoding the size of this array and have COUNT be StateType.
Can we do something like this?: >> >> >> enum class StateType : uint8_t { Reserved = 1, Committed = 3, Released = 0 }; >> >> private: >> static constexpr const char* const statetype_strings[] = {"released", "reserved", "only-committed", "committed"}; >> static constexpr int STATETYPE_COUNT = static_cast(sizeof(statetype_strings)/sizeof(char*)); >> >> public: >> NONCOPYABLE(VMATree); >> >> static const char* statetype_to_string(StateType type) { >> assert(static_cast(type) < STATETYPE_COUNT, "must be"); >> return statetype_strings[static_cast(type)]; >> } > > This is a fairly standard pattern in Hotspot, so personally I'm fine with keeping it as-is. > > Here's an incomplete list of pre-existing usages: > > ```c++ > // UL tag list > enum type { > __NO_TAG, > #define LOG_TAG(name) _##name, > LOG_TAG_LIST > #undef LOG_TAG > Count > }; > > > // enum for figuring positions and size of Symbol::_vm_symbols[] > enum class vmSymbolID : int { > // [FIRST_SID ... LAST_SID] is the iteration range for the *valid* symbols. > // NO_SID is used to indicate an invalid symbol. Some implementation code > // *may* read _vm_symbols[NO_SID], so it must be a valid array index. 
> NO_SID = 0, // exclusive lower limit > > #define VM_SYMBOL_ENUM(name, string) VM_SYMBOL_ENUM_NAME_(name), > VM_SYMBOLS_DO(VM_SYMBOL_ENUM, VM_ALIAS_IGNORE) > #undef VM_SYMBOL_ENUM > > SID_LIMIT, // exclusive upper limit > > #define VM_ALIAS_ENUM(name, def) VM_SYMBOL_ENUM_NAME_(name) = VM_SYMBOL_ENUM_NAME_(def), > VM_SYMBOLS_DO(VM_SYMBOL_IGNORE, VM_ALIAS_ENUM) > #undef VM_ALIAS_ENUM > > FIRST_SID = NO_SID + 1, // inclusive lower limit > LAST_SID = SID_LIMIT - 1, // inclusive upper limit > }; > > enum class InjectedFieldID : int { > ALL_INJECTED_FIELDS(DECLARE_INJECTED_FIELD_ENUM) > MAX_enum > }; > > enum class vmClassID : int { > #define DECLARE_VM_CLASS(name, symbol) _VM_CLASS_ENUM(name), _VM_CLASS_ENUM(symbol) = _VM_CLASS_ENUM(name), > VM_CLASSES_DO(DECLARE_VM_CLASS) > #undef DECLARE_VM_CLASS > > LIMIT, // exclusive upper limit > FIRST = 0, // inclusive upper limit > LAST = LIMIT - 1 // inclusive upper limit > }; > > enum class CodeBlobType { > MethodNonProfiled = 0, // Execution level 1 and 4 (non-profiled) nmethods (including native nmethods) > MethodProfiled = 1, // Execution level 2 and 3 (profiled) nmethods > NonNMethod = 2, // Non-nmethods like Buffers, Adapters and Runtime Stubs > All = 3, // All types (No code cache segmentation) > NumTypes = 4 // Number of CodeBlobTypes > }; There is coding style that the last enum member counts the number of them, like what we have `mt_number_of_tags` in `MemTag` enum. So, I renamed the last member to `st_number_of_states`. Would it be OK? Or preferred to use constant separately. 
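The convention proposed here puts a trailing enumerator (`st_number_of_states`) that counts the states at the end of the enum. The sketch below pairs that with a `static_assert`, which addresses the concern about hardcoding the array size by making any mismatch between the string table and the enum a compile error. The `static_assert` is an assumption added for illustration and is not part of either proposal in the thread. The enumerator values mirror the quoted `StateType`.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>

// Values mirror the StateType discussed above; st_number_of_states follows
// the naming proposed in the reply.
enum class StateType : std::uint8_t {
  Released = 0,
  Reserved = 1,
  Committed = 3,
  st_number_of_states  // implicitly 4: one past the largest value
};

static constexpr const char* const statetype_strings[] = {
  "released", "reserved", "only-committed", "committed"
};

// Ties the table length to the enum so the two cannot silently drift apart.
static_assert(std::size(statetype_strings) ==
                  static_cast<std::size_t>(StateType::st_number_of_states),
              "statetype_strings needs one entry per state index");

inline const char* statetype_to_string(StateType type) {
  assert(type < StateType::st_number_of_states && "must be");
  return statetype_strings[static_cast<std::uint8_t>(type)];
}
```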
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975423884 From tschatzl at openjdk.org Fri Feb 28 13:43:24 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Feb 2025 13:43:24 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v3] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
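The filtering post-write barrier in the pseudocode above can be modeled in plain C++. The model makes it easier to see how many checks run before a card is dirtied and enqueued. This is a toy single-threaded model, not G1's barrier. The region and card sizes, the young-card encoding, and all names (`BarrierModel`, `post_write`) are illustrative assumptions.

```c++
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of the filtering post-write barrier quoted above.
// Constants and names are illustrative assumptions, not G1's values.
constexpr std::uintptr_t kRegionShift = 20;  // 1 MiB regions (assumed)
constexpr std::uintptr_t kCardShift = 9;     // 512-byte cards
constexpr std::uint8_t kCleanCard = 0xff;
constexpr std::uint8_t kDirtyCard = 0;
constexpr std::uint8_t kYoungCard = 2;       // assumed encoding

struct BarrierModel {
  std::vector<std::uint8_t> cards;        // one byte per card
  std::vector<std::size_t> local_queue;   // stands in for the thread-local dcq
  std::vector<std::size_t> global_queue;  // stands in for the dcqs
  std::size_t queue_capacity;

  BarrierModel(std::size_t heap_bytes, std::size_t capacity)
      : cards(heap_bytes >> kCardShift, kCleanCard), queue_capacity(capacity) {}

  // Barrier for the store `*field = new_val`; addresses are heap offsets
  // and 0 stands for null.
  void post_write(std::uintptr_t field, std::uintptr_t new_val) {
    if ((field >> kRegionShift) == (new_val >> kRegionShift)) { return; }  // same region
    if (new_val == 0) { return; }                                          // null value
    std::size_t card = field >> kCardShift;
    if (cards[card] == kYoungCard) { return; }                             // young gen
    std::atomic_thread_fence(std::memory_order_seq_cst);                   // StoreLoad
    if (cards[card] == kDirtyCard) { return; }                             // already dirty
    cards[card] = kDirtyCard;
    local_queue.push_back(card);                                           // card tracking
    if (local_queue.size() < queue_capacity) { return; }
    // Models the runtime call: hand the full local buffer to the global set.
    global_queue.insert(global_queue.end(), local_queue.begin(), local_queue.end());
    local_queue.clear();
  }
};
```

Each early `return` corresponds to one filter line in the pseudocode. Even this simplified model shows how much work precedes the single card store that Parallel and Serial GC perform.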
Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: - * ayang review 1 (ctd) * split up sweep-rt state into "start" (to be called once) and "step" (to be called repeatedly) phases * move building the snapshot our of g1remset - * ayang review 1 * use uint for number of reserved regions consistently * rename *sweep_state to *sweep_table * improved comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/9ef9c5f4..7d361fc1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=01-02 Stats: 108 lines in 8 files changed: 40 ins; 24 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From azafari at openjdk.org Fri Feb 28 13:46:08 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 28 Feb 2025 13:46:08 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Thu, 27 Feb 2025 13:44:48 GMT, Johan Sjölen wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed UseFlagInPlace test. > > src/hotspot/share/nmt/regionsTree.hpp line 30: > >> 28: #include "nmt/nmtCommon.hpp" >> 29: #include "nmt/vmatree.hpp" >> 30: #include "nmt/virtualMemoryTracker.hpp" > > This doesn't seem used. However, you do not include the `nmt/nmtNativeCallStackStorage.hpp` header. Done. > src/hotspot/share/nmt/regionsTree.hpp line 90: > >> 88: return true; >> 89: }); >> 90: } > > Move to `cpp` file, wrap in `#ifdef ASSERT`. Done.
> src/hotspot/share/nmt/virtualMemoryTracker.cpp line 105: > >> 103: // " vms-committed: %zu", >> 104: // str, NMTUtil::tag_to_name(tag), (long)reserve_delta, (long)commit_delta, reserved, committed); >> 105: }; > > Any plan regarding this? Will be commented out after https://github.com/openjdk/jdk/pull/23771. Until then, the error messages will pollute the output and corrupt the jdk-image. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975432017 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975429334 PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975431804 From azafari at openjdk.org Fri Feb 28 13:55:30 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 28 Feb 2025 13:55:30 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v33] In-Reply-To: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: > - `VMATree` is used instead of `SortedLinkList` in new class `VirtualMemoryTracker`. > - A wrapper/helper `RegionTree` is made around VMATree to make some calls easier. > - `find_reserved_region()` is used in 4 cases, it will be removed in further PRs. > - All tier1 tests pass except this https://bugs.openjdk.org/browse/JDK-8335167. 
Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: style, some cleanup, VMT and regionsTree circular dep resolved ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20425/files - new: https://git.openjdk.org/jdk/pull/20425/files/74e4872d..70209581 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20425&range=31-32 Stats: 329 lines in 16 files changed: 146 ins; 123 del; 60 mod Patch: https://git.openjdk.org/jdk/pull/20425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20425/head:pull/20425 PR: https://git.openjdk.org/jdk/pull/20425 From cnorrbin at openjdk.org Fri Feb 28 14:13:28 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Fri, 28 Feb 2025 14:13:28 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v3] In-Reply-To: References: Message-ID: > Hi everyone, > > The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. > > The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. > > Additionally, I've created `align_up_or_min`, which behaves similarly to the original align_up but handles overflows predictably across all integer types. This new function is used in the locations where overflow checks already exist, providing a safer alternative. 
Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: changed max size of MinHeapDeltaBytes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23711/files - new: https://git.openjdk.org/jdk/pull/23711/files/aa8a8054..86d91252 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23711&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23711.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23711/head:pull/23711 PR: https://git.openjdk.org/jdk/pull/23711 From cnorrbin at openjdk.org Fri Feb 28 14:26:13 2025 From: cnorrbin at openjdk.org (Casper Norrbin) Date: Fri, 28 Feb 2025 14:26:13 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v3] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 14:13:28 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. >> >> Additionally, I've created `align_up_or_min`, which behaves similarly to the original align_up but handles overflows predictably across all integer types. This new function is used in the locations where overflow checks already exist, providing a safer alternative. 
> > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > changed max size of MinHeapDeltaBytes Of the (previously modified) heap flags, `MinHeapDeltaBytes` is the only problem. The other flags have checks before the `align_up` that crash the VM before reaching that point. The previous max of `MinHeapDeltaBytes` was `max_uintx`; I lowered it to `max_uintx / 2` (`+1` to have it aligned). Now, when trying to set it to extreme values, we get a more informative error showing the maximum value instead of overflowing. This also means that the test works as expected. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23711#issuecomment-2690772099 From mli at openjdk.org Fri Feb 28 14:42:23 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 28 Feb 2025 14:42:23 GMT Subject: RFR: 8345298: RISC-V: Add riscv backend for Float16 operations - scalar Message-ID: <5VD4_Y79DUxFsdDiYo7ze2TJ_8GGYtz5sySmSlj5zLc=.1378a06b-53ab-4655-aede-cb4dc5a59dec@github.com> Hi, Can you help to review this patch? It's an implementation of https://github.com/openjdk/jdk/pull/22754 on riscv. ## Performance still in progress ...
------------- Commit messages: - merge master - clean 2 - clean - initial commit Changes: https://git.openjdk.org/jdk/pull/23844/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23844&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345298 Stats: 447 lines in 11 files changed: 393 ins; 0 del; 54 mod Patch: https://git.openjdk.org/jdk/pull/23844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23844/head:pull/23844 PR: https://git.openjdk.org/jdk/pull/23844 From rrich at openjdk.org Fri Feb 28 15:27:05 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 28 Feb 2025 15:27:05 GMT Subject: RFR: 8336042: Caller/callee param size mismatch in deoptimization causes crash [v3] In-Reply-To: References: <4MjR9hdInhuJduDqpTqpGiyo_M_JQ6pM2g5_TgzcSTg=.16037e60-de66-4d0b-861b-19be80ff2751@github.com> Message-ID: On Fri, 28 Feb 2025 12:11:05 GMT, Richard Reingruber wrote: >> src/hotspot/share/runtime/deoptimization.cpp line 645: >> >>> 643: methodHandle method(current, deopt_sender.interpreter_frame_method()); >>> 644: Bytecode_invoke cur(method, deopt_sender.interpreter_frame_bci()); >>> 645: if (!cur.is_invokedynamic() && MethodHandles::has_member_arg(cur.klass(), cur.name())) { >> >> I was confused with this new condition but I see is the same we have in `vframeArray::unpack_to_stack()`. > > +1 > I see there's also an assertion in `ConstantPool::klass_ref_index_at()`. It might be worth a little comment. Actually I think that there should be an abstraction that hides that detail. Probably `has_member_arg` should be a method of `Bytecode_invoke`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23557#discussion_r1975594243 From lucy at openjdk.org Fri Feb 28 16:34:03 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 28 Feb 2025 16:34:03 GMT Subject: RFR: 8350716: [s390] intrinsify Thread.currentThread() [v2] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 10:45:11 GMT, Amit Kumar wrote: >> s390x port for [JDK-8278793](https://bugs.openjdk.org/browse/JDK-8278793) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > comment from Lutz Looks even better now... ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23791#pullrequestreview-2651242127 From mdoerr at openjdk.org Fri Feb 28 16:36:04 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Feb 2025 16:36:04 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Thu, 27 Feb 2025 13:40:51 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently acceleration code for GHASH is missing for PPC64. >> >> The current implementation utlilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > use vsplitsb src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 561: > 559: VectorRegister vLowProduct, VectorRegister vMidProduct, VectorRegister vHighProduct, > 560: VectorRegister vReducedLow, VectorRegister vTmp8, VectorRegister vTmp9, > 561: VectorRegister vCombinedResult, VectorRegister vSwappedH) { I'd adjust the indentation. 
src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 574: > 572: masm->vsldoi(vLowProduct, vLowProduct, vLowProduct, 8); // Swap > 573: masm->vxor(vLowProduct, vLowProduct, vReducedLow); // Reduction using constant > 574: masm->vsldoi(vCombinedResult, vLowProduct, vLowProduct, 8); // Swap The part between the vpsumd instructions looks too complicated. Isn't it equivalent to the following? masm->vsldoi(vTmp8, vLowProduct, vHighProduct, 8); masm->vsldoi(vTmp9, vReducedLow, vReducedLow, 8); masm->vxor(vTmp8, vTmp8, vMidProduct); masm->vxor(vCombinedResult, vTmp8, vTmp9); src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 699: > 697: > 698: __ bind(L_initialize_unaligned_loop); > 699: __ li(temp1,0); Missing whitespace. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1975696315 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1975694548 PR Review Comment: https://git.openjdk.org/jdk/pull/20235#discussion_r1975696864 From jwaters at openjdk.org Fri Feb 28 16:37:01 2025 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 28 Feb 2025 16:37:01 GMT Subject: RFR: 8345265: Minor improvements for LTO across all compilers [v2] In-Reply-To: References: <2y8p-J2SCTANChv8WvrXmYI1UjVxbC7n8tUJzBOMzEE=.7c2b48a5-423e-4138-8671-3037e8963730@github.com> <3peOk4hOWRVX3sn5BHQbRh5ymyP8Sr146H66jDWkePA=.ef3d0788-2bfa-421b-ad92-a1e46fd0feb5@github.com> Message-ID: On Tue, 18 Feb 2025 13:44:56 GMT, Matthias Baesken wrote: > > @MBaesken Currently with LTO active on gcc 14 commit [e648a90](https://github.com/openjdk/jdk/commit/e648a907b31fd0d6b746d149fda2a8d5fbe26dc0) is causing serious trouble on my end by mass inlining everything, bloating the JVM to nearly 60MB in size, does HotSpot have the same size issues on your end with LTO? 
(--enable-jvm-feature-opt-size is off the table because the JVM should ideally be an acceptable size even without that flag, and -Os and LTO doesn't work with gcc anyway) > > On my end we used gcc11 in the past and now test gcc13. Both work nicely, no libjvm.so bloat has been observed with lto. Maybe there is some issue/difference with gcc14 but so far we did not test with this version. Leaving Kim's comment about flattening in here, as I believe something has changed with the flatten attribute in gcc 14 that made it far more aggressive across compilation units, so this is probably relevant. A simple test of symbol size with nm strongly supports this theory. > G1ParScanThreadState uses ATTRIBUTE_FLATTEN to tune the inlining of code in that class, in an attempt to ensure the desired fast paths are inlined, despite the size and other attributes of some of these functions that might (and empirically did) inhibit inlining in some critical places with at least some compilers. It also uses NOINLINE and assumed implicit non-inlining of definitions in other translation units to keep slower paths out of line, to limit code size. It looks like there are a couple of things going on here. > > One is that slow paths internal to the Stack implementation (and perhaps other places) are being flattened because there isn't any NOINLINE anywhere to prevent it and it's all template code, so source is not in another translation unit. So we're probably generating more inline code than we really want. That seems hard to avoid though. > > The other is that LTO seems to be applying flattening even across translation units. (That's not completely surprising.) So the assumption that the flattening won't apply to code in other TU's is invalidated by LTO. That's going to make flattening a lot harder to use.
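The interaction described above, with a flattened fast path and slow paths kept out of line only by NOINLINE, can be illustrated with the underlying GCC/Clang attributes. This is a sketch, not G1 code. `SKETCH_FLATTEN` and `SKETCH_NOINLINE` are hypothetical stand-ins for HotSpot's `ATTRIBUTE_FLATTEN` and `NOINLINE` macros. GCC documents that `flatten` inlines every call inside the function, recursively where possible, except calls to functions declared `noinline`. Any slow path without `noinline` therefore remains a candidate once LTO makes its body visible, and this includes `std::vector`'s own reallocation path.

```c++
#include <cassert>
#include <vector>

// GCC/Clang attribute spellings; HotSpot wraps these as ATTRIBUTE_FLATTEN
// and NOINLINE. The SKETCH_ names are hypothetical.
#define SKETCH_FLATTEN  __attribute__((flatten))
#define SKETCH_NOINLINE __attribute__((noinline))

// Slow path: explicitly kept out of line, so flattening a caller does not
// pull it in even when its body is visible (same TU or LTO).
SKETCH_NOINLINE void grow(std::vector<int>& stack) {
  stack.reserve(stack.capacity() * 2 + 16);
}

// Fast path: small enough that inlining it everywhere is the goal.
inline void push_fast(std::vector<int>& stack, int v) {
  if (stack.size() == stack.capacity()) {
    grow(stack);
  }
  stack.push_back(v);
}

// Every call made from this body is inlined recursively where possible,
// except calls to functions declared noinline (here: grow).
SKETCH_FLATTEN int push_all(std::vector<int>& stack, int n) {
  for (int i = 0; i < n; i++) {
    push_fast(stack, i);
  }
  return static_cast<int>(stack.size());
}
```

Comparing `nm --size-sort` output for `push_all` with and without `-flto` is one way to run the kind of symbol-size check mentioned above.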
------------- PR Comment: https://git.openjdk.org/jdk/pull/22464#issuecomment-2691081602 From tschatzl at openjdk.org Fri Feb 28 17:52:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Feb 2025 17:52:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v4] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * fix assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/7d361fc1..d87935a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From ayang at openjdk.org Fri Feb 28 19:31:54 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 28 Feb 2025 19:31:54 GMT Subject: RFR: 8346916: [REDO] align_up has potential overflow [v3] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 14:13:28 GMT, Casper Norrbin wrote: >> Hi everyone, >> >> The `align_up` function can potentially overflow, resulting in undefined behavior. Most use cases rely on the assumption that aligned_result >= original. To address this, I've added an assertion to verify this condition. >> >> The original PR (#20808) missed cases where overflow checks already existed, so I've now went through usages of `align_up` and found the places with explicit checks. Most notably, #23168 added `align_up_or_null` to metaspace, but this function is also useful elsewhere. Given this, I relocated it to `align.hpp`, alongside the rest of the alignment functions. >> >> Additionally, I've created `align_up_or_min`, which behaves similarly to the original align_up but handles overflows predictably across all integer types. This new function is used in the locations where overflow checks already exist, providing a safer alternative. 
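The sketch below shows the two ideas from this thread together. `align_up` asserts that its result is not smaller than its argument, which is the check 8346916 adds. A caller clamps its request with a `MIN2`-style bound first, so the assertion can never fire. These are simplified `size_t`-only helpers in the spirit of align.hpp, not the HotSpot templates. `clamped_expand_bytes` is a hypothetical name for the psOldGen.cpp pattern. The same reasoning explains the new `MinHeapDeltaBytes` maximum: on a 64-bit VM, `max_uintx / 2 + 1` is 2^63, a power of two and so a multiple of any smaller power-of-two alignment, which means aligning it up returns it unchanged.

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified power-of-two alignment helpers in the spirit of align.hpp.
constexpr bool is_power_of_2(std::size_t x) { return x != 0 && (x & (x - 1)) == 0; }

std::size_t align_down(std::size_t size, std::size_t alignment) {
  assert(is_power_of_2(alignment));
  return size & ~(alignment - 1);
}

std::size_t align_up(std::size_t size, std::size_t alignment) {
  // size + (alignment - 1) wraps around (well-defined for unsigned types)
  // exactly when the next multiple of alignment does not fit in size_t;
  // the result is then smaller than size, which the assertion catches.
  std::size_t result = align_down(size + (alignment - 1), alignment);
  assert(result >= size && "align_up overflowed");
  return result;
}

// Caller-side clamp in the spirit of the MIN2 suggestion: bound the request
// first so align_up never sees an argument that could overflow.
std::size_t clamped_expand_bytes(std::size_t bytes, std::size_t uncommitted,
                                 std::size_t alignment) {
  bytes = std::min(bytes, uncommitted);
  return align_up(bytes, alignment);
}
```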
> > Casper Norrbin has updated the pull request incrementally with one additional commit since the last revision: > > changed max size of MinHeapDeltaBytes src/hotspot/share/gc/parallel/psOldGen.cpp line 193: > 191: #endif > 192: const size_t alignment = virtual_space()->alignment(); > 193: size_t aligned_bytes = align_up_or_min(bytes, alignment); How about using `bytes = MIN2(bytes, virtual_space()->uncommitted_size())` to dodge the potential overflow? I find it more intuitive to provide a proper arg to `align_up` and expect the result to be >= arg. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23711#discussion_r1975917340 From gziemski at openjdk.org Fri Feb 28 19:54:08 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 28 Feb 2025 19:54:08 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: <_NgwaL7X0Wail8MgHyql0JSLLkPBbHgrnCuuhdDEpzo=.8a272cee-1077-4c68-a018-5df5c867cc68@github.com> References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> <_NgwaL7X0Wail8MgHyql0JSLLkPBbHgrnCuuhdDEpzo=.8a272cee-1077-4c68-a018-5df5c867cc68@github.com> Message-ID: On Fri, 28 Feb 2025 13:15:29 GMT, Afshin Zafari wrote: >> The wordiness is a bit annoying. The reason that we do this is to separate the global static instance from the implementation, so that we can have many `VMT`s when testing (this is very useful). Do you have a concrete way we can refactor this such that we retain the possibility of having many VMTs and one static instance, whilst reducing the wordiness when using the code? Can this refactoring wait until after integration, as we have more classes following the same pattern that'd need to be refactored? > > The `HeapReserver` and `MemoryFileTracker` classes (in different parts of the code and different PRs) also use the same syntax for it. Here the same style is used to keep similarity in Hotspot code. 
Right, I didn't like it before, and spoke out against it, and now it is spreading :-) Why do we want to have more than one VMT? If we truly do, then I'm not sure there is anything that could be done here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975940036 From gziemski at openjdk.org Fri Feb 28 19:58:10 2025 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 28 Feb 2025 19:58:10 GMT Subject: RFR: 8337217: Port VirtualMemoryTracker to use VMATree [v32] In-Reply-To: References: <_QgAec-LQq4pdC6sP3UAZLHRT30q1mxXohvGDag1a6U=.214e9d81-c627-4f34-af8f-cb71506eeda2@github.com> Message-ID: On Fri, 28 Feb 2025 13:37:32 GMT, Afshin Zafari wrote: >> This is a fairly standard pattern in Hotspot, so personally I'm fine with keeping it as-is. >> >> Here's an incomplete list of pre-existing usages: >> >> ```c++ >> // UL tag list >> enum type { >> __NO_TAG, >> #define LOG_TAG(name) _##name, >> LOG_TAG_LIST >> #undef LOG_TAG >> Count >> }; >> >> >> // enum for figuring positions and size of Symbol::_vm_symbols[] >> enum class vmSymbolID : int { >> // [FIRST_SID ... LAST_SID] is the iteration range for the *valid* symbols. >> // NO_SID is used to indicate an invalid symbol. Some implementation code >> // *may* read _vm_symbols[NO_SID], so it must be a valid array index. 
>> NO_SID = 0, // exclusive lower limit >> >> #define VM_SYMBOL_ENUM(name, string) VM_SYMBOL_ENUM_NAME_(name), >> VM_SYMBOLS_DO(VM_SYMBOL_ENUM, VM_ALIAS_IGNORE) >> #undef VM_SYMBOL_ENUM >> >> SID_LIMIT, // exclusive upper limit >> >> #define VM_ALIAS_ENUM(name, def) VM_SYMBOL_ENUM_NAME_(name) = VM_SYMBOL_ENUM_NAME_(def), >> VM_SYMBOLS_DO(VM_SYMBOL_IGNORE, VM_ALIAS_ENUM) >> #undef VM_ALIAS_ENUM >> >> FIRST_SID = NO_SID + 1, // inclusive lower limit >> LAST_SID = SID_LIMIT - 1, // inclusive upper limit >> }; >> >> enum class InjectedFieldID : int { >> ALL_INJECTED_FIELDS(DECLARE_INJECTED_FIELD_ENUM) >> MAX_enum >> }; >> >> enum class vmClassID : int { >> #define DECLARE_VM_CLASS(name, symbol) _VM_CLASS_ENUM(name), _VM_CLASS_ENUM(symbol) = _VM_CLASS_ENUM(name), >> VM_CLASSES_DO(DECLARE_VM_CLASS) >> #undef DECLARE_VM_CLASS >> >> LIMIT, // exclusive upper limit >> FIRST = 0, // inclusive upper limit >> LAST = LIMIT - 1 // inclusive upper limit >> }; >> >> enum class CodeBlobType { >> MethodNonProfiled = 0, // Execution level 1 and 4 (non-profiled) nmethods (including native nmethods) >> MethodProfiled = 1, // Execution level 2 and 3 (profiled) nmethods >> NonNMethod = 2, // Non-nmethods like Buffers, Adapters and Runtime Stubs >> All = 3, // All types (No code cache segmentation) >> NumTypes = 4 // Number of CodeBlobTypes >> }; > > There is a coding style in which the last enum member holds the number of members, like what we have with `mt_number_of_tags` in the `MemTag` enum. > So, I renamed the last member to `st_number_of_states`. Would that be OK, or would a separate constant be preferred? Just because we already do something in Hotspot doesn't necessarily mean that we should repeat the pattern going forward. I flagged it, and I really don't like it, but if you guys are OK with it, I will let it be.
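The two styles under debate can be compared in a small, self-contained sketch. All type, constant, and function names are hypothetical except `st_number_of_states`, which is taken from the thread. The sketch only assumes standard C++11.

```cpp
#include <cassert>
#include <cstring>

// Style A (hypothetical names): the last enumerator doubles as the count.
enum class StateA : int {
  st_released,
  st_reserved,
  st_committed,
  st_number_of_states  // sentinel, not a real state
};
constexpr int state_a_count = static_cast<int>(StateA::st_number_of_states);

// Style B: the enum holds only real states; the count is a separate constant.
enum class StateB : int { released, reserved, committed };
constexpr int state_b_count = 3;
// Ties the constant to the current last enumerator; both must be kept in sync by hand.
static_assert(static_cast<int>(StateB::committed) + 1 == state_b_count,
              "update state_b_count when adding a state");

// With style A, an exhaustive switch must also mention the sentinel value.
const char* state_name(StateA s) {
  switch (s) {
    case StateA::st_released:         return "released";
    case StateA::st_reserved:         return "reserved";
    case StateA::st_committed:        return "committed";
    case StateA::st_number_of_states: break;  // not a state
  }
  return "invalid";
}

// With style B, the switch lists only real states.
const char* state_name(StateB s) {
  switch (s) {
    case StateB::released:  return "released";
    case StateB::reserved:  return "reserved";
    case StateB::committed: return "committed";
  }
  return "invalid";  // only reachable for out-of-range values
}

// Both styles support lookup tables sized by the count.
const char* const state_b_names[state_b_count] = {"released", "reserved", "committed"};
```

The trade-off this shows: with style A the sentinel is a legal value of the enum type, so every exhaustive `switch` has to handle it. Style B keeps the type limited to real states, but the count is maintained by hand, and the `static_assert` only checks it against the enumerator it names.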
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20425#discussion_r1975943416 From dlong at openjdk.org Fri Feb 28 20:38:53 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 28 Feb 2025 20:38:53 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v4] In-Reply-To: References: Message-ID: <2jI87up85vKeQq7xy6WoI987MOuqTqA6I8G75VvC74g=.e8ef9f9c-b8b3-496d-9b48-28c83dc1fb64@github.com> On Thu, 27 Feb 2025 12:02:44 GMT, Damon Fenacci wrote: >> # Issue >> The test `src/hotspot/share/opto/c2compiler.cpp` fails intermittently due to a crash that happens when trying to allocate code cache space for C1 and C2 in `RuntimeStub::new_runtime_stub` and `SingletonBlob::operator new`. >> >> # Causes >> There are a few call paths during the initialization of C1 and C2 that can lead to the code cache allocations in `RuntimeStub::new_runtime_stub` (through `RuntimeStub::operator new`) and `SingletonBlob::operator new` triggering a fatal error if there is no more space. The paths in question are: >> 1. `Compiler::init_c1_runtime` -> `Runtime1::initialize` -> `Runtime1::generate_blob_for` -> `Runtime1::generate_blob` -> `RuntimeStub::new_runtime_stub` >> 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_stub` -> `Compile::Compile` -> `Compile::Code_Gen` -> `PhaseOutput::install` -> `PhaseOutput::install_stub` -> `RuntimeStub::new_runtime_stub` >> 1. `C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_uncommon_trap_blob` -> `UncommonTrapBlob::create` -> `new UncommonTrapBlob` >> 1. 
`C2Compiler::initialize` -> `C2Compiler::init_c2_runtime` -> `OptoRuntime::generate` -> `OptoRuntime::generate_exception_blob` -> `ExceptionBlob::create` -> `new ExceptionBlob` >> >> # Solution >> Instead of crashing fatally, we can use the `alloc_fail_is_fatal` flag of `RuntimeStub::new_runtime_stub` to avoid crashing in cases 1 and 2 and add a similar flag to `SingletonBlob::operator new` for cases 3 and 4. In the latter case, we need to adjust all calls accordingly. >> >> Note: In [JDK-8326615](https://bugs.openjdk.org/browse/JDK-8326615) it was argued that increasing the minimum code cache size would solve the issue but that wasn't entirely accurate: doing so possibly decreases the chances of a failed allocation in these 4 places but doesn't totally avoid it. >> >> # Testing >> The original failing regression test in `test/hotspot/jtreg/compiler/startup/StartupOutput.java` has been modified to run multiple times with randomized values (within the original failing range) to increase the chances of hitting the fatal assertion. >> >> Tests: Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8347406: re-add modified assert Refreshing my memory, isn't the real problem with trying to fix this with a minimum codecache size that some of these stubs are not allocated during the initial single-threaded JVM startup, but later, when the first compiler threads start, which allows other code blobs to fill up the codecache?
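The difference between the fatal and non-fatal allocation paths can be sketched with a toy allocator. `ToyCodeCache` and `init_toy_compiler_runtime` are hypothetical. They only mimic the role of the `alloc_fail_is_fatal` flag described in the PR and are not HotSpot's `CodeCache` or `RuntimeStub` APIs. If, as the question above suggests, some stubs are created only after compiler threads start, then a larger minimum cache size cannot guarantee that space is still left, and a non-fatal path lets the caller give up gracefully.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical bump-pointer code cache; it only models running out of space.
class ToyCodeCache {
 public:
  explicit ToyCodeCache(std::size_t capacity) : _storage(capacity), _used(0) {}

  // Modeled on the idea of an alloc_fail_is_fatal parameter: when true, a failed
  // allocation terminates the process; when false, the caller receives nullptr.
  void* allocate(std::size_t size, bool alloc_fail_is_fatal) {
    if (size > _storage.size() - _used) {
      if (alloc_fail_is_fatal) {
        std::fprintf(stderr, "fatal: toy code cache is full\n");
        std::abort();
      }
      return nullptr;
    }
    void* p = _storage.data() + _used;
    _used += size;
    return p;
  }

  std::size_t free_bytes() const { return _storage.size() - _used; }

 private:
  std::vector<unsigned char> _storage;
  std::size_t _used;
};

// Hypothetical compiler-runtime initialization: instead of crashing when a stub
// cannot be allocated, report failure so the caller can disable that compiler.
bool init_toy_compiler_runtime(ToyCodeCache& cache, std::size_t stub_size) {
  void* stub = cache.allocate(stub_size, /*alloc_fail_is_fatal=*/false);
  return stub != nullptr;
}
```

With a 100-byte cache, a first 60-byte stub succeeds, a second 60-byte request returns `false` instead of aborting, and the remaining 40 bytes stay usable.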
------------- PR Comment: https://git.openjdk.org/jdk/pull/23630#issuecomment-2691503254 From dlong at openjdk.org Fri Feb 28 20:38:54 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 28 Feb 2025 20:38:54 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v4] In-Reply-To: <2jI87up85vKeQq7xy6WoI987MOuqTqA6I8G75VvC74g=.e8ef9f9c-b8b3-496d-9b48-28c83dc1fb64@github.com> References: <2jI87up85vKeQq7xy6WoI987MOuqTqA6I8G75VvC74g=.e8ef9f9c-b8b3-496d-9b48-28c83dc1fb64@github.com> Message-ID: On Fri, 28 Feb 2025 20:35:58 GMT, Dean Long wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8347406: re-add modified assert > > Refreshing my memory, isn't the real problem with trying to fix this with a minimum codecache size that some of these stubs are not allocated during the initial single-threaded JVM startup, but later, when the first compiler threads start, which allows other code blobs to fill up the codecache? > Even so, it might be a good idea to increase the minimum code cache size anyway. @dean-long do you think it would make sense to file an RFE for that? Sure, if it's still an issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23630#issuecomment-2691503902 From dlong at openjdk.org Fri Feb 28 20:45:59 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 28 Feb 2025 20:45:59 GMT Subject: RFR: 8347406: [REDO] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v4] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 12:03:16 GMT, Damon Fenacci wrote: >> src/hotspot/share/opto/output.cpp line 3487: >> >>> 3485: C->record_failure("CodeCache is full"); >>> 3486: } else { >>> 3487: C->set_stub_entry_point(rs->entry_point()); >> >> Is the deleted rs->is_runtime_stub() assert still useful here?
> > A slightly modified one surely is. Inserted it again. I was thinking it could be moved into the `else` clause and simplified further. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23630#discussion_r1975989747 From mdoerr at openjdk.org Fri Feb 28 21:46:57 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Feb 2025 21:46:57 GMT Subject: RFR: JDK-8216437 : PPC64: Add intrinsic for GHASH algorithm [v28] In-Reply-To: References: <2cIptfLHrdxSy0t7RdsRlde94arK3gmqge9AiXmOZeo=.069a496c-e9dd-40cd-a144-306a65df0e1a@github.com> Message-ID: On Thu, 27 Feb 2025 13:40:51 GMT, Suchismith Roy wrote: >> JBS Issue : [JDK-8216437](https://bugs.openjdk.org/browse/JDK-8216437) >> >> Currently, acceleration code for GHASH is missing for PPC64. >> >> The current implementation utilises SIMD instructions on Power and uses Karatsuba multiplication for obtaining the final result. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > use vsplitsb src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 562: > 560: VectorRegister vReducedLow, VectorRegister vTmp8, VectorRegister vTmp9, > 561: VectorRegister vCombinedResult, VectorRegister vSwappedH) { > 562: assert(masm != nullptr, "MacroAssembler pointer is null"); I think this assertion is not really beneficial. Who would pass nullptr?
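The quoted PR description says that the PPC64 GHASH intrinsic uses Karatsuba multiplication. Purely as a portable illustration, this sketch computes a 64x64 to 128-bit carry-less (GF(2)[x]) product from three 32x32 carry-less products instead of four. The real stub works on vector registers and also performs bit reflection and reduction modulo the GHASH polynomial; none of that is shown. Because addition and subtraction in GF(2)[x] are both XOR, the middle term is `(a0^a1)*(b0^b1) ^ a0*b0 ^ a1*b1`. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct U128 {
  uint64_t hi;
  uint64_t lo;
};

// Carry-less multiply of two 32-bit polynomials; the product has degree <= 62.
uint64_t clmul32(uint32_t a, uint32_t b) {
  uint64_t result = 0;
  for (int i = 0; i < 32; ++i) {
    if ((b >> i) & 1u) {
      result ^= static_cast<uint64_t>(a) << i;
    }
  }
  return result;
}

// Bit-by-bit reference 64x64 -> 128 carry-less multiply, used to check Karatsuba.
U128 clmul64_reference(uint64_t a, uint64_t b) {
  U128 r{0, 0};
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1u) {
      r.lo ^= a << i;
      if (i != 0) r.hi ^= a >> (64 - i);  // bits shifted past position 63
    }
  }
  return r;
}

// Karatsuba: a*b = H*x^64 ^ M*x^32 ^ L with H = a1*b1, L = a0*b0,
// M = (a0^a1)*(b0^b1) ^ H ^ L = a0*b1 ^ a1*b0, using three clmul32 calls.
U128 clmul64_karatsuba(uint64_t a, uint64_t b) {
  const uint32_t a0 = static_cast<uint32_t>(a), a1 = static_cast<uint32_t>(a >> 32);
  const uint32_t b0 = static_cast<uint32_t>(b), b1 = static_cast<uint32_t>(b >> 32);
  const uint64_t low = clmul32(a0, b0);
  const uint64_t high = clmul32(a1, b1);
  const uint64_t mid = clmul32(a0 ^ a1, b0 ^ b1) ^ low ^ high;
  U128 r;
  r.lo = low ^ (mid << 32);
  r.hi = high ^ (mid >> 32);
  return r;
}
```

For instance, `clmul64_karatsuba(0x3, 0x3)` yields `lo == 0x5`, because (x + 1)^2 = x^2 + 1 when coefficients are taken mod 2.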